Reinforcement learning is an active research area in machine learning and artificial intelligence. It underpins developments in robotics, image recognition, self-driving cars, and stock portfolio selection, to name a few examples. The project's main goal is to develop robust reinforcement learning algorithms that can be used in optimization and technology development.
Project manager: Karl-Olof Lindahl
Other project members: Björn Lindenberg, Jonas Nordqvist
Participating organizations: Linnaeus University
Financier: Linnaeus University
Timetable: 2019–
Subject: Mathematics (Department of Mathematics, Faculty of Technology)
More about the project
Reinforcement learning (RL) is an active research area in machine learning and artificial intelligence that underpins developments in robotics, image recognition, self-driving cars, and stock portfolio selection, to name a few examples. The field is concerned with maximizing the rewards that an intelligent system accumulates over time while operating in an environment.
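To make the notion of accumulated reward concrete, the following minimal sketch sums a sequence of rewards with a discount factor. The discount factor `gamma` and the function name `discounted_return` are assumptions made for illustration; the project description does not say how future rewards are weighted.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for a finite reward list.

    gamma is an assumed discount factor; gamma=1.0 gives the plain sum.
    """
    total = 0.0
    # Work backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

With `gamma` below 1, rewards far in the future count for less than immediate ones, which also keeps the sum finite for long-running systems.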
A key characteristic of RL is that the system's interaction with the environment can be described in terms of decision processes. The focus is on balancing the exploration of new territory against the greedy exploitation of current knowledge about rewards when making decisions. Our main goal is to develop robust algorithms for RL that can be used in optimization and technology development.
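One common way to trade off exploration against greedy use of current knowledge is an epsilon-greedy rule. The sketch below applies it to a simple multi-armed bandit; it is an illustration under stated assumptions, not the project's own algorithm. The names `epsilon_greedy_action` and `run_bandit`, the Gaussian reward model, and all parameter values are hypothetical.

```python
import random


def epsilon_greedy_action(value_estimates, epsilon, rng):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(value_estimates))
    # Greedy choice: the action with the highest current estimate.
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate the mean reward of each arm while acting epsilon-greedily.

    Rewards are assumed to be Gaussian with unit variance (illustration only).
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy_action(estimates, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_bandit([0.0, 0.5, 1.0])
    print(estimates, counts)
```

Setting `epsilon` to 0 makes the agent purely greedy, while setting it to 1 makes it explore uniformly at random; values in between mix the two behaviours.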
In particular, the project concerns distributional reinforcement learning (DRL). Here the total reward over time is treated as a random variable, and the task is to estimate its distribution.
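As an illustration of what estimating a return distribution can involve, the sketch below uses a categorical representation: probabilities on a fixed, evenly spaced grid of possible returns. It computes the distribution of `reward + gamma * Z` and projects it back onto the grid. This is one commonly used approach, shown under assumptions; it is not necessarily the representation used in the project. The function names, the grid, and `gamma` are hypothetical.

```python
import math


def project_target(atoms, probs, reward, gamma):
    """Distribution of reward + gamma * Z, projected onto the fixed atoms.

    atoms: evenly spaced, increasing return values (assumed grid).
    probs: probability assigned to each atom for the current return Z.
    """
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (len(atoms) - 1)
    projected = [0.0] * len(atoms)
    for z, p in zip(atoms, probs):
        # Shift and scale the atom, then clip it to the supported range.
        tz = min(max(reward + gamma * z, v_min), v_max)
        pos = (tz - v_min) / delta
        lower = int(math.floor(pos))
        upper = min(lower + 1, len(atoms) - 1)
        # Split the probability mass between the two neighbouring atoms.
        weight_upper = pos - lower
        projected[lower] += p * (1.0 - weight_upper)
        projected[upper] += p * weight_upper
    return projected


def mean(atoms, probs):
    """Expected return under a categorical distribution."""
    return sum(z * p for z, p in zip(atoms, probs))


if __name__ == "__main__":
    atoms = [0.0, 1.0, 2.0, 3.0, 4.0]
    probs = [0.0, 0.0, 1.0, 0.0, 0.0]
    target = project_target(atoms, probs, reward=0.5, gamma=1.0)
    print(target, mean(atoms, target))
```

Unlike a method that tracks only the expected return, this representation keeps the whole shape of the distribution, so quantities such as its spread remain available.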
In DRL, iterating until convergence can give rise to superhuman strategies learned tabula rasa, that is, from a blank slate: the machine itself can find optimal solutions in different environments without human intervention. Among other things, we develop algorithms that we implement and evaluate in a standardized test suite consisting of Atari 2600 games. These games provide complex, multidimensional environments in which algorithms use states based on screen images. Performance is then compared with the state of the art among established algorithms in the field of RL.