Graphic depicting a woman and a robot. Björn Lindenberg created the image himself using generative AI (Stable Diffusion [deliberate_v2] + ControlNet) and many rounds of refinement.

New dissertation demonstrates how AI can learn to make effective decisions through reinforcement learning and reward systems

In a new dissertation in mathematics, Björn Lindenberg shows how reinforcement learning in AI can be used to create effective strategies for autonomous decision-making in various environments. Reward systems can be designed to reinforce desired behaviour, for example finding optimal pricing strategies for financial instruments or controlling robots and network traffic.

Reinforcement learning is a part of AI in which a digital decision-maker, known as an agent, learns to make decisions by interacting with its environment and receiving rewards or punishments depending on how well it performs its actions. By maximising rewards and minimising punishments, the AI gradually learns to perform desirable actions and improve its performance on the given task.
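The agent–environment loop described above can be sketched in a few lines of Python. This is a made-up toy illustration, not code from the dissertation: the environment, the actions "left" and "right", and the reward values are all invented for the example.

```python
import random

def environment_step(action):
    """A stand-in environment: action 'left' is rewarded, 'right' punished."""
    return +1 if action == "left" else -1

def agent_policy(observation):
    """A placeholder agent that, for now, just picks an action at random."""
    return random.choice(["left", "right"])

total_reward = 0
for t in range(10):
    observation = "some state"          # what the agent perceives
    action = agent_policy(observation)  # the agent's decision
    reward = environment_step(action)   # feedback: reward or punishment
    total_reward += reward
```

A learning agent would additionally use the received rewards to change its policy over time, which is what the rest of the article describes.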

Reinforcement Learning Trains AI in Autonomous Decision-Making

The goal is to develop algorithms and models that help the agent make the best decisions. This is achieved through learning algorithms that take into account the agent's previous experiences and improve its performance over time. There are many applications for reinforcement learning, such as game theory, robotics, financial analysis, and control of industrial processes.

From professional poker player to PhD in mathematics

After a couple of years studying physics at KTH, Björn made a living as a professional poker player for 9 years. During that time, he read a lot about game theory, particularly about self-playing agents.

After coming across a particularly interesting article that laid out a theoretical solution to winning at poker, he realised that his self-studies had become more enjoyable than the actual job of being a poker player and decided to start studying again.

Björn first studied engineering physics in Lund, where he came to the conclusion that he was primarily interested in theoretical subjects such as quantum mechanics and, by extension, mathematics.

After further searching, he found his home in mathematics at Linnaeus University in 2014. There, he obtained a bachelor's degree and then began a master's degree in 2016, which directly led to a doctoral position in mathematics in 2017 under the supervision of Karl-Olof Lindahl.

"My research focuses on reinforcement learning where an agent is placed in an environment. The agent observes the state of the environment at each step, similar to how we humans perceive our surroundings. This could, for example, be the chessboard position, incoming video footage, industrial data, or sensor data from a robot. The agent makes decisions by choosing an action from a list of options, such as moving a chess piece or controlling a robot movement. These choices can then affect the environment and create a new game situation in chess or provide new sensor values for a robot", says Björn Lindenberg.

New Mathematical Model Enhances Reliability in the Learning Process

In his dissertation, Lindenberg has developed a model for deep reinforcement learning with multiple concurrent agents, which can enhance the learning process and make it more robust and effective. He has also investigated the number of iterations, i.e., repeated attempts, required for a system to become stable and perform well.

"Deep reinforcement learning is advancing at the same pace as other AI technologies, that is, very rapidly. This is largely due to exponentially increasing hardware capacity, meaning that computers are becoming more and more powerful, along with new insights into network architectures", Lindenberg continues.

The more complex the applications become, the more advanced the mathematics and deep learning needed in reinforcement learning. This need is evident both in deepening the understanding of existing problems and in discovering new algorithms.

"The methods presented in the dissertation can be incorporated into a variety of decision-making AI applications that, whether we realise it or not, are becoming an increasingly prevalent part of our daily lives," Lindenberg concludes.

More information

Link to the dissertation: Reinforcement Learning and Dynamical Systems


Björn Lindenberg, PhD in mathematics at the Department of Mathematics, email:, mobile: +4673-819 56 19

Facts about AI and Reinforcement Learning

Reinforcement learning in AI is based on mathematics and calculations, where rewards are used to train the AI agent to make decisions. This entails developing mathematical models and algorithms to train an AI to make optimal choices by rewarding it when it makes correct decisions and penalising it when it makes mistakes. It can be simplified as follows:

The AI has a "brain" that helps it make decisions. Think of the brain as a function that takes in information about a situation and generates a choice. To train the AI, one starts with random values for the brain's choices. The AI makes choices and receives rewards or penalties based on how good its choices were. To learn, the AI gradually adjusts its choices based on the feedback.

If the AI receives a high reward for a choice, the likelihood of it making the same choice again increases. If the AI receives a penalty, the likelihood of it making the same choice again decreases. This way, the AI gradually learns which choices lead to higher rewards and avoids choices that result in penalties.

By repeating the process of making choices, receiving feedback, and adjusting its choices, the AI becomes better and better at making the most favourable choices to obtain the highest possible rewards.
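The adjustment process described above can be sketched as a toy program. Everything here is a hypothetical illustration, not the dissertation's method: the "brain" is a score per choice, and feedback nudges the score so that rewarded choices become more likely.

```python
import math
import random

prefs = {"A": 0.0, "B": 0.0}   # the "brain": a score per choice, starting neutral

def choose():
    """Pick a choice with probability proportional to exp(score) (softmax)."""
    z = sum(math.exp(v) for v in prefs.values())
    r = random.random() * z
    for choice, v in prefs.items():
        r -= math.exp(v)
        if r <= 0:
            return choice
    return choice

def feedback(choice):
    """Hypothetical task: choice 'A' earns a reward, 'B' a penalty."""
    return 1.0 if choice == "A" else -1.0

for _ in range(200):
    c = choose()
    prefs[c] += 0.1 * feedback(c)   # reward raises the score, penalty lowers it
```

After a few hundred rounds the score for "A" dominates, so the softmax makes the agent pick "A" almost every time, which is exactly the behaviour the text describes.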

Björn provides two thought examples:

Imagine a cleaning robot that is supposed to pick up recycling cans. Each time the robot makes a decision and moves, it can receive either positive or negative feedback. If it makes a decision that saves energy or finds recycling cans, it receives a positive reward. But if it uses a lot of energy without finding anything valuable, it receives a penalty. The goal is for the robot to learn to make decisions that lead to rewards in the long run, even if it means it has to use energy initially. It's similar to how humans have to make an effort first to gain something better later.
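The robot example hinges on delayed rewards: spending energy now can still be the best plan if it leads to cans later. With invented numbers, a discounted sum of rewards (a standard quantity in reinforcement learning) makes this concrete.

```python
gamma = 0.9  # discount factor: future rewards count slightly less than immediate ones

def discounted_return(rewards):
    """Sum of rewards, each discounted by gamma per time step."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Plan 1: stay put, save energy, find nothing.
idle = [0, 0, 0, 0]

# Plan 2: burn energy for three steps (-1 each), then collect a can worth +10.
search = [-1, -1, -1, 10]
```

Here `discounted_return(search)` comes out well above `discounted_return(idle)`, so an agent maximising long-run reward learns to accept the early penalties, just as the text explains.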