Seminar in Mathematics

Bernt Øksendal och Björn Lindenberg

Welcome to the lectures in the mathematics seminar series.

Lecture 1

Föreläsare/Lecturer

Bernt Øksendal

Titel/Title

Stochastic Fokker-Planck PIDE for conditional McKean-Vlasov jump diffusions and applications to optimal control

Tid/Time

13.15 - 14.00

Sammanfattning/Abstract

The purpose of this paper is to study optimal control of conditional McKean-Vlasov (mean-field) stochastic differential equations with jumps (conditional McKean-Vlasov jump diffusions, for short). To this end, we first prove a stochastic Fokker-Planck equation for the conditional law of the solution of such equations.

Combining this equation with the original state equation, we obtain a Markovian system for the state and its conditional law. Furthermore, we apply this to formulate a Hamilton-Jacobi-Bellman (HJB) equation for the optimal control of conditional McKean-Vlasov jump diffusions.

Then we study the situation when the law is absolutely continuous with respect to Lebesgue measure. In that case the Fokker-Planck equation reduces to a stochastic partial differential equation (SPDE) for the Radon-Nikodym derivative of the conditional law. Finally we apply these results to solve explicitly the following problems:
(i) Linear-quadratic optimal control of conditional stochastic McKean-Vlasov jump diffusions.
(ii) Optimal consumption from a cash flow modelled as a conditional stochastic McKean-Vlasov jump diffusion.
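
For orientation, and with notation that is ours rather than necessarily the speaker's: a controlled conditional McKean-Vlasov jump diffusion is a stochastic differential equation whose coefficients depend not only on the state and the control but also on the conditional law of the state, for example

dX_t = b(t, X_t, \mu_t, u_t)\,dt + \sigma(t, X_t, \mu_t, u_t)\,dB_t + \int_{\mathbb{R}} \gamma(t, X_{t^-}, \mu_t, u_t, \zeta)\,\tilde{N}(dt, d\zeta),

where B is a Brownian motion, \tilde{N} is a compensated Poisson random measure, u is the control process, and \mu_t is the law of X_t conditional on a given observation filtration. The stochastic Fokker-Planck equation of the talk governs the evolution of \mu_t, which is what makes the pair (X_t, \mu_t) a Markovian system.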

 

Lecture 2

Föreläsare/Lecturer

Björn Lindenberg

Titel/Title

Conjugated Discrete Distributions for Distributional Reinforcement Learning

Tid/Time

14.15 - 15.00

Sammanfattning/Abstract

In this work we continue to build upon recent advances in reinforcement learning for finite Markov processes. A common approach among existing algorithms, both single-actor and distributed, is to either clip rewards or to apply a transformation method on Q-functions to handle a large variety of magnitudes in real discounted returns. We theoretically show that one of the most successful methods may not yield an optimal policy if we have a non-deterministic process. As a solution, we argue that distributional reinforcement learning lends itself to remedy this situation completely. By the introduction of a conjugated distributional operator, we may handle a large class of transformations for real returns with guaranteed theoretical convergence. We propose an approximating single-actor algorithm based on this operator that trains agents directly on unaltered rewards using a proper distributional metric given by the Cramér distance. To evaluate its performance in a stochastic setting, we train agents on a suite of 55 Atari 2600 games using sticky actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework.
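
The conjugated distributional operator itself is defined in the underlying paper; as a minimal, illustrative sketch of the metric mentioned in the abstract, the following Python snippet (function and variable names are ours, not the authors') computes the Cramér distance between two categorical return distributions on a common evenly spaced support, the form in which the distance typically appears in distributional reinforcement learning:

import numpy as np

def cramer_distance(probs_p, probs_q, support):
    # l2 distance between the CDFs of two discrete distributions
    # sharing the same evenly spaced support.
    delta_z = support[1] - support[0]   # spacing between support atoms
    cdf_p = np.cumsum(probs_p)          # cumulative distribution of P
    cdf_q = np.cumsum(probs_q)          # cumulative distribution of Q
    return np.sqrt(delta_z * np.sum((cdf_p - cdf_q) ** 2))

# Example: two 51-atom distributions on [-10, 10] (a support familiar
# from C51-style agents): a uniform distribution versus a point mass at 0.
support = np.linspace(-10.0, 10.0, 51)
p = np.full(51, 1.0 / 51)
q = np.zeros(51)
q[25] = 1.0
print(cramer_distance(p, q, support))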
 

Lokal/Venue

The room for both lectures is B1006, Hus B, in Växjö. The lectures are also available via Zoom: https://lnu-se.zoom.us/j/64045402242?pwd=WFdwZUYxdEJQR3QxK0RVR1Nka0oxZz09