Stochastic Optimal Control - ICML 2008 tutorial
To be held on Saturday, July 5, 2008 in Helsinki, Finland, as part of the 25th International Conference on Machine Learning (ICML 2008)
Bert Kappen,
Radboud University, Nijmegen, the Netherlands
Marc Toussaint,
Technical University, Berlin, Germany
Stochastic optimal control theory concerns the problem of how to act optimally when reward is only obtained at a later time. The stochastic optimal control problem is central to modeling intelligent behaviour in animals and machines. Examples are the control of multi-joint robot arms, the navigation of vehicles, and the coordination of multi-agent systems. In addition, control theory plays an important role in financial applications.
Currently, the dominant approaches to the above problems within the machine learning community are reinforcement learning and (partially observable) Markov decision processes, often with discounted reward. These approaches can be viewed as special cases of stochastic optimal control theory.
The tutorial is introductory and aimed at the 'average' machine learning researcher. No background in control theory or reinforcement learning is required; a basic understanding of Bayesian networks and statistical inference is assumed.
Outline
- Deterministic optimal control (Kappen, 30 min.)
- Introduction of optimal control problems, types of control problems
- Dynamic programming solution and deterministic Bellman equation
- Discrete and continuous time formulation
- Pontryagin minimum principle
- Examples
- Stochastic optimal control, discrete case (Toussaint, 40 min.)
- Why stochasticity?
- Markov Decision Processes
- Stochastic Bellman optimality equation
- Dynamic Programming, Value Iteration (a minimal sketch follows the outline)
- Learning from experience: Temporal Difference, Q-learning, eligibilities, Exploration-Exploitation, Bayesian RL
- Coffee break
- Stochastic optimal control, continuous case (Kappen, 40 min.)
- Stochastic differential equations
- Stochastic optimal control, Hamilton-Jacobi-Bellman equation
- Linear Quadratic control, Riccati equation (a sketch follows the outline)
- Learning, inference and control, certainty equivalence
- Path integral control
- Coordination of agents, mapping to graphical model inference
- Research issues (Toussaint, 30 min.)
- Challenges in stochastic optimal control
- Probabilistic inference approach to optimal control
- Examples: POMDPs, robotic motion control and planning
- Model learning in robotics
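To make the discrete-case material (Part 2) concrete, here is a minimal value iteration sketch. The five-state chain MDP, its transition model, and all names are illustrative assumptions, not code from the tutorial; it simply iterates the stochastic Bellman optimality equation V(s) = max_a [R(s,a) + γ Σ_s' P(s'|s,a) V(s')] until convergence.

```python
import numpy as np

# Hypothetical toy chain MDP: states 0..4, actions move left/right,
# reward 1 in the rightmost state. Illustrates dynamic programming via
# value iteration on the Bellman optimality equation.
n_states, gamma = 5, 0.9
actions = (-1, +1)                        # move left / move right

def step_distribution(s, a):
    """P(s'|s,a): the intended move succeeds w.p. 0.8, else stay put."""
    s_next = min(max(s + a, 0), n_states - 1)
    p = np.zeros(n_states)
    p[s_next] += 0.8
    p[s] += 0.2
    return p

def reward(s, a):
    return 1.0 if s == n_states - 1 else 0.0

V = np.zeros(n_states)
for _ in range(200):                      # sweep until (approximately) converged
    Q = np.array([[reward(s, a) + gamma * step_distribution(s, a) @ V
                   for a in actions] for s in range(n_states)])
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = [actions[i] for i in Q.argmax(axis=1)]
print("V* =", np.round(V, 3), " greedy actions:", policy)
```

Running this prints the converged values and a greedy policy that moves right toward the rewarding end state.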
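For the continuous/LQ part (Part 3), here is a minimal sketch of the discrete-time finite-horizon Riccati recursion for a hypothetical double-integrator system; the matrices, horizon, and noise level are illustrative assumptions. The backward pass computes the optimal feedback gains, and by certainty equivalence the same gains remain optimal under additive Gaussian noise, which the forward rollout demonstrates.

```python
import numpy as np

# Finite-horizon LQR via the backward Riccati recursion:
#   P_T = Q_f
#   K_t = (R + B' P_{t+1} B)^{-1} B' P_{t+1} A
#   P_t = Q + A' P_{t+1} A - A' P_{t+1} B K_t
# with optimal feedback u_t = -K_t x_t.
dt, T = 0.1, 50
A = np.array([[1.0, dt], [0.0, 1.0]])    # position-velocity dynamics (assumed)
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])                  # state cost
R = np.array([[0.01]])                   # control cost
Qf = np.diag([10.0, 1.0])                # terminal cost

# Backward pass: compute the gains K[t] for t = T-1 .. 0.
P = Qf
K = [None] * T
for t in reversed(range(T)):
    K[t] = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K[t]

# Forward pass: roll out the controlled system with additive Gaussian noise.
# By certainty equivalence, the deterministic gains stay optimal here.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])                 # start at position 1, velocity 0
for t in range(T):
    u = -K[t] @ x
    x = A @ x + B @ u + rng.normal(0.0, 0.02, size=2)
print("final state:", np.round(x, 3))
```

Note the use of np.linalg.solve rather than an explicit matrix inverse when computing the gains, which is the numerically preferable choice.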
Tutorial manuscript
The tutorial slides can be accessed here:
- Parts 1 & 3: Deterministic optimal control, stochastic optimal control (continuous case)
- Part 2: Markov Decision Processes
- Part 4: Research issues, robotics applications
- Kappen: Stochastic optimal control theory
- Toussaint: lecture notes on MDPs, notes on LQG
- Jönsson: Lectures on Optimal Control
Tutorial demo code
Kappen: Matlab code for the n-joint problem
Here is a directory of Matlab files which allows you to run and inspect the variational approximation for the n-joint stochastic control problem as discussed in section 6.7 of the tutorial text. Type tar xvf njoints.tar to unpack the directory and simply run file1.m. In file1.m you can select demo1 (3-joint arm) or demo2 (10-joint arm). You can also try larger n, but be sure to adjust eta for the smoothing of the variational fixed-point equations. You can compare the results with exact computation (only recommended for 2 joints) by setting METHOD='exact'. There is also an implementation of importance sampling (which does not work very well) and of Metropolis-Hastings sampling (which works nicely, but is not as stable as the variational approximation).
Useful web material
- Bert Kappen (2006): An introduction to stochastic control theory, path integrals and reinforcement learning. In Proceedings of the 9th Granada Seminar on Computational Physics: Computational and Mathematical Modeling of Cooperative Behavior in Neural Systems. American Institute of Physics.
- Richard Weber (2006): Lecture notes on Optimization and Control. Lecture notes of a course given in autumn 2006.
- Marc Toussaint: Video lecture on probabilistic inference methods in robotics. Held at the Pascal Symposium, Bled 2008.
- Marc Toussaint, Stefan Harmeling, Amos Storkey (2006): Probabilistic inference for solving (PO)MDPs. Research Report EDI-INF-RR-0934, University of Edinburgh, School of Informatics.
- Emanuel Todorov (2006): Mathematical introduction to optimal control theory. A review.
Organizers & presenters:
Bert Kappen, b.kappen@science.ru.nl
Marc Toussaint, mtoussai@cs.tu-berlin.de