Learning a world model and planning with a self-organizing dynamic neural system (NIPS 2003)
Project Page for:
M. Toussaint (2003). Learning a world model and planning with a self-organizing dynamic neural system. In Advances in Neural Information Processing Systems 16 (NIPS 2003), 929-936, MIT Press, Cambridge, MA. arXiv: nlin.AO/0306015.
Abstract: We present a connectionist architecture that can learn a model of the relations between perceptions and actions and use this model for behavior planning. State representations are learned with a growing self-organizing layer which is directly coupled to a perception and a motor layer. Knowledge about possible state transitions is encoded in the lateral connectivity. Motor signals modulate this lateral connectivity and a dynamic field on the layer organizes a planning process. All mechanisms are local and adaptation is based on Hebbian ideas. The model is continuous in the action, perception, and time domain.
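To make these mechanisms concrete, here is a minimal sketch (in Python, not the paper's code) of a state layer that grows a new unit whenever a perception is poorly matched and strengthens lateral connections between successively active units in a Hebbian way. All identifiers (`CWM`, `novelty_threshold`, `step`) are illustrative assumptions, and the motor-signal modulation of the lateral weights is omitted for brevity.

```python
import numpy as np

class CWM:
    """Toy growing state layer with Hebbian lateral connectivity.
    An assumed sketch, not the architecture from the paper."""

    def __init__(self, novelty_threshold=0.5, learning_rate=0.1):
        self.centers = []   # receptive-field centers of the state units
        self.W = {}         # lateral weights: (i, j) -> value in [0, 1]
        self.nu = novelty_threshold
        self.eta = learning_rate
        self.prev = None    # unit that won on the previous time step

    def step(self, perception):
        perception = np.asarray(perception, dtype=float)
        # Find the unit whose receptive field best matches the perception.
        if self.centers:
            d = [np.linalg.norm(perception - c) for c in self.centers]
            winner = int(np.argmin(d))
            novel = d[winner] > self.nu
        else:
            novel = True
        # Growth: recruit a new unit when no existing unit matches well.
        if novel:
            self.centers.append(perception.copy())
            winner = len(self.centers) - 1
        # Hebbian update: strengthen the lateral connection from the
        # previously active unit to the currently active one.
        if self.prev is not None and self.prev != winner:
            w = self.W.get((self.prev, winner), 0.0)
            self.W[(self.prev, winner)] = w + self.eta * (1.0 - w)
        self.prev = winner
        return winner

# Example: feed 2-D positions from a random walk, as in the Growth movie.
model = CWM()
pos = np.zeros(2)
for _ in range(1000):
    pos += np.random.uniform(-0.2, 0.2, size=2)
    model.step(pos)
```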
Supplementary Videos:
The following three movies visualize the experiments with the CWM on a maze problem. The playback speed of each movie corresponds directly to the speed of the experiment, run online on a 2 GHz Pentium (the code, though, is not optimized for speed).
- Growth: The movie displays the growth of the CWM during self-organization (cf. the growing-layer sketch after the abstract above). On the left, the agent explores the maze via a random walk. On the right, the central layer of the CWM is displayed; the color of the connections corresponds to their weights (red = 1, blue = 0).
- Planning: The movie visualizes the planning process with the CWM. On the left, you see the maze; the current goal is marked by a red spot. The agent (the white spot) moves straight to the goal. On the right, the value field on the central layer is visualized (red = 1, blue = 0). Whenever the agent reaches the goal, the goal is moved to a new random position. The value field rearranges quickly and relaxes to its fixed point (which corresponds to the Bellman equation). Given this stationary value field, the agent chooses actions that lead "uphill" towards the goal (see the value-field sketch after this list).
- Learning: The movie displays how the CWM learns changes in the world. On the right, the color of the connections visualizes their weights (red = 1, blue = 0).
As before, the goal is set randomly within the maze and changes whenever the agent reaches it. At some point, however, a passage in the upper left part of the maze is shut. When the agent tries to move through this passage, it re-adapts its world model (note how some connections turn blue!); and once the re-adaptation of these weights induces a sufficient change in the relaxed value field, the agent moves around the blockade to the goal (see the re-adaptation sketch after this list). Thereafter, the agent still has to learn that the passage is also blocked when approached from the left.
In the lower right part of the maze, another passage is blocked and the agent learns this blockade analogously (note that all connections in this region turn blue). Once the two blockades have been learned, the agent never explores them again.
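The relaxation to a fixed point "corresponding to the Bellman equation" in the Planning movie can be pictured as a discounted value iteration over the learned lateral connectivity. The following is a hypothetical sketch, not the paper's implementation; `relax_value_field`, `uphill_step`, and `gamma` are illustrative names, and `W` is the lateral weight dictionary from the sketch above.

```python
def relax_value_field(W, n_units, goal, gamma=0.9, iters=100):
    """Iterate the value field to its fixed point (Bellman-style).
    W maps (i, j) -> lateral weight; the goal unit clamps the field at 1."""
    V = [0.0] * n_units
    out = {}                    # outgoing connections per unit
    for (i, j), w in W.items():
        out.setdefault(i, []).append((j, w))
    for _ in range(iters):
        for i in range(n_units):
            if i == goal:
                V[i] = 1.0      # red in the movie
            else:
                V[i] = gamma * max((w * V[j] for j, w in out.get(i, [])),
                                   default=0.0)
    return V

def uphill_step(W, V, current):
    """Move to the reachable neighbor with the highest value ("uphill")."""
    nbrs = [(j, w) for (i, j), w in W.items() if i == current and w > 0.5]
    return max(nbrs, key=lambda jw: V[jw[0]])[0] if nbrs else current
```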
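Likewise, the re-adaptation shown in the Learning movie can be sketched as depressing the weight of a transition that fails to occur: the connection fades toward zero (blue in the movie), and the next relaxation of the value field routes the agent around the blockade. Again a hedged sketch; `readapt` and `eta` are illustrative names.

```python
def readapt(W, prev_unit, expected_unit, actual_unit, eta=0.5):
    """Depress a lateral weight whose predicted transition did not occur."""
    key = (prev_unit, expected_unit)
    if actual_unit != expected_unit and key in W:
        W[key] *= (1.0 - eta)   # the connection turns blue (weight -> 0)
    return W
```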