【Seminar】 Dr. Thomas Akam “Reconciling parallel 'reinforcement learning' systems in cortex and basal ganglia”
Speaker: Dr. Thomas Akam, University of Oxford
Title: "Reconciling parallel 'reinforcement learning' systems in cortex and basal ganglia"
The reward prediction error (RPE) hypothesis of dopamine function is one of the great success stories of theoretical neuroscience, explaining a diverse set of experimental data from normative principles. Reinforcement learning (RL) models of cortico-striatal function typically assume that cortex represents the current state of the world, and that dopaminergic RPEs update value estimates at cortico-striatal synapses to modify future behaviour. However, some experimental data are hard to reconcile with this account. First, there is abundant evidence for what appear to be value signals in frontal cortex, raising the question of what they are doing there if value learning happens in basal ganglia. Second, in some reward-guided decision tasks, behavioural flexibility appears to be mediated by hidden state inference rather than RPE-driven value learning. I will present a computational model of learning in cortico-basal ganglia circuits, and motivating experimental data from the mouse dopamine system, which attempts to reconcile these observations.
A recurrent network representing frontal cortex is trained to predict the next observation, i.e. to minimize sensory prediction errors, and in doing so learns to infer latent states of the environment. Striatum, represented by a feedforward network, receives the observable task states and PFC activity as input, and implements actor-critic reinforcement learning to predict values and select actions. Trained to solve a multi-step decision task, the model explains a set of otherwise paradoxical observations from dopamine recording and manipulation experiments. Insofar as the model is correct, it suggests that: i) signals interpreted as action values in cortex may in fact represent beliefs about latent states of the environment; ii) the influence of rewards on future choices in reward-guided decision tasks may be mediated by recurrent activity dynamics in cortex rather than by dopamine-driven synaptic weight changes in striatum.
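For readers who find code easier to follow than prose, the two-network arrangement described above can be sketched roughly as follows. This is only an illustrative forward pass under stated assumptions, not the speaker's implementation: the layer sizes, the discount factor, the random untrained weights, the one-hot observations, and all function names (`cortex_step`, `striatum`, `rpe`) are hypothetical. Training is omitted; the sketch only shows where the sensory prediction error and the reward prediction error would arise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and discount factor, chosen only for illustration.
N_OBS, N_HID, N_ACT = 4, 16, 2
GAMMA = 0.9

# "Cortex": a simple recurrent network that predicts the next observation.
W_in = rng.normal(0.0, 0.1, (N_HID, N_OBS))
W_rec = rng.normal(0.0, 0.1, (N_HID, N_HID))
W_out = rng.normal(0.0, 0.1, (N_OBS, N_HID))

def cortex_step(h, obs):
    """Update the recurrent state and predict the next observation."""
    h_new = np.tanh(W_in @ obs + W_rec @ h)
    pred_next_obs = W_out @ h_new
    return h_new, pred_next_obs

# "Striatum": a feedforward actor-critic that reads the observable
# state together with the cortical (recurrent) activity.
W_critic = rng.normal(0.0, 0.1, N_OBS + N_HID)
W_actor = rng.normal(0.0, 0.1, (N_ACT, N_OBS + N_HID))

def striatum(obs, h):
    """Return a scalar value estimate and a softmax policy over actions."""
    x = np.concatenate([obs, h])
    value = W_critic @ x
    logits = W_actor @ x
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()
    return value, policy

def rpe(reward, value, next_value, gamma=GAMMA):
    """Temporal-difference reward prediction error."""
    return reward + gamma * next_value - value

def one_hot(i, n=N_OBS):
    v = np.zeros(n)
    v[i] = 1.0
    return v

# One illustrative time step with made-up observations and reward.
h = np.zeros(N_HID)
obs, next_obs = one_hot(0), one_hot(1)

h, pred = cortex_step(h, obs)
sensory_error = next_obs - pred            # would drive learning in the recurrent network

value, policy = striatum(obs, h)
h_next, _ = cortex_step(h, next_obs)
next_value, _ = striatum(next_obs, h_next)
delta = rpe(1.0, value, next_value)        # would drive learning in the actor-critic network
```

In this arrangement the recurrent state `h` carries information across time steps, so a reward can change later choices through activity alone, without any weight update in the feedforward network.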
Join Zoom Meeting
Meeting ID: 948 6696 2338
Sensory and Behavioural Neuroscience Unit - Fukunaga Unit