The aim of the Okinawa Computational Neuroscience Course is to provide opportunities for young researchers with theoretical backgrounds to learn up-to-date neurobiological findings, and for those with experimental backgrounds to gain hands-on experience in computational modeling.
The special topic for this year's course is "Predictions and Decisions." We invite graduate students and postgraduate researchers to participate in the course, held from July 1st through 10th at a beach resort near the future campus site of Okinawa Institute of Science and Technology.
This course is one of the tutorial courses sponsored by the Cabinet Office of the Japanese government as a precursory activity for the Okinawa Institute of Science and Technology. The course concluded successfully with excellent lectures and lively discussions by 60 participants from 21 countries.
|Thursday, June 30th||Check-in|
|Friday, July 1st|
|16:30-17:30||Introduction to Student Project|
|Saturday, July 2nd|
|16:30-17:30||Poster Session: part 1|
|Sunday, July 3rd|
|Andrew G. Barto|
|16:30-17:30||Poster Session: part 2|
|Monday, July 4th|
|16:30-17:30||Poster Session: part 3|
|Tuesday, July 5th|
|Excursion to OIST Initial Research Project Lab. and OIST Campus site|
|Wednesday, July 6th|
|Thursday, July 7th|
|Friday, July 8th|
|Saturday, July 9th|
|Anitha Pasupathy (Canceled)|
|Sunday, July 10th|
|Presentations of student projects|
|Presentations of student projects|
|Monday, July 11th||Check-out|
Schedule of Lectures
|1 (fri)||2 (sat)||3 (sun)||4 (mon)||5 (tue)|
|6 (wed)||7 (thu)||8 (fri)||9 (sat)||10 (sun)|
Student Project Presentation
Lecture Title, Abstract, and Suggested Readings:
Kenji Doya - "Prediction, Control, and Decisions"
As an introduction to the entire course, I will first give an overview of theoretical concepts related to prediction and actions, both in continuous and discrete domains. I will go through three major types of learning, namely supervised learning, reinforcement learning, and unsupervised learning, and explain how they may be related to the functions of the cerebellum, the basal ganglia, and the cerebral cortex, respectively. I will also present our hypothesis about how neuromodulators, such as serotonin and noradrenaline, could be involved in regulating the parameters for prediction and decisions.
I will then introduce research topics related to predictions and decisions from our own lab. I will talk about neural recording studies to test our hypothesis that neurons in the striatum, the input site of the basal ganglia, encode 'action value,' an estimate of how much reward an action will yield. I will also present the results of functional brain imaging studies that tried to understand the mechanisms of predicting immediate and future rewards. The results suggest that different parts of parallel cortico-basal ganglia loops are specialized for prediction of rewards in different time scales, and that they are differentially modulated by the serotonergic projection from the raphe nucleus.
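The notion of predicting rewards at different time scales can be illustrated with the standard discounted sum of rewards from reinforcement learning. The following is a minimal sketch, not an analysis from the studies described above: the reward sequence and the two discount factors are made-up values chosen only to show how the same option can look unattractive to a short-sighted predictor and attractive to a far-sighted one.

```python
def discounted_value(rewards, gamma):
    """Sum of rewards, each weighted by gamma**t, where t is the delay in steps."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical option: a small immediate cost followed by a larger delayed reward.
rewards = [-1.0, 0.0, 0.0, 4.0]

short_sighted = discounted_value(rewards, gamma=0.3)  # small gamma: short time scale
far_sighted = discounted_value(rewards, gamma=0.9)    # large gamma: long time scale
```

With these assumed numbers the short-time-scale value is negative while the long-time-scale value is positive, so the preferred choice depends on the discount factor.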
Doya K. (1999). What are the computations of the cerebellum, the basal ganglia, and the cerebral cortex? Neural Networks, 12, 961-974.
Doya K. (2002). Metalearning and neuromodulation. Neural Networks, 15, 495-506.
Tanaka C. S., Doya K., Okada G., Ueda K., Okamoto Y., Yamawaki S. (2004). Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neuroscience, 7, 887-893.
Download the lecture slides (ppt) here
Reza Shadmehr - "Generalization and Consolidation of Memory: A Computational Perspective"
Neuroscientists are like cryptographers who are trying to decipher an unknown code: how does the brain represent information about the world around it? A key tool is to provide information to the brain and then measure how that information is generalized to novel situations. Here I will use two examples from learning theory, a problem in regression and a problem in classification, to show how the representation of information dictates patterns of generalization. We will link the problem of adaptation to the problem of identification and then show that, from the trial-by-trial patterns of error that the learner makes, one can estimate a generalization function and relate it to the internal representation chosen by the learner. We will use our theoretical insights to analyze two kinds of data sets: one where people learned to control a novel dynamical system, and another where people learned to classify a set of visual cues.
A second key idea is that memory appears to undergo changes despite the fact that the learner has stopped practicing the task. How can internal representations change when there is no overt error in behavior? To motivate our discussion, we will concentrate on a simple saccade task where the gain of the saccade is manipulated, resulting in an important phenomenon called savings. Using the framework of regression in a Bayesian setting, we will try to analyze this task. We will represent memory as a prior distribution and derive a formulation of adaptation where changes in representation are due to an optimal comparison of evidence with respect to our prior. The result will be a fast adaptive system that reacts both to the evidence and the prior, and a slow adaptive system that gradually updates the prior as more evidence is collected from the environment. The slow and fast systems will exhibit some of the properties of memory consolidation.
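To make the fast/slow distinction concrete, here is a minimal sketch under strong assumptions: a single scalar quantity (for example a saccade gain), Gaussian prior and observation noise, and a hypothetical `slow_rate` that lets the prior mean drift toward each new estimate. None of these choices comes from the lecture; the sketch only shows a precision-weighted Gaussian update acting as a fast estimate while the prior changes slowly.

```python
def bayes_update(prior_mean, prior_var, obs, obs_var):
    """Precision-weighted combination of a Gaussian prior and one observation."""
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


def adapt(observations, prior_mean=1.0, prior_var=0.04, obs_var=0.04, slow_rate=0.05):
    """Fast estimate on every trial; the prior mean itself drifts slowly (assumed rule)."""
    estimates = []
    for obs in observations:
        estimate, _ = bayes_update(prior_mean, prior_var, obs, obs_var)
        estimates.append(estimate)
        # Hypothetical slow system: move the prior a small step toward the estimate.
        prior_mean += slow_rate * (estimate - prior_mean)
    return estimates, prior_mean


# Twenty trials in which the observed gain is 0.7 instead of the expected 1.0.
estimates, final_prior = adapt([0.7] * 20)
```

In this toy setting the first estimate lies halfway between prior and evidence, while the prior mean has moved only part of the way toward 0.7 after twenty trials.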
Poggio T, Fahle M, Edelman S (1992) Fast perceptual learning in visual hyperacuity. Science 256:1018-1021.
Kojima Y, Iwamoto Y, Yoshida K (2004) Memory of learning facilitates saccadic adaptation in the monkey. Journal of Neuroscience 24:7531.
Thoroughman KA, Shadmehr R (2000) Learning of action through adaptive combination of motor primitives. Nature, 407:742-747
Stefan Schaal - "Scaling Reinforcement Learning to Complex Motor Systems"
While among the most general approaches to learning control, classical approaches to reinforcement learning remain computationally infeasible for complex motor systems. This presentation describes several new approaches to overcome this scaling problem. Parameterized modular control policies derived from dynamic systems theory are one of the crucial components to reduce the dimensionality of the state-action space to explore during learning, and to ensure a high level of generalization of a learned task to new situations. Imitation learning as a seed for trial-and-error learning with reinforcement learning is another important component. A third element is rooted in new developments of stochastic policy gradient learning in reinforcement learning, particularly as applied to learning from trajectories (or roll-outs). Parallels to biological motor control and examples from learning with humanoid robots will illustrate the suggested methodology.
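As a rough illustration of policy gradient learning from roll-outs, the sketch below uses a plain likelihood-ratio (REINFORCE-style) estimator on a deliberately tiny problem. Everything here is an assumption for illustration: each roll-out is a single action drawn from a Gaussian policy with learned mean `theta` and fixed `sigma`, and the return is the negative squared distance to a hypothetical `target`. It is not the method presented in the lecture.

```python
import random


def reinforce_gaussian(target=2.0, sigma=0.5, lr=0.05, updates=300, rollouts=20, seed=0):
    """Likelihood-ratio policy gradient for the mean of a 1-D Gaussian policy."""
    rng = random.Random(seed)
    theta = 0.0  # policy mean, the only learned parameter
    for _ in range(updates):
        actions = [rng.gauss(theta, sigma) for _ in range(rollouts)]
        returns = [-(a - target) ** 2 for a in actions]
        baseline = sum(returns) / rollouts  # batch-mean baseline to reduce variance
        # d/dtheta log N(a; theta, sigma^2) = (a - theta) / sigma^2
        grad = sum((a - theta) / sigma ** 2 * (r - baseline)
                   for a, r in zip(actions, returns)) / rollouts
        theta += lr * grad
    return theta


theta = reinforce_gaussian()
```

The policy mean moves toward the target using only sampled returns, without a model of the reward function.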
Ijspeert A, Nakanishi J, Schaal S (2003) Learning attractor landscapes for learning motor primitives. In: Becker S, Thrun S, Obermayer K (eds) Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press, pp 1547-1554
Peters J, Vijayakumar S, Schaal S (2003) Reinforcement learning for humanoid robotics. In: Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots, Karlsruhe, Germany, Sept.29-30
Schaal S, Sternad D, Osu R, Kawato M (2004) Rhythmic movement is not discrete. Nature Neuroscience 7: 1137-1144
Download the lecture slides (pdf) here
Mitsuo Kawato - "Predictions by Cerebellar Internal Models"
The cerebellar internal model hypothesis postulates that a microzone in the cerebellar cortex can acquire an internal forward or inverse dynamics model of some dynamical process outside the cerebellum through supervised learning, while climbing fiber inputs from the inferior olive nucleus provide the necessary error-signal information. Computational, anatomical, neurophysiological and neuroimaging examinations of this hypothesis will be reviewed. Furthermore, chaotic dynamics of inferior olive neurons that are beneficial for error coding, and a bioinformatics model of spike-timing-dependent plasticity in cerebellar long-term depression, will be introduced as recent progress.
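The idea of acquiring a forward model by supervised learning from an error signal can be sketched with a simple delta (LMS) rule. This is only an illustration under stated assumptions: the "plant" is a hypothetical scalar linear system `x_next = a*x + b*u`, and the prediction error stands in, very loosely, for the teaching signal. It makes no claim about actual cerebellar circuitry.

```python
# Hypothetical plant parameters used only to generate training data.
TRUE_A, TRUE_B = 0.9, 0.5
samples = [(x, u, TRUE_A * x + TRUE_B * u)
           for x in (-1.0, 0.0, 1.0)
           for u in (-1.0, 0.0, 1.0)]


def learn_forward_model(samples, lr=0.1, epochs=200):
    """Delta-rule learning of x_next ~ a_hat * x + b_hat * u from prediction errors."""
    a_hat, b_hat = 0.0, 0.0
    for _ in range(epochs):
        for x, u, x_next in samples:
            error = x_next - (a_hat * x + b_hat * u)  # prediction error (teaching signal)
            a_hat += lr * error * x
            b_hat += lr * error * u
    return a_hat, b_hat


a_hat, b_hat = learn_forward_model(samples)
```

After training on these noiseless samples, the learned coefficients approach the plant parameters that generated the data.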
Kawato M: Internal models for motor control and trajectory planning. Current Opinion in Neurobiology, 9, 718-727 (1999).
Imamizu H, Miyauchi S, Tamada T, Sasaki Y, Takino R, Puetz B, Yoshioka T, Kawato M: Human cerebellar activity reflecting an acquired internal model of a new tool. Nature, 403, 192-195 (2000).
Schweighofer N, Doya K, Fukai H, Chiron JV, Furukawa T, Kawato M: Chaos may enhance information transmission in the inferior olive. Proc Natl Acad Sci USA, 101, 4655-4660 (2004).
Doi T, Kuroda S, Michikawa T, Kawato M: Inositol 1,4,5-trisphosphate-dependent Ca2+ threshold dynamics detect spike timing in cerebellar Purkinje cells. Journal of Neuroscience, 25, 950-961 (2005).
Download the lecture slides (pdf) here
Andrew G. Barto - "Searching in the Right Space: Perspectives on Computational Reinforcement Learning"
Computational reinforcement learning (RL) is the study of algorithms that allow systems (software systems, artificial "agents", etc.) to improve their performance over time through trial-and-error experience. Although modern computational RL methods were inspired by animal learning, they did not develop as explicit models of animal behavior. Instead, development was driven by the need to solve, or at least ameliorate, difficult computational problems that make simple trial-and-error learning too inefficient to be of much use in artificial intelligence. Understanding these difficulties and the history of attempts to solve them is important for a full appreciation of modern RL methods. The common view of RL as a collection of methods for approximating solutions to Markovian Decision Processes is relatively recent and, in fact, tends to obscure the root ideas of RL. A historical perspective is also necessary for full appreciation of the surprising correspondences between computational RL methods and brain reward systems, in particular, the correspondence between Temporal Difference methods and the activity of dopamine neurons. In this lecture, I first present an account of the history of computational RL, starting with some of the earliest work in all of artificial intelligence. Then I review the major elements of modern RL and its relationship to optimal control, other types of machine learning, and to neuroscience. I present several striking example applications and describe some of the latest research directed toward scaling up to more complex problems, including recent work on hierarchical and intrinsically motivated RL.
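For readers new to Temporal Difference methods, the following minimal sketch shows tabular TD(0) value prediction on a five-state random walk, a textbook-style example. The task layout, step size, and episode count are assumptions chosen for illustration and are not taken from the lecture.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Tabular TD(0) on a 5-state random walk; reward 1 only when exiting on the right."""
    rng = random.Random(seed)
    n = 5
    values = [0.5] * n
    for _ in range(episodes):
        s = n // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:            # left terminal, no reward
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n:         # right terminal, reward 1
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            td_error = reward + gamma * v_next - values[s]  # temporal difference error
            values[s] += alpha * td_error
            if done:
                break
            s = s_next
    return values


values = td0_random_walk()
```

The learned values approximate the probability of eventually exiting on the right from each state, which for this walk is 1/6, 2/6, ..., 5/6. The `td_error` term is the quantity usually compared with phasic dopamine activity.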
Samuel, A. L. (1959). "Some Studies in Machine Learning Using the Game of Checkers", IBM Journal on Research and Development, vol. 3, pp. 211-229. Reprinted in E. A. Feigenbaum and J. Feldman, editors, Computers and Thought, pp. 71-105, McGraw-Hill, New York, 1963.
Michie, D. and Chambers, R. A. (1968). "BOXES: An Experiment in Adaptive Control". In Dale, E. and Michie, D., Machine Intelligence 2, Oliver and Boyd, Edinburgh, pp. 137-152.
Sutton, R. S. and Barto, A. G. (1981). "Toward a Modern Theory of Adaptive Networks: Expectation and Prediction", Psychological Review, vol. 88, pp. 135-170.
Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). "Neuronlike Elements That Can Solve Difficult Learning Control Problems", IEEE Transactions on Systems, Man and Cybernetics, vol. 13, pp. 835-846. Reprinted in J. A. Anderson and E. Rosenfeld (eds.), Neurocomputing: Foundations of Research, pp. 535-549, MIT Press, Cambridge, MA, 1988.
Barto, A. G., Singh, S., and Chentanez, N. (2004). "Intrinsically Motivated Learning of Hierarchical Collections of Skills", International Conference on Developmental Learning (ICDL), La Jolla, CA, USA.
Download the lecture slides (ppt) here
Bernard Balleine - "Prediction and control: Pavlovian-instrumental interactions and their neural bases"
Predictions about the occurrence of rewarding events can be based on environmental stimuli or the actions with which those events are associated. These two sources of predictive learning have long been thought to have much in common both in terms of the learning processes that determine their acquisition and the processes that they engage to generate changes in behavior. Recently, however, experimental evidence has emerged to counter these claims. First, a division has been established between predictions based on stimuli and actions with regard to the utility or plasticity of the behavioral responses that they control. Second, it has become clear that the learning processes that contribute to these forms of prediction differ; whereas error correction provides a good first approximation of the formation of predictions based on stimuli, predictions based on actions appear to reflect the relative rates of actions and their consequences across time. Finally, the representations of the rewarding events that are associated with stimuli and with actions appear to differ. Although primary motivational processes directly influence responses controlled by stimuli associated with reward, these processes only affect the performance of actions when combined with emotional feedback. It appears, therefore, that prediction and control exert distinct effects on adaptive behavior. Nevertheless, it has been well documented that these processes interact; reward-related stimuli can exert a powerful excitatory influence on the performance of actions. The behavioral and neural bases of this interaction have been subjected to considerable study and these recent findings as well as their theoretical importance will be considered in detail.
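The error-correction account of stimulus-based prediction mentioned above is commonly written as the Rescorla-Wagner rule, in which every cue present on a trial is updated by a shared prediction error. The sketch below is a minimal illustration with an assumed learning rate and a hypothetical two-phase design (cue A alone, then cues A and B together). It shows the rule itself, not the experiments discussed in the lecture.

```python
def rescorla_wagner(trials, alpha=0.2):
    """Update associative strengths with a shared prediction error on each trial."""
    weights = {}
    for cues, reward in trials:
        prediction = sum(weights.get(c, 0.0) for c in cues)
        error = reward - prediction  # shared error-correction term
        for c in cues:
            weights[c] = weights.get(c, 0.0) + alpha * error
    return weights


# Phase 1: cue A predicts reward. Phase 2: the compound A+B predicts the same reward.
trials = [(("A",), 1.0)] * 50 + [(("A", "B"), 1.0)] * 50
weights = rescorla_wagner(trials)
```

Because A already predicts the reward when B is introduced, the error is close to zero and B acquires almost no associative strength.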
Balleine, B.W. (2001). Incentive processes in instrumental conditioning. In R. Mowrer & S. Klein (Eds) Handbook of Contemporary Learning Theories (pp 307-366). Hillsdale, NJ: LEA.
Dickinson, A. & Balleine, B.W. (2002). The role of learning in motivation. In CR Gallistel (Ed) Learning, Motivation & Emotion, Volume 3 of Stevens' Handbook of Experimental Psychology, Third Edition (pp. 497-533). New York: John Wiley & Sons.
Dayan, P. & Balleine, B.W. (2002). Reward, Motivation and Reinforcement Learning. Neuron, 36, 285-298.
Balleine, B.W. (2004). Incentive Behavior. In: I.Q. Whishaw and B. Kolb (Eds). The Behavior of the Laboratory Rat: A Handbook With Tests (Chapter 41; pp. 436-446). Oxford: Oxford University Press.
Peter Dayan - "Uncertainty and Attention in Learning and Inference"
Classical and instrumental conditioning paradigms pose animals elemental (though not always elementary) inference and learning problems. Since uncertainty plays the starring role in probabilistic accounts of inference and learning, it is appealing to study conditioning through the optimal lens that uncertainty provides.
From a behavioural viewpoint, we will consider uncertainty-based accounts of two sophisticated conditioning paradigms which are often described in attentional terms: downwards unblocking, in which uncertainty apparently regulates the competition between multiple predictors of an outcome, and backwards blocking, in which anti-correlations in the uncertainties of two predictors apparently influence the course of learning.
From a neural viewpoint, we will consider the broad, though occasionally shallow, evidence that the neuromodulators acetylcholine (ACh) and norepinephrine (NE) play special roles in reporting distinct forms of uncertainty and thereby influencing aspects of conditioning and other attentional tasks.
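One way to see how anti-correlated uncertainties can produce backwards blocking is to treat the associative weights as a Gaussian belief updated by a Kalman filter. The sketch below assumes static weights, two cues, unit prior variances, and a hypothetical observation noise of 0.1. It is an illustrative model of this general kind and may differ from the account given in the lecture.

```python
def kalman_update(mean, cov, x, reward, noise_var=0.1):
    """One Kalman-filter step for reward = x . w + noise, with static weights w."""
    n = len(mean)
    prediction = sum(x[i] * mean[i] for i in range(n))
    cov_x = [sum(cov[i][j] * x[j] for j in range(n)) for i in range(n)]
    s = sum(x[i] * cov_x[i] for i in range(n)) + noise_var  # predictive variance
    gain = [c / s for c in cov_x]
    error = reward - prediction
    new_mean = [mean[i] + gain[i] * error for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * cov_x[j] for j in range(n)] for i in range(n)]
    return new_mean, new_cov


mean, cov = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]

# Phase 1: cues A and B are presented together with reward.
for _ in range(10):
    mean, cov = kalman_update(mean, cov, [1.0, 1.0], 1.0)
mean_after_compound, cov_after_compound = list(mean), [row[:] for row in cov]

# Phase 2: cue A alone is presented with the same reward.
for _ in range(10):
    mean, cov = kalman_update(mean, cov, [1.0, 0.0], 1.0)
```

After the compound phase the two weights share the prediction and their covariance is negative. Training A alone then raises the estimate for A and, through that negative covariance, lowers the estimate for B even though B is never presented.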
John O'Doherty - "Reward-related learning and decision making in the human brain: insights from functional neuroimaging"
It is axiomatic that most animals including humans have a propensity to seek out rewards and avoid punishments. Central to the organization of such behaviour is the ability to represent the value of rewarding and punishing stimuli, establish predictions of when and where such rewards and punishments will occur and use those predictions to form the basis of decisions that guide behaviour.
In this lecture we will describe recent advances in understanding the neural substrates of reward processing in the human brain that have arisen from research in functional neuroimaging. The focus will be on the role of specific brain structures implicated in reward processing and reward-learning on the basis of extensive research in animals and lesion studies in humans, including the ventromedial prefrontal cortex (encompassing orbital and medial prefrontal regions), amygdala, striatum and dopaminergic midbrain. Distinct reward-related functions can be attributed to different components of this network. Orbitofrontal cortex is involved in coding stimulus reward value and in concert with the amygdala and ventral striatum is implicated in representing predicted future reward. Such representations can be used to guide action selection for reward, a process that depends, at least in part, on orbital and medial prefrontal cortex as well as dorsal striatum.
We will also spend some time reviewing the advantages and disadvantages of functional imaging approaches to decision making compared to other techniques. We will discuss how to interpret functional imaging results in the light of what is known about the physiological basis of the signals being measured. Furthermore we will briefly discuss some of the different methodological approaches that can be taken in functional neuroimaging studies of decision making, such as simple event-related trial comparisons, the application of computational models to fMRI data, as well as the modeling of connectivity between different brain structures.
O'Doherty JP. Reward representations and reward-related learning in the human brain: insights from neuroimaging. Curr Opin Neurobiol. 2004 Dec;14(6):769-76.
McClure SM, York MK, Montague PR. The neural substrates of reward processing in humans: the modern role of FMRI. Neuroscientist. 2004
Nathaniel Daw - "Partially observable Markov decision processes and neural models"
Much work in reinforcement learning -- on which is built much work in behavioral and neural modeling -- is based on the Markov decision process formalism. However, most realistic situations (and indeed, the experimental situations addressed by the models) are not well described as an MDP. This is particularly due to the MDP's assumption that the "process state" (all information determining the future behavior of the process) is always transparently observable.
A richer and more realistic model of the interaction between sensory processing and reinforcement learning is available using an extended formalism, the partially observable MDP -- in which the process state is hidden and may be observed only indirectly. We begin by discussing model fitting and latent variable inference for general situations involving hidden state (e.g. hidden Markov models), before introducing the POMDP formalism and some of the major approaches and approximations for reinforcement learning in it. We then consider models of how neural systems might cope with similar problems. We consider hidden state inference problems in two systems -- the dopamine system and its interactions with cortical sensory processing, and neurons in parietal cortex thought to be involved in inference about noisy stimuli in saccade tasks.
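When the process state is hidden, an agent can maintain a belief, that is, a probability distribution over states, and update it after each observation. The sketch below shows this update for a two-state hidden Markov model. The transition matrix and observation likelihoods are made-up numbers, and the dependence on actions that a full POMDP would include is omitted for brevity.

```python
def belief_update(belief, transition, obs_likelihood):
    """Bayes filter step: predict through the transition model, then weight by the observation."""
    n = len(belief)
    # transition[s][s2] is P(next state = s2 | current state = s).
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    # obs_likelihood[s2] is P(observation | next state = s2).
    unnormalized = [predicted[s2] * obs_likelihood[s2] for s2 in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


belief = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.1, 0.9]]
obs_likelihood = [0.8, 0.2]  # the observation is four times as likely in state 0

new_belief = belief_update(belief, transition, obs_likelihood)
```

Reinforcement learning methods for partially observable problems can then treat this belief, rather than the unobserved state, as the input to value estimation.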
A.R. Cassandra (1999). "POMDPs for Dummies: POMDPs and their algorithms, sans formula!" online tutorial at
N.D. Daw, A.C. Courville, and D.S. Touretzky (2005). "Representation and timing in theories of the dopamine system," (under review, Neural Computation)
J. I. Gold and M. N. Shadlen (2002). "Banburismus and the brain: Decoding the relationship between sensory stimuli, decisions, and reward." Neuron, 36:299-308.
Leo Sugrue - "Value based decision making: A Combined behavioral, modeling, and physiological approach"
To forage successfully animals must maintain an internal representation of the value or utility of competing options and link that representation to the neural processes responsible for decision-making and action implementation. In this lecture I will outline a general approach to the electrophysiological study of value-based choice in awake, behaving monkeys. This approach has three key components: first, demonstrating that behavior is under the control of an animal's history of choices and rewards; second, modeling behavioral data to gain insight into the decision variables that specify the animal's choices; and third, analyzing electrophysiological signals to determine if and how these decision variables are encoded within specific neural systems.
I will discuss how we have applied this approach to study choice behavior and related neural activity in rhesus monkeys engaged in a dynamic foraging game. This work demonstrates the efficacy of a simple Linear-Nonlinear-Poisson framework in producing successful predictive and generative models of animal foraging behavior that in turn provide useful candidate decision variables for neurophysiological investigation. To date our physiological experiments have focused on exploring neural signals in each of two areas of cortex that are implicated in reward processing or higher order motor planning. I will discuss the results of single cell recording experiments conducted in these areas while monkeys engaged in this foraging game, and what these results suggest about the respective roles of these areas in value-based decision making.
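A linear-nonlinear cascade for choice can be sketched in three stages: a linear filter over each option's reward history, a nonlinearity that maps the filtered difference to a choice probability, and a random draw that generates the choice. The exponential filter, the logistic nonlinearity, the parameter values, and the use of a Bernoulli draw in place of a Poisson stage are all assumptions made here for illustration. They are not the fitted model from the work described above.

```python
import math
import random


def choice_probability(rewards_a, rewards_b, tau=5.0, slope=2.0):
    """Probability of choosing option A from filtered reward histories (oldest trial first)."""
    def filtered(history):
        # Linear stage: exponentially decaying weights, most recent trial weighted 1.
        return sum(r * math.exp(-k / tau) for k, r in enumerate(reversed(history)))

    diff = filtered(rewards_a) - filtered(rewards_b)
    # Nonlinear stage: logistic function of the filtered reward difference.
    return 1.0 / (1.0 + math.exp(-slope * diff))


p_equal = choice_probability([1, 0, 1], [1, 0, 1])
p_rich_a = choice_probability([1, 0, 1, 1], [0, 0, 1, 0])

# Stochastic stage: draw one choice from the computed probability.
rng = random.Random(0)
choice = "A" if rng.random() < p_rich_a else "B"
```

Fitting `tau` and `slope` to observed choices would yield a filtered reward difference that could serve as a candidate decision variable for comparison with neural activity.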
Sugrue, L. P., Corrado, G. S. & Newsome, W. T. Choosing the greater of two goods: neural currencies for valuation and decision making. Nature Reviews Neuroscience (In Press).
Sugrue, L. P., Corrado, G. S. & Newsome, W. T. Matching behavior and the representation of value in the parietal cortex. Science 304, 1782-7 (2004).
Supporting online material:
Dorris, M. C. & Glimcher, P. W. Activity in posterior parietal cortex is correlated with the relative subjective desirability of action. Neuron 44, 365-78 (2004).
Gold, J. I., & Shadlen M. N. Neural computations that underlie decisions about sensory stimuli. Trends Cogn. Sci. 5, 10-16 (2001).
Shizgal, P. Neural basis of utility estimation. Current Opin. Neurobiol. 7, 198-208 (1997).
Wolfram Schultz - "Neural processing of reward information"
Part 1 (1/3 of lecture): basic concepts of reward functions derived from animal learning theory and game theory.
Rewards are defined by their influence on behavior and have three, schematically separable functions: learning (positive reinforcement), approach behavior (taking rewards as goals of behavior) and positive emotion (hedonia). Both in Pavlovian and operant conditioning, the learning function requires the pairing of a reward with a conditioned stimulus (contiguity), a higher probability of the reward occurring in the presence as opposed to the absence of a conditioned stimulus (contingency) and the unpredictability of reward (prediction error). The evidence for the role of prediction error in learning stems from the blocking paradigm, according to which learning of a stimulus is blocked when the reward occurs fully predicted. In their second function, rewards elicit approach and consummatory behavior. This is due to the objects being labelled with appetitive value through innate mechanisms or, in most cases, learning. According to game theory, expected reward value is defined as the product of reward probability and magnitude, although this principle is violated in certain cases and is replaced by expected utility or prospect theory. Individuals outside the laboratory are usually faced with situations that involve varying levels of uncertainty about available rewards. Predictions can be considered to reduce the subjective uncertainty by providing advance information about the expected rewards together with their likelihood of occurrence (probability distribution). Taken together, these theories provide a theoretical framework that helps to define the crucial variables that are important for reward-directed behavior. The theories further provide the foundations for decision-making by describing the contribution of reward variables to decisions. More details are found in:
Schultz, W.: Predictive reward signal of dopamine neurons. J. Neurophysiol. 80: 1-27, 1998
Schultz, W.: Getting formal with dopamine and reward. Neuron 36: 241-263, 2002
Schultz, W.: Neural coding of basic reward terms of animal learning theory, microeconomics and behavioural ecology. Curr. Op. Neurobiol. 14: 139-147, 2004.
Part 2 (2/3 of lecture): neural processing of reward information.
Information about rewards is processed in a number of brain structures, such as the dopamine system, striatum and orbitofrontal cortex. We found in formal learning paradigms that dopamine neurons detect the extent to which rewards occur differently than predicted, thus coding an 'error' in the prediction of reward. Together with their anatomical organization and influence on postsynaptic structures, dopamine neurons may thus serve as explicit teaching signals mediating changes in immediate responses and learning. Neurons in the orbitofrontal cortex discriminate well between different rewards irrespective of the positions and objects of the stimuli predicting them and may serve as a highly sensitive reward-discriminating system. Neurons in the striatum incorporate reward information into activity related to the preparation and execution of movements leading to the reward, thus possibly reflecting a neural mechanism underlying goal-directed behavior. We then investigated how reward neurons in these structures might code basic, game theoretic decision variables and deal with uncertainty. They are sensitive to variables such as magnitude and probability of reward, as well as their combination (expected reward value: probability x magnitude of reward). Neurons in all mentioned reward structures adapted their coding range and input-output gain to the uncertainty of rewards determined by predictions. The coding of prediction errors by dopamine neurons, as assessed in formal learning paradigms, further improved the coding of reward information. In addition, we found that a slower response in dopamine neurons signalled explicitly the uncertainty of reward, being maximal at a probability of 0.5. These results suggest common neural mechanisms underlying the efficient coding of basic variables of learning theory and game theory. More details are found in:
Tremblay, L. and Schultz, W.: Relative reward preference in primate orbitofrontal cortex. Nature 398: 704-708, 1999
Schultz, W.: Multiple reward systems in the brain. Nature Rev. Neurosci. 1: 199-207, 2000
Waelti, P., Dickinson, A. and Schultz, W.: Dopamine responses comply with basic assumptions of formal learning theory. Nature 412: 43-48, 2001
Fiorillo, C.D., Tobler, P.N. and Schultz, W.: Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898-1902, 2003.
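The two decision variables named in this lecture, expected reward value and reward uncertainty, can be written down directly for a reward of fixed magnitude delivered with probability p. In the sketch below, the expected value follows the definition given in the abstract (probability x magnitude). Using the variance of the outcome as the measure of uncertainty is an assumption made here for illustration.

```python
def expected_value(p, magnitude):
    """Expected reward value: probability times magnitude."""
    return p * magnitude


def reward_variance(p, magnitude):
    """Variance of a reward of the given magnitude delivered with probability p."""
    return magnitude ** 2 * p * (1.0 - p)


probabilities = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [expected_value(p, 1.0) for p in probabilities]
variances = [reward_variance(p, 1.0) for p in probabilities]

# Probability at which the outcome is most uncertain under this measure.
most_uncertain = max(probabilities, key=lambda p: reward_variance(p, 1.0))
```

Expected value grows linearly with p, whereas the variance is zero at p = 0 and p = 1 and peaks at p = 0.5, the probability at which the abstract reports the uncertainty-related response to be maximal.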
Minoru Kimura - "Neural basis of response bias and its complementary process in the basal ganglia"
Animals adaptively choose actions that are expected to yield a larger amount and probability of reward. The basal ganglia are one of the key brain structures for such reward-based decision and action selection. Neurons in the striatum discriminate rewarding and aversive contexts of environmental stimuli and behavioral actions, implicating a role for bias of decision and action. Midbrain dopamine neurons transmit signals of reward expectation errors and incentive to work for reward. On the other hand, the traditional response bias models may be fundamentally incomplete as the basis of goal-directed action mechanisms, because response bias is on many occasions aborted when events do not occur as expected. Thus, an additional component, which plays complementary roles to response bias, seems to be required. We found that neurons in the monkey CM/PF thalamus respond to multimodal stimuli in which the magnitude of a response is larger when the stimulus is unpredictable and contrary to expectation. The CM/PF complex has strong connections to and from the basal ganglia as well as frontal cerebral cortex and brain stem.
Our hypothesis is that the cortico-basal ganglia system plays a major part in both response bias and its complementary process, in which the striatum receives cortical signals for action and cognition, dopamine signals for response bias, and signals from the CM/PF thalamus for the process complementary to response bias.
Matsumoto N, Minamimoto T, Graybiel AM, Kimura M: Neurons in the thalamic CM/Pf complex supply neurons in the striatum with information about behaviorally significant events. J Neurophysiol 85: 960-976, 2001.
Yamada, H., Matsumoto, N. and Kimura, M. Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action. J Neurosci 24:3500-3510, 2004.
Daeyeol Lee - "Cortical mechanisms of reinforcement learning and decision making"
Flexible mapping from a sensory stimulus to a particular action is a hallmark of intelligent behaviors, and this in turn implies that the decision-making process by which one of often many alternative actions is chosen must be dynamically adjusted through experience. In order to investigate the cortical mechanisms responsible for evaluating the outcome of the animal's choice and optimizing the animal's decision making strategies, we employed a behavioral task modeled after a two-player zero-sum game, known as the matching pennies. By manipulating the exploitative nature of the computer opponent's strategy, we demonstrated that the animal's decision making strategy can be systematically influenced by the strategy of the opponent. In addition, single-unit recordings during this dynamic decision making task showed that many neurons in the lateral and medial prefrontal cortex displayed modulations in their activity related to the past history of the animal's choices and their outcomes. These results suggest that the primate prefrontal cortex plays a key role in optimizing the animal's behavioral strategy in a complex dynamic environment.
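The effect of an exploitative opponent in matching pennies can be illustrated with a small simulation. The opponent rule below (predict the player's more frequent past choice and pick the other target) and the payoff convention (the player wins when both pick the same target) are hypothetical simplifications introduced here. They are not the algorithm used in the experiments.

```python
import random


def play_matching_pennies(p_left, trials=5000, seed=0):
    """Win rate of a player choosing left with probability p_left against a frequency-exploiting opponent."""
    rng = random.Random(seed)
    counts = {"L": 0, "R": 0}
    wins = 0
    for _ in range(trials):
        # The opponent predicts the player's more frequent past choice (ties broken at random).
        if counts["L"] == counts["R"]:
            predicted = rng.choice("LR")
        else:
            predicted = max(counts, key=counts.get)
        # The opponent picks the other target, so the predicted choice goes unrewarded.
        opponent = "R" if predicted == "L" else "L"
        player = "L" if rng.random() < p_left else "R"
        wins += player == opponent  # assumed convention: the player wins on a match
        counts[player] += 1
    return wins / trials


biased_rate = play_matching_pennies(p_left=0.8)
unbiased_rate = play_matching_pennies(p_left=0.5)
```

Against this opponent a biased player is rewarded far less often than one who chooses both targets equally often, which is the kind of pressure that can push behavior toward the mixed strategy.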
Barraclough DJ, Conroy ML, Lee D (2004) Prefrontal cortex and decision making in a mixed-strategy game. Nature Neurosci. 7: 404-410.
Camerer CF (2003) Behavioral game theory: experiments in strategic interaction, Princeton Univ. Press, Princeton.
Lee D, Conroy ML, McGreevy BP, Barraclough DJ (2004) Reinforcement learning and decision making in monkeys during a competitive game. Cogn. Brain Res. 22: 45-58.
Luce RD and Raiffa H (1957) Games and Decisions. John Wiley and Sons, New York.
Sutton RS and Barto AG (1998) Reinforcement learning: an introduction, MIT Press, Cambridge.
Jun Tanji - "Participation of the lateral prefrontal cortex in behavioral planning"
Behavioral planning is an important part of executive function that is viewed as a primary function of the lateral prefrontal cortex (l-PFC). Previous reports have often suggested the involvement of l-PFC in planning of future movements. In this lecture, I will present lines of evidence showing that this view is too simplistic and incorrect. The l-PFC plays a part in planning behavioral consequence or goal, rather than planning movements. Moreover, the l-PFC is involved in a broad range of cognitive processes required for achieving behavioral planning. My lecture will have three parts: 1. Coding of behavioral outcome or behavioral goal by the l-PFC. 2. Participation of the l-PFC for planning spatial sequence based on memory. 3. Categorization of the temporal structure of behavioral sequence by cells in the l-PFC.
1. Tanji, J. and Hoshi, E. Behavioral planning in the prefrontal cortex. Curr Opin Neurobiol. 11(2):164-70, (2001)
2. Miller, EK, and Cohen, JD. An integrative theory of prefrontal cortex function. Annu Rev Neurosci. 24:167-202, (2001)
3. Ninokura, Y., Mushiake, H., and Tanji J. Integration of temporal order and object information in the monkey lateral prefrontal cortex. J Neurophysiol. 91(1):555-60, (2004)
4. Saito, N., Mushiake, H., Sakamoto, K., Itoyama, Y., and Tanji J. Representation of Immediate and Final Behavioral Goals in the Monkey Prefrontal Cortex during an Instructed Delay Period. Cereb Cortex. 2005 Feb 9; [Epub ahead of print]
Anitha Pasupathy - "Neural correlates of associative learning in the prefrontal cortex and basal ganglia"
Complex behavior is possible because the primate brain is adept at learning new and arbitrary associations such as "red means stop". Evidence based on anatomical, neurophysiological and lesion studies suggests that such associative learning depends on neural mechanisms in several brain regions including the prefrontal cortex (PFC) and subcortical nuclei of the basal ganglia (BG). Neural correlates of associative learning have been identified in both areas, but the mechanisms in these areas that underlie learning are still unclear. To gain insights into the relative roles of these areas, we studied the neural activity in the PFC and the striatum (an input structure of the BG) simultaneously, while monkeys learned the associations between two novel visual objects and two saccades. In this lecture, I'll demonstrate that during such learning, neural activity selective for saccade direction evolved at different rates in the two areas: the striatum showed rapid, almost bistable, changes compared to a slower trend in the PFC that was more in line with slow improvements in behavioral performance. While saccade selectivity in the striatum increased faster and reached a higher peak as compared to the PFC, decoding of behavior based on firing from the two areas was equally accurate. Single cells in the striatum became strongly tuned faster, but their firing reflected the behavioral response direction even when it was wrong (i.e., different from the instructed direction). Because the striatum generated this activity earlier in learning (when there are more errors) than the PFC, these results suggest that the striatum generates quick "predictions" about the behavioral choice and the PFC reflects the slower accumulation of the correct answer. Finally, I'll discuss how these results fit with the various models proposed for the roles of PFC and BG in learning.
A. Pasupathy, E. K. Miller, 2005. Different timecourses of learning-related activity in the prefrontal cortex and striatum. Nature. 433: 873-876 and supplementary materials on the web.
(This describes many of the results that will be discussed in the lecture.)
Wise, S. P., Murray, E. A. & Gerfen, C. R. The frontal cortex-basal ganglia system in primates. Crit Rev Neurobiol 10, 317-56 (1996).
(This provides an overview of what's known about prefrontal cortex and basal ganglia based on anatomy, physiology and lesion studies.)
Hikosaka, O. Neural systems for control of voluntary action--a hypothesis. Adv Biophys. 1998;35:81-102. Review.
(This proposes some hypotheses about control of behavior by the basal ganglia.)
Houk, J. C. & Wise, S. P. Distributed modular architectures linking basal ganglia, cerebellum, and cerebral cortex: their role in planning and controlling action. Cereb Cortex 5, 95-110 (1995).
Masamichi Sakagami - "Predictions and Decisions by Single Neurons in Monkey Prefrontal Cortex"
Intelligence shows in the way animals create new information based on learned information. In the first half of the talk, I will show how neurons in the prefrontal cortex, particularly the lateral prefrontal cortex (LPFC), code learned information (behavioral and associative meaning of the stimulus) and how the information is organized. In the latter half, we will discuss the functional meaning of the codes. Here, I will introduce our new experiment, in which monkey subjects were asked to anticipate a reward by combining two different pieces of information that were independently acquired. After behaviorally confirming their ability to anticipate a reward in this situation, we recorded from neurons in the LPFC. The results suggest that the LPFC is hierarchically organized to generate the reward-predictive information based on independent projections of sensory and reward information, and the process of combining the information might lead to the generation of new information.
Encoding of behavioral significance of visual stimuli by primate prefrontal neurons: relation to relevant task conditions. Exp. Brain Res., 97: 423-436, 1994
The hierarchical organization of decision making in the primate prefrontal cortex. Neurosci. Res. 34: 79-89, 1999
Response to task-irrelevant visual features by primate prefrontal neurons. J. Neurophysiol. 86: 2001-2010, 2001
Influence of reward expectation on visuospatial processing in macaque lateral prefrontal cortex. J. Neurophysiol. 87: 1488-1498, 2002
Most of the evening hours will be spent on student projects. Below are the three groups and tutors in charge of them:
A) Basic Computing
Motoaki Kawanabe (Fraunhofer FIRST)
Jun Morimoto (ATR)
Jun-ichiro Yoshimoto (OIST)
Students (in presentation order):
Mehdi Khamassi "Hidden Markov Models: Applied to Gambling Task"
Jun Nishikawa "A Simple Behavioral Experiment based on Hidden Markov Model"
Theresa Feledy "Theresa meets Markov"
Nicholas Cothros "Inverse Dynamics"
Angela Coderre "Feed-back Control and Noise Compensation"
Nobuhiro Hagura "Inverse Dynamics"
Yusuke Ikemoto "Evaluations of Feedback Error Learning with Time Delay"
Eleni Vasilaki "Adaptive Feedback Control for a Two-link Robotic Arm"
Biswa Sengupta "Dimensional Reduction & data mining using Component Analysis of biomedical data"
Colin Barringer, Tetsuro Marimura "Multiple Model-Based Reinforcement Learning"
Ahmed "Reinforcement Learning and Complex Control"
Vincent Huang "The B-Team: Modeling"
Takahiro Doi "Nonlinear Effects in Reward and Choice History to Monkey Valuation"
Michael Pfeiffer "Combined Reward- and Choice-based Models of Primate Choice"
Rachel Kalmar "Exploration vs. Exploitation in a Foraging Task"
Bob Schafer "Dynamics of the Inspection Game: Behavior and Modeling"
Leszek Rybicki "TD Biots"
B) Computational Modeling
Greg Corrado (Stanford U.)
Aaron Gruber (Albany Medical C.)
Yael Niv (Gatsby and Hebrew U.)
Hirokazu Tanaka (Salk Institute)
Daniela Schiller "TD model of Latent Inhibition?"
Ricardo Chavarriaga, Kerstin Preuschoff, Tiago Maia "Kalman filters and conditioning"
Jadin Jackson, Yohei Yamada "Hidden Markov Models: Application to Behavioral Data"
Bart Baddeley "Comparing Model Based and Model Free Reinforcement Learning Techniques"
Ben Seymour "Pain: inference and uncertainty"
Stephen Cowen "Attractor Models of mPFC Ramping Responses"
Lingyun Zhang "Dopamine's Effect on Single Neuron"
C) Behavioral Experiment
Jean-Claude Dreher (CNRS)
Makoto Ito (OIST)
Kazuyuki Samejima (ATR)
Hiroshi Yamada "How do we predict changes of environment?"
Jonathan Nelson "Randomness and Local Statistical Regularities"
Wako Yoshida "Decision Making in Partially-observable Environment - Tiger Problem -"
Dmitri Bibitchkov "The Wonderful Adventures of Honeybees at Okinawa: Context dependence of rational choice in a behavioral experiment"
- Peter Dayan, Gatsby Computational Neuroscience Unit
- Kenji Doya, Initial Research Project, OIST
- Masamichi Sakagami, Tamagawa University
- Bernard Balleine, UCLA
- Andrew G. Barto, University of Massachusetts
- Nathaniel Daw, Gatsby Computational Neuroscience Unit, UCL
- Peter Dayan, Gatsby Computational Neuroscience Unit, UCL
- Kenji Doya, Initial Research Project, OIST
- Mitsuo Kawato, ATR, Computational Neuroscience Laboratories
- Minoru Kimura, Kyoto Prefectural University of Medicine
- Daeyeol Lee, University of Rochester
- John O'Doherty, California Institute of Technology
- Masamichi Sakagami, Tamagawa University
- Anitha Pasupathy, MIT
- Stefan Schaal, University of Southern California
- Wolfram Schultz, University of Cambridge
- Reza Shadmehr, Johns Hopkins University
- Leo Sugrue, Stanford University
- Jun Tanji, Tamagawa University
- Greg Corrado, Max Planck Institute
- Jean-Claude Dreher, CNRS, Institut des Sciences Cognitives
- Aaron Gruber, Albany Medical College
- Makoto Ito, Initial Research Project, OIST
- Motoaki Kawanabe, Fraunhofer FIRST.IDA
- Jun Morimoto, ATR, Computational Neuroscience Laboratories
- Yael Niv, Gatsby Computational Neuroscience Unit, UCL and Interdisciplinary Center for Neural Computation, Hebrew University
- Kazuyuki Samejima, ATR, Computational Neuroscience Lab.
- Hirokazu Tanaka, Salk Institute
- Junichiro Yoshimoto, Initial Research Project, OIST
- Ahmed, University of Hyderabad
- Bart Baddeley, University of Sussex
- Colin Barringer, University of Massachusetts
- Dmitri Bibitchkov, Weizmann Institute of Science
- Ricardo Chavarriaga, Ecole Polytechnique Federale de Lausanne
- Angela Coderre, Queen's University
- Nicholas Cothros, University of Western Ontario
- Stephen Cowen, University of Arizona
- Takahiro Doi, Osaka University
- Theresa Feledy, MIT
- Nobuhiro Hagura, Kyoto University
- Vincent Huang, Johns Hopkins University
- Jadin Jackson, University of Minnesota
- Yusuke Ikemoto, Nagoya University
- Rachel Kalmar, Stanford University
- Mehdi Khamassi, College de France / Universite de Paris 6
- Tiago Maia, Carnegie Mellon University
- Teppei Matsui, University of Tokyo
- Billy Muhando, University of the Ryukyus
- Rama Natarajan, University of Toronto
- Jonathan Nelson, UC, San Diego
- Jun Nishikawa, RIKEN Brain Science Institute
- Michael Pfeiffer, Graz University of Technology
- Kerstin Preuschoff, California Institute of Technology
- Leszek Rybicki, Nicolaus Copernicus University
- Bob Schafer, Stanford University
- Daniela Schiller, New York University
- Biswa Sengupta, University of York
- Ben Seymour, Wellcome Dept. of Imaging Neuroscience, UCL
- Saori Tanaka, NAIST
- Eleni Vasilaki, University of Bern
- Hiroshi Yamada, Kyoto Prefectural University of Medicine
- Yohei Yamada, University of Tokyo
- Wako Yoshida, NAIST
- Lingyun Zhang, UC, San Diego