List of Publications

Journal Articles

In Press

Doya K, Taniguchi T (2019). Toward evolutionary and developmental intelligence. Current Opinion in Behavioral Sciences, 29, 91-96. [preprint]

Doya K, Matsuo Y (2019). Artificial intelligence and brain science: the present and the future. Brain and Nerve (in Japanese) [preprint]


Elfwing S, Uchibe E, Doya K (2018). Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Networks.

Kazumi K, Hideki H, Yoshihiko F, Charles SD, Manabu H, with a sensorimotor rhythm-based brain-computer interface in a Parkinson’s disease patient. Brain-Computer Interfaces.

Magrans de Abril I, Yoshimoto J, Doya K (2018). Connectivity inference from neural recording data: Challenges, mathematical bases and research directions. Neural Networks.

Miyazaki K, Miyazaki KW, Yamanaka A, Tokuda T, Tanaka KF, Doya K (2018). Reward probability and timing uncertainty alter the effect of dorsal raphe serotonin neurons on patience. Nat Commun, 9, 2048.

Tokuda T, Yoshimoto J, Shimizu Y, Okada G, Takamura M, Okamoto Y, Yamawaki S, Doya K (2018). Identification of depression subtypes and relevant brain regions using a data-driven approach. Sci Rep, 8, 14082.

Yoshizawa T, Ito M, Doya K (2018). Reward-predictive neural activities in striatal striosome compartments. eNeuro, 5.


Shouno O, Tachibana Y, Nambu A, Doya K (2017). Computational model of recurrent subthalamo-pallidal circuit for generation of parkinsonian oscillations. Frontiers in Neuroanatomy, 11, 1-15.

Tokuda T, Yoshimoto J, Shimizu Y, Okada G, Takamura M, Okamoto Y, Yamawaki S, Doya K (2017). Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions. PLoS ONE, 12, e0186566.

Uchibe E (2017). Model-free deep inverse reinforcement learning by logistic regression. Neural Processing Letters.

Wang JX, Uchibe E, Doya K (2017). Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer. Front Neurorobot, 11, 1-15.

Yoshida K, Shimizu Y, Yoshimoto J, Takamura M, Okada G, Okamoto Y, Yamawaki S, Doya K (2017). Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression. PLoS ONE.

Yoshida K, Yoshimoto J, Doya K (2017). Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data. BMC bioinformatics, 18, 1-11.


Caligiore D, Pezzulo G, Baldassarre G, Bostan AC, Strick

Elfwing S, Uchibe E, Doya K (2016). From free energy to value function approximation in reinforcement learning. Neural Networks, 84, 17-27.

Fermin ASR, Yoshida T, Yoshimoto J, Ito M, Tanaka SC, Doya K (2016). Model-based action planning involves cortico-cerebellar and basal ganglia networks. Scientific Reports, 6, 1-14.

Funamizu A, Kuhn B, Doya K (2016). Neural substrate of dynamic Bayesian inference in the cerebral cortex. Nature Neuroscience, 1-12.

Nagai T, Nakamuta S, Kuroda K, Nakauchi S, Nishioka T, Takano T, Zhang X, Tsuboi D, Funahashi Y, Nakano T, Yoshimoto J, Kobayashi K, Uchigashima M, Watanabe M, Miura M, Nishi A, Kobayashi K, Yamada K, Amano M, Kaibuchi K (2016). Phosphoproteomics of the dopamine pathway enables discovery of rap1 activation as a reward signal in vivo. Neuron, 89, 550-65.

Okamoto Y, Okada G, Tanaka S, Miyazaki K, Miyazaki K, Doya K, Yamawaki S (2016). The role of serotonin in waiting for future rewards in depression. International Journal of Neuropsychopharmacology, 19, 33-33.

Shimizu Y, Doya K, Okada G, Okamoto Y, Takamura M, Yamawaki S, Yoshimoto J (2016). Depression severity and related characteristics correlate significantly with activation in brain areas selected through machine learning. International Journal of Neuropsychopharmacology, 19, 135-136.

Uchibe E (2016). Forward and inverse reinforcement learning by linearly solvable Markov decision

Wang J, Uchibe E, Doya K (2016). EM-based policy hyper131.


Balleine B, Dezfouli A, Ito M, Doya K (2015). Hierarchical control of goal-directed action in the cortical–basal ganglia network. Current Opinion in Behavioral Sciences, 5, 1-7.

Elfwing S, Uchibe E, Doya K (2015). Expected energy-based restricted Boltzmann machine for classification. Neural Networks, 64, 29-38.

Funamizu A, Ito M, Doya K, Kanzaki R, Takahashi H (2015). Condition interference in rats performing a choice task with switched variable- and fixed-reward conditions. Frontiers in Neuroscience, 9. eCollection 2015.

Hahne J, Helias M, Kunkel S, Igarashi J, Bolten M, Frommer A, Diesmann M (2015). A unified framework for spiking and gap-junction interactions in distributed neuronal network simulations. Frontiers in Neuroinformatics, 9.

Ito M, Doya K (2015). Parallel Representation of Value-Based and Finite State-Based Strategies in the Ventral and Dorsal Striatum. PLoS ONE.

Ito M, Doya K (2015). Distinct neural representation in the dorsolateral, dorsomedial, and ventral parts

Nakano T, Otsuka M, Yoshimoto J, Doya K (2015). A spiking neural network model of model-free

Shimizu Y, Yoshimoto J, Toki S, Takamura M, Yoshimura S,Toward probabilistic diagnosis and understanding of depression based on functional MRI data analysis with logistic group LASSO. PLoS ONE, 10, e0123524.


Elfwing S, Doya K (2014). Emergence of polymorphic mating strategies in robot colonies. PLoS ONE, 9, e93622.

Kunkel S, Schmidt M, Eppler MJ, Plesser HE, Masumoto G, Igarashi J, Ishii S, Fukai T, Morrison A, Diesmann M, Moritz H (2014). Spiking network simulation code for petascale computers. Frontiers in Neuroinformatics, 8.

Miyazaki KW, Miyazaki K, Tanaka KF, Yamanaka A, Takahashi A, Tabuchi S, Doya K (2014). Optogenetic Activation of Dorsal Raphe Serotonin Neurons Enhances Patience for Future Rewards. Current Biology.

Obrochta SP, Yokoyama Y, Moren J, Crowley TJ (2014). Conversion of GISP2-based sediment core age models to the GICC05 extended chronology. Quaternary Geochronology, 20, 1-7.


Elfwing S, Uchibe E, Doya K (2013). Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces. Front Neurorobot, 7, 3.

Funamizu A, Kanzaki R, Takahashi H (2013). Pre-attentive, context-specific representation of fear

Kinjo K, Uchibe E, Doya K (2013). Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task. Front Neurorobot, 7, 7.

Moren J, Shibata T, Doya K (2013). The mechanism of saccade motor pattern generation investigated by a large-scale spiking neuron model of the superior colliculus. PLoS ONE, 8, e57134.

Nakano T, Yoshimoto J, Doya K (2013). A model-based prediction of the calcium responses in the striatal synaptic spines depending on the timing of cortical and dopaminergic inputs and post-synaptic spikes. Frontiers in Computational Neuroscience, 7, 119.

Yoshimoto J, Ito M, Doya K (2013). Recent progress in reinforcement learning: Decision making in the brain and reinforcement learning. Journal of the Society of Instrument and Control Engineers, 52, 749-754.


Demoto Y, Okada G, Okamoto Y, Kunisato Y, Aoyama S, Onoda K, Munakata A, Nomura M, Tanaka SC, Schweighofer N, Doya K, Yamawaki S (2012). Neural and personality correlates of individual differences related to the effects of acute tryptophan depletion on future reward evaluation. Neuropsychobiology, 65, 55-64.

Funamizu A, Ito M, Doya K, Kanzaki R, Takahashi H (2012).Neuroscience, 35, 1180-1189.

Miyazaki KW, Miyazaki K, Doya K (2012). Activation of dorsal raphe serotonin neurons is necessary for waiting for delayed rewards. Journal of Neuroscience, 32, 10451-10457.

Sugimoto N, Haruno M, Doya K, Kawato M (2012). MOSAIC for Multiple-Reward Environments. Neural Computation, 24, 577-606.


Elfwing S, Uchibe E, Doya K, Christensen HI (2011). Darwinian embodied evolution of the learning ability for survival. Adaptive Behavior.

Ito M, Doya K (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology, 21.

Miyazaki K, Miyazaki KW, Doya K (2011). Activation of dorsal raphe serotonin neurons underlies waiting for delayed rewards. Journal of Neuroscience, 31, 469-479.

Miyazaki KW, Miyazaki K, Doya K (2011). Activation of the central serotonergic system in response to delayed but not omitted rewards. European Journal of Neuroscience, 33, 153-160.

Pammi VSC, Miyapuram KP, Ahmed, Samejima K, Bapi RS, Doya K (2011). Changing the structure of

Yoshimoto J, Sato M-A, Ishii S (2011). Bayesian normalized Gaussian network and hierarchical model selection method. Intelligent Automation and Soft Computing, 17, 71-94.


Fermin A, Yoshida T, Ito M, Yoshimoto J (2010). Evidence for Model-Based Action Planning in a Sequential Finger Movement Task. Journal of Motor Behavior, 42, 371-379.

Klein M, Kamp H, Palm G, Doya K (2010). A computational neural model of goal-directed utterance selection. Neural Networks.

Morimura T, Uchibe E, Yoshimoto J, Peters J, Doya K (2010). Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning. Neural Computation, 22, 342-376.

Nakano T, Doi T, Yoshimoto J, Doya K (2010). A kinetic model of dopamine and calcium dependent striatal synaptic plasticity. PLoS Computational Biology, 6, e1000670.


Fujiwara Y, Yamashita O, Kawawaki D, Doya K, Kawato M, Toyama K, Sato M-a (2009). A hierarchical Bayesian method to resolve an inverse problem of MEG contaminated with eye movement artifacts. NeuroImage, 45, 393-409.

Ito M, Doya K (2009). Validation of Decision-Making Models and Analysis of Decision Variables in the

Ito M, Shirao T, Doya K, Sekino Y (2009). supramammillary nucleus of the rat exposed to novel environment. Neuroscience Research, 64, 397-402.

Otsuka M, Yoshimoto J, Doya K (2009). Reward-dependent sensory coding in free-energy-based reinforcement learning. Neural Network World, 19, 597-610.

Tanaka SC, Shishida K, Schweighofer N, Okamoto Y, Yamawaki S, Doya K (2009). Serotonin affects association of aversive outcomes to past actions. Journal of Neuroscience, 16, 15669-74. 10.1523/JNEUROSCI.2799-09.2009


Elfwing S, Uchibe E, Doya K, Christensen HI (2008). Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adaptive Behavior, 16, 400-412.

Morimura T, Uchibe E, Yoshimoto J, Doya K (2008). A new natural gradient of average reward for policy search. IEICE Transactions, J91-D, 1515-1527.

Sato T, Uchibe E, Doya K (2008). Emergence of communication and cooperative behavior by

Schweighofer N, Bertin M, Shishida K, Okamoto Y, Tanaka S, Yamawaki S, Doya K (2008). Low-serotonin levels increase delayed reward discounting in humans. Journal of Neuroscience, 28, 4528-4532 (Erratum in 28, 5619).


Bertin M, Schweighofer N, Doya K (2007). Multiple model-based reinforcement learning explains dopamine neuronal activity. Neural Netw, 20, 668-75.

Corrado G, Doya K (2007). Understanding neural coding through the model-based analysis of decision making. Journal of Neuroscience, 27, 8178-8180.

Doya K (2007). Reinforcement learning: Computational theory and biological mechanisms. HFSP Journal, 10.2976/1.2732246.

Elfwing S, Doya K, Christensen HI (2007). Evolutionary development of hierarchical learning structures. IEEE Transactions on Evolutionary Computation, 11, 249-264.

Kamioka T, Uchibe E, Doya K (2007). Max-Min Actor-Critic for Multiple Reward Reinforcement Learning. IEICE Transactions on Information and Systems, J90-D, 2510-2521.

Morimoto J, Doya K (2007). Reinforcement learning state estimator. Neural Computation, 19, 730-756.

Ogasawara H, Doi T, Doya K, Kawato M (2007). Nitric oxide regulates input specificity of long-term depression and context dependence of cerebellar learning. PLoS Computational Biology, 3, e179.

Samejima K, Doya K (2007). Multiple representations of belief states and action values in corticobasal ganglia loops. Annals of the New York Academy of Sciences, 1104, 213-228.

Schweighofer N, Tanaka SC, Doya K (2007). Serotonin and the evaluation of future rewards: Theory, experiments, and possible neural mechanisms. Annals of the New York Academy of Sciences, 1104, 289-300.

Tanaka SC, Samejima K, Okada G, Ueda K, Okamoto Y, Yamawaki S, Doya K (2007). Brain mechanism of reward prediction under predictable and unpredictable environmental dynamics. Neural Networks, 19, 1233-1241.

Tanaka SC, Schweighofer N, Asahi S, Shishida K, Okamoto Y, Yamawaki S, Doya K (2007). Serotonin differentially regulates short- and long-term prediction of rewards in the ventral and dorsal striatum. PLoS ONE, 2, e1333.


Bando T, Shibata T, Doya K, Ishii S (2006). Switching particle filters for efficient visual tracking. Robotics and Autonomous Systems, 54, 873-884.

Bapi RS, Miyapuram KP, Graydon FX, Doya K (2006). fMRI investigation of cortical and subcortical networks in the learning of abstract and effector-specific representations of motor sequences. NeuroImage, 32, 714-727.

Daw ND, Doya K (2006). The computational neurobiology of learning and reward. Current Opinion in Neurobiology, 16, 199-204.

Hirayama J, Yoshimoto J, Ishii S (2006). Balancing plasticity and stability of on-line learning based on

Kawawaki D, Shibata T, Goda N, Doya K, Kawato M (2006). Anterior and superior lateral occipito-temporal cortex responsible for target motion prediction during overt and covert visual pursuit. Neuroscience Research, 54, 112-123.

Matsubara T, Morimoto J, Nakanishi J, Sato MA, Doya K (2006). Learning CPG-based biped locomotion with a policy gradient method. Robotics and Autonomous Systems, 54, 911-920.

Schweighofer N, Shishida K, Han CE, Okamoto Y, Tanaka SC, Yamawaki S, Doya K (2006). Humans can adopt optimal discounting strategy under real-time constraints. PLoS Computational Biology, 2, e152.

Sugimoto N, Samejima K, Doya K, Kawato M (2006). Hierarchical reinforcement learning: Temporal abstraction based on MOSAIC model. Transactions of Institute of Electronics, Information and Communication Engineers, J89-D, 1577-1587.

Uchibe E, Asada M (2006). Incremental co-evolution with competitive and cooperative tasks in a multi-robot environment. Proceedings of the IEEE.


Capi G, Doya K (2005). Evolution of neural architecture fitting environmental dynamics. Adaptive Behavior, 13, 53-66.

Capi G, Doya K (2005). Evolution of recurrent neural controllers using an extended parallel genetic algorithm. Robotics and Autonomous Systems, 52, 148-159.

Doya K, Uchibe E (2005). The Cyber Rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior, 13, 149-160.

Morimoto J, Doya K (2005). Robust reinforcement learning. Neural Computation, 17, 335-359.

Nishimura M, Yoshimoto J, Tokita Y, Nakamura Y, Ishii S (2005). Control of real acrobot by learning the switching rule of multiple controllers. IEICE Transactions on Fundamentals.

Samejima K, Ueda Y, Doya K, Kimura M (2005). Representation of action-specific reward values in the striatum. Science, 310, 1337-1340.

Yoshimoto J, Doya K, Ishii S (2005). Fundamental theory and application of reinforcement learning. Keisoku to Seigyo: Journal of the Society of Instrument and Control Engineers, 44, 313-318.

Yoshimoto J, Nishimura M, Tokita Y, Ishii S (2005). Acrobot control by learning the switching of multiple controllers. Journal of Artificial Life and Robotics.

Yukinawa N, Yoshimoto J, Oba S, Ishii S (2005). System identification of gene expression time-series based on a linear dynamical system model with variational Bayesian estimation. IPSJ Transactions on Mathematical Modeling and its Applications, 46SIG10, 57-65.


Haruno M, Kuroda T, Doya K, Toyama K, Kimura M, Samejima K, Imamizu H, Kawato M (2004). A

Hirayama J, Yoshimoto J, Ishii S (2004). Bayesianacetylcholine. Neural Networks, 17, 1391-1400.

Miyamoto H, Morimoto J, Doya K, Kawato M (2004). Reinforcement learning with via-point representation. Neural Networks, 17, 299-305.

Sato M, Yoshioka T, Kajiwara S, Toyama K, Goda N, Doya K, Kawato M (2004). Hierarchical Bayesian estimation for MEG inverse problem. NeuroImage, 23, 806-826.

Sugimoto N, Samejima K, Doya K, Kawato M (2004). Reinforcement learning and goal estimation by multiple forward and reward models. Transactions of Institute of Electronics, Information and Communication Engineers, J87-D-II, 683-694.

Tanaka S, Doya K, Okada G, Ueda K, Okamoto Y, Yamawaki S (2004). Prediction of immediate and future rewards differentially recruits cortico-basal ganglia loops. Nature Neuroscience, 7, 887-893.

Uchibe E, Doya K (2004). Hierarchical reinforcement learning for multiple reward functions. Journal of Robotics Society of Japan, 22, 120-129.

Peer-Reviewed Conference Papers [past 5 years]

Kozuno T, Uchibe E, Doya K (2019). Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning. 22nd International Conference on Artificial Intelligence and Statistics (AISTATS2019). Loisir Hotel Naha, Naha city, Okinawa, Japan.

Parmas P, Rasmussen CE, Peters J, Doya K (2018). PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos. Thirty-fifth International Conference on Machine Learning (ICML2018). Stockholm, Sweden.

Parmas P (2018). Total stochastic gradient algorithms and applications in reinforcement learning. Thirty-second Conference on Neural Information Processing Systems (NeurIPS2018). Montreal, Canada.

Uchibe E (2017). Model-free deep inverse reinforcement learning by logistic regression. 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM2017). University of Michigan, Ann Arbor, Michigan, USA.

Reinke C, Uchibe E, Doya K (2017). Average Reward Optimization with Multiple Discounting Reinforcement Learners. The 24th International Conference on Neural Information Processing (ICONIP 2017). Guangzhou, China. (Lecture Notes in Computer Science, 10634).

Reinke C, Uchibe E, Doya K (2017). Fast Adaptation of Behavior to Changing Goals with a Gamma Ensemble. 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM2017). University of Michigan, Ann Arbor, Michigan, USA.

Huang Q, Uchibe E, Doya K (2016). Emergence of communication among reinforcement learning agents under coordination environment. IEEE ICDL-EPIROB 2016. Cergy-Pontoise, Paris, France.

Yukinawa N, Doya K, Yoshimoto J (2015). A Kinetic Signal Transduction Model for Structural Plasticity of Striatal Medium Spiny Neurons. 3rd Annual Winter q-bio Meeting, Maui, Hawaii, USA.

Uchibe E, Doya K (2015). Inverse Reinforcement Learning with Density Ratio Estimation. The Multi-disciplinary Conference on Reinforcement Learning and Decision Making 2015 (RLDM2015), University of Alberta, Edmonton, Canada.

Tokuda T, Yoshimoto J, Shimizu Y, Doya K (2015). Multiple clustering based on co-clustering views. IJCNN 2015 workshop on Advances in Learning from/with Multiple Learners, Killarney, Ireland.

Uchibe E, Doya K (2014). Combining learned controllers to achieve new goals based on linearly solvable MDPs. IEEE International Conference on Robotics and Automation (ICRA2014), Hong Kong.

PhD Theses

Hamada H (2019). Serotonergic control of brain-wide dynamics. PhD Thesis, Okinawa Institute of Science and Technology Graduate University.

Reinke C (2018). The gamma-ensemble: adaptive reinforcement learning via modular discounting. PhD Thesis, Okinawa Institute of Science and Technology Graduate University.

Schulze JV (2018). Spatial and modular regularization in effective connectivity inference from neural activity data. PhD Thesis, Okinawa Institute of Science and Technology Graduate University.

Books and Book Chapters

Moren J, Igarashi J, Shouno O, Yoshimoto J, Doya K (2019). Dynamics of basal ganglia and thalamus in Parkinsonian tremor. Cutsuridis V (ed) Multiscale Models of Brain Disorders. Springer.

Doya K (supervising translator) (2019). The Deep Learning Revolution (ディープラーニング革命). Newton Press (in Japanese). (Supervisory translation of The Deep Learning Revolution by Terrence J. Sejnowski. MIT Press, 2018)

Doya K, Kimura M (2013). The basal ganglia, reinforcement learning, and the encoding of value. In Glimcher PW, Camerer CF, Fehr E (eds.) Neuroeconomics, Second Edition: Decision Making and the Brain, 321-333. Academic Press, London.

Uchibe E, Doya K (2011). Evolution of rewards and learning mechanisms in cyber rodents. In Krichmar JL, Wagatsuma H (eds.) Neuromorphic and Brain-Based Robots. Cambridge University Press.

Doya K, Kimura M (2009). The basal ganglia and the encoding of value. In Glimcher PW, Camerer CF, Fehr E, Poldrack RA (eds.) Neuroeconomics: Decision Making and the Brain, 407-416. Academic Press, London.

Elfwing S, Uchibe E, Doya K (2009). Co-evolution of rewards and meta-parameters in embodied evolution. In Sendhoff B, Koerner E, Sporns O, Ritter H, Doya K (eds.) Creating Brain-like Intelligence, 278-302. Springer-Verlag, Berlin.

Sendhoff B, Koerner E, Sporns O, Ritter H, Doya K (2009). Creating Brain-like Intelligence. Springer-Verlag, Berlin.

Doya K (2007). Invitation to Computational Neuroscience: Towards Understanding the Brain Mechanisms of Learning. Science-Sha (in Japanese).

Doya K, Ishii S, Pouget A, Rao RPN (2007). Bayesian Brain: Probabilistic Approaches to Neural Coding. MIT Press.

Doya K, Ishii S (2007). A probability primer. In Doya K, Ishii S, Pouget A, Rao RPN (eds.) Bayesian Brain: Probabilistic Approaches to Neural Coding, 3-13. MIT Press.

Bissmarck F, Nakahara H, Doya K, Hikosaka O (2005). Responding to modalities with different latencies. Advances in Neural Information Processing Systems, 17. MIT Press.

Doya K, Gomi H, Sakaguchi Y, Kawato M (2005). Computational Mechanisms of the Brain – Bottom-up and Top-down Dynamics. Asakura Shoten (in Japanese).