Publications of the Adaptive Systems Group
2017
- Wang, J., Uchibe, E., & Doya, K. (2017). Adaptive Baseline Enhances EM-based Policy Search: Validation in a View-based Positioning Task of a Smartphone Balancer. Frontiers in Neurorobotics, 11:1.
- Elfwing, S., Uchibe, E., & Doya, K. (2017). Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. arXiv:1702.03118.
2016
- Elfwing, S., Uchibe, E., & Doya, K. (2016). From free energy to expected energy: Improving energy-based value function approximation in reinforcement learning. Neural Networks, 84: 17-27.
- Wang, J., Uchibe, E., & Doya, K. (2016). EM-based Policy Hyper Parameter Exploration: Application to Standing and Balancing of a Two-wheeled Smartphone Robot. Journal of Artificial Life and Robotics, vol. 21, issue 1, pp. 125-131.
- Uchibe, E. (2016). Forward and inverse reinforcement learning based on linearly solvable Markov decision processes (review article, in Japanese). The Brain & Neural Networks (Journal of the Japanese Neural Network Society), vol. 23, no. 1, pp. 2-13.
- Uchibe, E. (2016). Deep inverse reinforcement learning by logistic regression. In Proc. of the 23rd International Conference on Neural Information Processing (ICONIP), pp. 23-31.
- Huang, Q., Uchibe, E., & Doya, K. (2016). Emergence of communication among reinforcement learning agents under coordination environment. In Proc. of the 6th Joint IEEE International Conference on Developmental Learning and on Epigenetic Robotics.
- Reinke, C., Uchibe, E., & Doya, K. (2016). From Neuroscience to Artificial Intelligence: Maximizing Average Reward in Episodic Reinforcement Learning Tasks with an Ensemble of Q-Learners. In the Third CiNet Conference, Neural mechanisms of decision making: Achievements and new directions, Osaka, poster.
- Reinke, C., Uchibe, E., & Doya, K. (2016). Learning of Stress Adaptive Habits with an Ensemble of Q-Learners. In The 2nd International Workshop on Cognitive Neuroscience Robotics, Osaka, poster.
2015
- Elfwing, S., Uchibe, E., & Doya, K. (2015). Expected energy-based restricted Boltzmann machine for classification. Neural Networks, vol. 64, pp. 29-38.
- Reinke, C., Uchibe, E., & Doya, K. (2015). Maximizing the average reward in episodic reinforcement learning tasks. In Proc. of the IEEE International Conference on Intelligent Informatics and Biomedical Sciences, Okinawa, pp. 420-421.
- Wang, J., Uchibe, E., & Doya, K. (2015). Two-wheeled smartphone robot learns to stand up and balance by EM-based policy hyper parameter exploration. In Proc. of the 20th International Symposium on Artificial Life and Robotics.
- Uchibe, E., & Doya, K. (2015). Inverse Reinforcement Learning with Density Ratio Estimation. The 2nd Multidisciplinary Conference on Reinforcement Learning and Decision Making, University of Alberta, Canada, poster.
- Reinke, C., Uchibe, E., & Doya, K. (2015). Gamma-QCL: Learning multiple goals with a gamma submodular reinforcement learning framework. In Winter Workshop on Mechanism of Brain and Mind. (poster presentation).
2014
- Elfwing, S., & Doya, K. (2014). Emergence of polymorphic mating strategies in robot colonies. PLoS ONE, 9(4), e93622.
- Uchibe, E., & Doya, K. (2014). Inverse Reinforcement Learning Using Dynamic Policy Programming. In Proc. of the 4th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, pp. 222-228.
- Uchibe, E., & Doya, K. (2014). Combining learned controllers to achieve new goals based on linearly solvable MDPs. In Proc. of the IEEE International Conference on Robotics and Automation, pp. 5252-5259.
- Kinjo, K., Uchibe, E., & Doya, K. (2014). Robustness of Linearly Solvable Markov Games with inaccurate dynamics model. In Proc. of the 19th International Symposium on Artificial Life and Robotics.
- Wang, J., Uchibe, E., & Doya, K. (2014). Control of Two-Wheeled Balancing and Standing-up Behaviors by an Android Phone Robot. In Proc. of the 32nd Annual Conference of the Robotics Society of Japan, Kyushu Sangyo University.
- Eren Sezener, C., Uchibe, E., & Doya, K. (2014). Obtaining the Reward Functions of Rats by Inverse Reinforcement Learning (in Turkish: Ters Pekiştirmeli Öğrenme ile Farelerin Ödül Fonksiyonunun Elde Edilmesi). In Proc. of the Turkish Autonomous Robots Conference (TORK). [Published in Turkey; an English version is also available.]
- Uchibe, E., & Doya, K. (2014). Inverse reinforcement learning by density ratio estimation (in Japanese). In Proc. of the 32nd Annual Conference of the Robotics Society of Japan, Kyushu Sangyo University.
2013
- Kinjo, K., Uchibe, E., & Doya, K. (2013). Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task. Frontiers in Neurorobotics, 7(7).
- Elfwing, S., Uchibe, E., & Doya, K. (2013). Scaled free-energy based reinforcement learning for robust and efficient learning in high-dimensional state spaces. Frontiers in Neurorobotics, 7:3.
- Sakuma, T., Shimizu, T., Miki, Y., Doya, K., & Uchibe, E. (2013). Computation of Driving Pleasure based on Driver's Learning Process Simulation by Reinforcement Learning. In Proc. of Asia Pacific Automotive Engineering Conference.
- Yoshida, N., Uchibe, E., & Doya, K. (2013). Reinforcement learning with state-dependent discount factor. In Proc. of the 3rd Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics (pp. 1-6). IEEE.
- Wang, J., Uchibe, E., & Doya, K. (2013). Standing-up and Balancing Behaviors of Android Phone Robot. IEICE Technical Report, NLP2013-122, pp. 49-54.
- Uchibe, E., & Doya, K. (2013). Implementation of reinforcement learning algorithms using open-source software (in Japanese). IEICE Technical Meeting on Cloud Network Robotics, pp. 1-6.
- Uchibe, E., Ota, S., & Doya, K. (2013). Inverse Reinforcement Learning for Analysis of Human Behaviors. The 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making, Princeton, New Jersey, USA, poster.
- Ota, S., Uchibe, E., & Doya, K. (2013). Analysis of human behaviors by inverse reinforcement learning in a pole balancing task. The 3rd International Symposium on Biology of Decision Making, Paris, France, poster.
- Uchibe, E., & Doya, K. (2013). Inverse reinforcement learning by density ratio estimation (in Japanese). The 16th Workshop on Information-Based Induction Sciences (IBIS2013), poster.
2012
- Yoshida, N., Yoshimoto, J., Uchibe, E., & Doya, K. (2012). Development of a smartphone-based robot platform (in Japanese). The 30th Annual Conference of the Robotics Society of Japan.
- Kinjo, K., Uchibe, E., Yoshimoto, J., & Doya, K. (2012). Robot control based on motor-visual dynamics learning and linear Bellman equations (in Japanese). IPSJ SIG Technical Report, Bioinformatics, 2012-BIO-29(4), pp. 1-6.
2011
- Uchibe, E., & Doya, K. (2011). Evolution of rewards and learning mechanisms in Cyber Rodents. In J. L. Krichmar & H. Wagatsuma (Eds.), Neuromorphic and Brain-Based Robotics, chapter 6, pp. 109-128.
- Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2011). Darwinian embodied evolution of the learning ability for survival. Adaptive Behavior, 19(2), 101-120.
- Kinjo, K., Uchibe, E., Yoshimoto, J., & Doya, K. (2011). Robot control based on linear Bellman equations: system identification and exponentiated value function approximation (in Japanese). IEICE Technical Report, Neurocomputing (NC), 110(461), pp. 107-112.
2010
- Morimura, T., Uchibe, E., Yoshimoto, J., Peters, J., & Doya, K. (2010). Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning. Neural Computation, 22(2), 342-376.
- Elfwing, S., Otsuka, M., Uchibe, E., & Doya, K. (2010). Free-Energy Based Reinforcement Learning for Vision-Based Navigation with High-Dimensional Sensory Inputs. In Proc. of the 17th International Conference on Neural Information Processing (pp. 215-222).
- Kimura, S., Haga, M., Uchibe, E., Yoshimoto, J., & Doya, K. (2010). Effects of environmental dynamics and observation uncertainty on CPG control with sensory feedback (in Japanese). IEICE Technical Report, Neurocomputing (NC), 109(461), pp. 219-224.
2009
- Uchibe, E., & Doya, K. (2009). Constrained Reinforcement Learning from Intrinsic and Extrinsic Rewards. In M. J. Er & Y. Zhou (Eds.), Theory and Novel Applications of Machine Learning. IN-TECH.
- Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2009). Co-evolution of Rewards and Meta-parameters in Embodied Evolution. In B. Sendhoff, E. Körner, O. Sporns, H. Ritter, & K. Doya (Eds.), Creating Brain-Like Intelligence (pp. 278-302). Springer.
- Morimura, T., Uchibe, E., Yoshimoto, J., & Doya, K. (2009). A Generalized Natural Actor-Critic Algorithm. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in Neural Information Processing Systems 22 (pp. 1312-1320). MIT Press.
- Elfwing, S., Uchibe, E., & Doya, K. (2009). Emergence of Different Mating Strategies in Artificial Embodied Evolution. In Proc. of the 16th International Conference on Neural Information Processing (pp. 638-647).
- Kobayashi, M., Uchibe, E., & Doya, K. (2009). Reinforcement learning with active dimensionality reduction of sensory inputs (in Japanese). IEICE Technical Report, Neurocomputing (NC), 109(53), pp. 19-24.
2008
- Uchibe, E., & Doya, K. (2008). Finding intrinsic rewards by embodied evolution and constrained reinforcement learning. Neural Networks, 21(10), 1447-1455.
- Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2008). Co-evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning. Adaptive Behavior, 16(6), 400-412.
- Sato, T., Uchibe, E., & Doya, K. (2008). Learning how, what, and whether to communicate: emergence of protocommunication in reinforcement learning agents. Journal of Artificial Life and Robotics, 12, 70-74.
- Morimura, T., Uchibe, E., Yoshimoto, J., & Doya, K. (2008). Natural policy gradient methods: policy search based on the natural gradient of the average reward (in Japanese). IEICE Transactions on Information and Systems, Vol. J91-D, No. 6, pp. 1515-1527.
- Morimura, T., Uchibe, E., Yoshimoto, J., & Doya, K. (2008). A New Natural Policy Gradient by Stationary Distribution Metric. In Proc. of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (pp. 82-97). Springer Berlin / Heidelberg.
- Morimura, T., Uchibe, E., & Doya, K. (2008). Natural policy gradient with baseline adjustment function for variance reduction. Artificial Life and Robotics.
- Kamioka, T., Uchibe, E., & Doya, K. (2008). Neuroevolution Based on Reusable and Hierarchical Modular Representation. Proc. of INNS-NNN Symposia.
2007
- Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2007). Evolutionary Development of Hierarchical Learning Structures. IEEE Transactions on Evolutionary Computation, 11(2), 249-264.
- Kamioka, T., Uchibe, E., & Doya, K. (2007). Reinforcement learning for multiple-reward tasks by max-min actor-critic (in Japanese). IEICE Transactions on Information and Systems, Vol. J90-D, No. 9, pp. 2510-2521.
- Sato, T., Uchibe, E., & Doya, K. (2007). Emergence of cooperative behaviors and communication among reinforcement learning agents (in Japanese). IPSJ Transactions on Mathematical Modeling and Its Applications (TOM19), vol. 48, no. SIG19, pp. 55-67.
- Uchibe, E., & Doya, K. (2007). The Cyber Rodent project (review article, in Japanese). The Brain & Neural Networks (Journal of the Japanese Neural Network Society), Vol. 14, No. 4.
- Uchibe, E., & Doya, K. (2007). Constrained reinforcement learning from intrinsic and extrinsic rewards. In Proc. of the IEEE International Conference on Development and Learning, London, UK.
- Uchibe, E., & Doya, K. (2007). Finding Exploratory Rewards by Embodied Evolution and Constrained Reinforcement Learning in the Cyber Rodents. In Proc. of the 14th International Conference on Neural Information Processing (pp. 167-176). Kitakyushu, Japan: Springer.
- Otsuka, M., Uchibe, E., & Doya, K. (2007). Acquisition of behavior-oriented state representations by neighborhood component analysis (in Japanese). IEICE Technical Meeting on Neurocomputing, Tamagawa University.
2006
- Uchibe, E., & Asada, M. (2006). Incremental Coevolution With Competitive and Cooperative Tasks in a Multirobot Environment. Proceedings of the IEEE, 94(7), 1412-1424.
- Kamioka, T., Uchibe, E., & Doya, K. (2006). Multi-objective reinforcement learning using multiple value functions (in Japanese). IEICE Technical Meeting on Neurocomputing, Tamagawa University.
- Uchibe, E., & Doya, K. (2006). Reinforcement learning under constraints given by multiple rewards (in Japanese). IEICE Technical Meeting on Neurocomputing, OIST.
- Brunskill, E., Uchibe, E., & Doya, K. (2006). Adaptive state space construction with reinforcement learning for robots. Poster presentation in Proc. of the IEEE International Conference on Robotics and Automation.
2005
- Doya, K., & Uchibe, E. (2005). The Cyber Rodent Project: Exploration of Adaptive Mechanisms for Self-Preservation and Self-Reproduction. Adaptive Behavior, 13, 149-160.
- Morimura, T., Uchibe, E., & Doya, K. (2005). Utilizing the natural gradient in temporal difference reinforcement learning with eligibility traces. In Proc. of the 2nd International Symposium on Information Geometry and its Application (pp. 256-263).
- Uchibe, E., & Doya, K. (2005). Reinforcement Learning with Multiple Modules: A Framework for Developmental Robot Learning. In Proc. of the 4th International Conference on Development and Learning, pp. 87-92.
- Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2005). Biologically inspired embodied evolution of survival. In Proc. of the IEEE Congress on Evolutionary Computation, pp. 2210-2216.
2004
- Uchibe, E., & Doya, K. (2004). Hierarchical reinforcement learning under multiple rewards (in Japanese). Journal of the Robotics Society of Japan, Vol. 22, No. 1, pp. 120-129.
- Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2004). Multi-agent reinforcement learning: using macro actions to learn a mating task. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (Vol. 4, pp. 3164-3169).
- Uchibe, E., & Doya, K. (2004). Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling. In S. Schaal, A. Ijspeert, A. Billard, S. Vijayakumar, J. Hallam, & J.-A. Meyer (Eds.), Proc. of the Eighth International Conference on Simulation of Adaptive Behavior: From Animals to Animats 8 (pp. 287-296). MIT Press, Cambridge, MA.