[Seminar] "Revisiting Peng's Q(λ): Good Old-Fashioned Algorithm for Modern Reinforcement Learning" by Dr. Tadashi Kozuno


Wednesday, May 19, 2021, 13:00


Meeting room C016, Center Bldg.


Dear all,

Neural Computation Unit (Doya Unit) would like to invite you to a seminar as follows.

Date: Wednesday, May 19, 2021
Time: 13:00 – 14:00
Venue: Meeting room C016, Center Bldg.

Speaker: Dr. Tadashi Kozuno *The talk will be provided on Zoom.
                RLAI Lab, University of Alberta

Revisiting Peng's Q(λ): Good Old-Fashioned Algorithm for Modern Reinforcement Learning

Abstract: Off-policy multi-step reinforcement learning algorithms fall into two classes, conservative and non-conservative: the former actively cut traces, whereas the latter do not. Recently, Munos et al. (2016) proved the convergence of conservative algorithms to an optimal Q-function. In contrast, non-conservative algorithms are thought to be unsafe and have limited or no theoretical guarantees. Nonetheless, recent studies have shown that non-conservative algorithms empirically outperform conservative ones. Motivated by these empirical results and the lack of theory, we carry out theoretical analyses of Peng's Q(λ), a representative example of non-conservative algorithms. We prove that it also converges to an optimal policy, provided that the behavior policy slowly tracks a greedy policy in a way similar to conservative policy iteration. Such a result has been conjectured to be true but has not been proven. We also experiment with Peng's Q(λ) in complex continuous control tasks, confirming that Peng's Q(λ) often outperforms conservative algorithms despite its simplicity. These results indicate that Peng's Q(λ), which was thought to be unsafe, is a theoretically sound and practically effective algorithm.
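
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of how Peng's Q(λ) targets are commonly computed along one recorded trajectory. It is not taken from the talk or the announcement: the function name, argument names, and the list-based layout are hypothetical, and the recursion assumed here is the usual form G_t = r_t + γ[(1 - λ) max_a Q(s_{t+1}, a) + λ G_{t+1}]. Note that the recursion never checks which action the behavior policy took, which is what "not cutting traces" refers to.

```python
def peng_q_lambda_returns(rewards, next_max_q, gamma, lam):
    """Backward recursion for Peng's Q(lambda) targets on one trajectory.

    Hypothetical helper for illustration only.
    rewards[t]    : reward r_t received after acting at step t
    next_max_q[t] : max_a Q(s_{t+1}, a); use 0.0 if s_{t+1} is terminal
    gamma         : discount factor in [0, 1]
    lam           : trace parameter lambda in [0, 1]
    """
    if len(rewards) != len(next_max_q):
        raise ValueError("rewards and next_max_q must have the same length")

    targets = [0.0] * len(rewards)
    # Past the end of the recorded trajectory, fall back to the one-step
    # bootstrap value of the last next state.
    next_return = next_max_q[-1] if rewards else 0.0

    for t in reversed(range(len(rewards))):
        # Mix the greedy one-step bootstrap with the longer return.
        # No importance weight or greedy-action check appears here:
        # the trace is never cut (the non-conservative case).
        bootstrap = (1.0 - lam) * next_max_q[t] + lam * next_return
        targets[t] = rewards[t] + gamma * bootstrap
        next_return = targets[t]

    return targets


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    next_max_q = [2.0, 0.0]  # second transition ends in a terminal state
    for lam in (0.0, 0.5, 1.0):
        print(lam, peng_q_lambda_returns(rewards, next_max_q, 0.5, lam))
```

With λ = 0 the targets reduce to ordinary one-step Q-learning targets, and with λ = 1 they become full discounted returns bootstrapped only at the end of the trajectory.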

We hope to see many of you at the seminar.

Neural Computation Unit
Contact: ncus@oist.jp
