FY2022 Annual Report

Machine Learning and Data Science Unit
Associate Professor Makoto Yamada

Abstract

In FY2022, our primary focus was on establishing the unit and preparing for FY2023. In line with this objective, we onboarded Dr. Mohammad Sabokrou as a staff scientist and engaged several visiting researchers to contribute to the unit. We also recruited visiting scholars and interns for FY2023 and carefully selected approximately 10 highly talented M.S. and Ph.D. students. On the research side, we dedicated our efforts to optimal transport problems and distributed learning frameworks. As a result, we achieved significant milestones, including the publication of a research paper in Transactions on Machine Learning Research, and produced technical reports to further disseminate our findings and insights.

1. Staff

  • Dr. Makoto Yamada, Associate Professor
  • Dr. Mohammad Sabokrou, Staff Scientist
  • Ms. Emiko Asato, Research Unit Administrator
  • Ms. Chikako Sugiyama, Research Unit Administrator

2. Collaborations

2.1 Optimal Transport

  • Description: Research and development of new optimal transport algorithms.
  • Type of collaboration: Joint research
  • Researchers:
    • Makoto Yamada (OIST)
    • Yuki Takezawa (Kyoto University)
    • Han Bao (Kyoto University)
    • Ryoma Sato (Kyoto University)
    • Zornitsa Kozareva (Meta AI)
    • Sujith Ravi (SliceX AI)

2.2 Distributed Learning

  • Description: Research and development of new distributed learning algorithms. 
  • Type of collaboration: Joint research
  • Researchers:
    • Yuki Takezawa (Kyoto University)
    • Han Bao (Kyoto University)
    • Ryoma Sato (Kyoto University)
    • Kenta Niwa (NTT CS Laboratories)
    • Makoto Yamada (OIST)

2.3 Machine Learning Applications

  • Description: Proposing new change detection algorithms from satellite images.
  • Type of collaboration: Joint research
  • Researchers:
    • Marco Fiorucci (IIT)
    • Peter Naylor (RIKEN AIP)
    • Makoto Yamada (OIST)

3. Activities and Findings

3.1 Approximating 1-Wasserstein Distance with Trees

Abstract: The Wasserstein distance, which measures the discrepancy between distributions, has shown efficacy in various natural language processing and computer vision applications. One of the challenges in estimating the Wasserstein distance is that it is computationally expensive and does not scale well for many distribution-comparison tasks. In this study, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where the TWD is a 1-Wasserstein distance with tree-based embedding that can be computed in linear time with respect to the number of nodes on a tree. More specifically, we propose a simple yet efficient L1-regularized approach for learning the weights of edges in a tree. To this end, we first demonstrate that the 1-Wasserstein approximation problem can be formulated as a distance approximation problem using the shortest path distance on a tree. We then show that the shortest path distance can be represented by a linear model and formulated as a Lasso-based regression problem. Owing to the convex formulation, we can efficiently obtain a globally optimal solution. We also propose a tree-sliced variant of these methods. Through experiments, we demonstrate that the TWD can accurately approximate the original 1-Wasserstein distance by using the weight estimation technique. Our code is available in our GitHub repository.
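The linear-time evaluation mentioned above can be sketched as follows: each edge of the tree contributes its weight times the absolute difference in mass that the two distributions place in the subtree below that edge. This is a minimal illustrative sketch (the tree encoding, function name, and example are our own choices, not the paper's code); it shows only the TWD evaluation, not the Lasso-based learning of the edge weights.

```python
import numpy as np

def tree_wasserstein(parent, weight, mu, nu):
    """Tree-Wasserstein distance between mass vectors mu and nu placed on
    the nodes of a tree: sum over edges of
    edge_weight * |mass of mu below the edge - mass of nu below the edge|.
    parent[i] is the parent of node i (node 0 is the root, parent[0] = -1),
    with nodes ordered so that parent[i] < i; weight[i] is the weight of
    the edge joining node i to its parent. Runs in O(number of nodes)."""
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    # Fold signed mass differences bottom-up: children (larger indices)
    # are accumulated into their parents before the parent is visited.
    for i in range(len(parent) - 1, 0, -1):
        diff[parent[i]] += diff[i]
    # diff[i] now holds the total mass difference in the subtree rooted at
    # node i, i.e. the flow crossing the edge above node i.
    return float(np.sum(np.asarray(weight)[1:] * np.abs(diff[1:])))

# Tiny example: root 0 with two leaf children, both edges of weight 1.
parent = [-1, 0, 0]
weight = [0.0, 1.0, 1.0]
mu = [0.0, 1.0, 0.0]   # all mass on leaf 1
nu = [0.0, 0.0, 1.0]   # all mass on leaf 2
print(tree_wasserstein(parent, weight, mu, nu))  # 2.0 (path of length 1 + 1)
```

Because the shortest-path distance between two leaves is the sum of the weights of the edges on their path, it is linear in the edge-weight vector, which is what makes fitting the weights to exact 1-Wasserstein distances a Lasso regression as described in the abstract.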

3.2 Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

Abstract: SGD with momentum acceleration is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum acceleration is Distributed SGD (DSGD) with momentum acceleration (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Recently, several studies have addressed this issue and proposed methods with momentum acceleration that are more robust to data heterogeneity than DSGDm, although their convergence rates remain dependent on data heterogeneity and decrease when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, which is a method with momentum acceleration whose convergence rate is proven to be independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the standard deep learning setting, where the objective function is non-convex and the stochastic gradient is used. Then, we identify that it is independent of data heterogeneity for any momentum coefficient β ∈ [0, 1). Through image classification tasks, we demonstrate that Momentum Tracking is more robust to data heterogeneity than the existing decentralized learning methods with momentum acceleration and can consistently outperform these existing methods when the data distributions are heterogeneous.
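The core idea, combining gradient tracking (so each node follows an estimate of the global average gradient rather than its heterogeneous local gradient) with a momentum buffer, can be sketched on a toy problem. This is an illustrative simplification under our own choices of topology, objectives, and constants, not the paper's exact update rule: four nodes on a ring each hold a heterogeneous quadratic objective f_i(x) = 0.5(x − b_i)², so the global optimum is mean(b).

```python
import numpy as np

n = 4
b = np.array([0.0, 1.0, 2.0, 3.0])   # heterogeneous local targets
W = np.zeros((n, n))                 # doubly stochastic ring mixing matrix
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

eta, beta = 0.1, 0.9                 # step size and momentum coefficient
x = np.zeros(n)                      # local parameters, one per node
g = x - b                            # local gradients at the current x
y = g.copy()                         # tracked estimate of the average gradient
m = np.zeros(n)                      # momentum buffers

for t in range(500):
    m = beta * m + y                 # momentum on the *tracked* gradient
    x_new = W @ x - eta * m          # mix with neighbors, then take a step
    g_new = x_new - b
    y = W @ y + g_new - g            # gradient-tracking recursion
    x, g = x_new, g_new

print(x)  # all nodes close to mean(b) = 1.5 despite heterogeneous data
```

Note the contrast with DSGDm: there the momentum buffer accumulates the node's own heterogeneous gradient, whereas here it accumulates the tracked average, which is why the heterogeneity of the b_i does not bias where the nodes converge.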

4. Publications and Technical Reports

  1. Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, & Sujith Ravi. Approximating 1-Wasserstein Distance with Trees. Transactions on Machine Learning Research (TMLR). 2022.
  2. Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, & Makoto Yamada. Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data. arXiv preprint. 2022.

5. Intellectual Property Rights and Other Specific Achievements

Nothing to report.

6. Meetings and Events

Nothing to report.

7. Other

Nothing to report.