FY2023 Annual Report

Machine Learning and Data Science Unit
Associate Professor Makoto Yamada

[Photo: MLSS2024]

Abstract

This year, we have focused on establishing our machine learning and data science unit. To this end, we have dedicated time to accepting interns and organizing workshops to attract future members to our team. Specifically, we welcomed over 10 interns and hosted the Machine Learning Summer School 2024, which drew more than 200 attendees. In addition to these efforts, we conducted several research projects on federated learning, optimal transport, and anomaly detection, resulting in the development of computationally efficient algorithms. Our research findings were published in top-tier machine learning conferences, including NeurIPS, ICLR, and EMNLP. Next year, we aim to further expand our team and produce even more significant research outcomes.

1. Staff

  • Dr. Makoto Yamada, Associate Professor
  • Dr. Mohammad Sabokrou, Staff Scientist
  • Dr. Yao-Hung Hubert Tsai, External Researcher
  • Ms. Chikako Sugiyama, Research Unit Administrator
  • Ms. Terezie Sedlinska, Ph.D. student (Rotation)
  • Ms. Clea Laouar, Ph.D. student (Rotation)
  • Mr. Bakhytzhan Akdavletov, Ph.D. student (Rotation)
  • Mr. Made Benny Prasetya Wiranata, Ph.D. student (Rotation)
  • Mr. Pengfei He, Ph.D. student, Michigan State University (VRS)
  • Mr. Haoyu Han, Ph.D. student, Michigan State University (VRS)
  • Mr. Yuxuan Wan, Ph.D. student, Michigan State University (VRS)
  • Mr. Yuki Takezawa, Ph.D. student, Kyoto University (VRS)
  • Mr. Weijie Liu, Ph.D. student, Zhejiang University (VRS)
  • Ms. Laura Sudupe Medinilla, KAUST (VRS)
  • Ms. Marianne Abemgnigni Njifon, Institute for Mathematical Stochastics, University of Goettingen (VRS)
  • Ms. Kira Duesterwald, University College London (VRS)
  • Mr. Guillaume Houry, Université Paris-Saclay (RI)
  • Mr. Parsa Hosseini, Sharif University of Technology (RI)
  • Ms. Seiede Solale Mohammadi, Sharif University of Technology (RI)
  • Ms. Naghmeh Jamali, Islamic Azad University (RI)
  • Mr. Satoki Ishikawa, Tokyo Institute of Technology (RI)
  • Mr. Ayoub Rhim, École des Ponts ParisTech (RI)

2. Collaborations

2.1 Trustworthy AI

  • Type of collaboration: Joint research
  • Researchers:
    • Professor Jiliang Tang, Michigan State University
    • Mr. Pengfei He, Ph.D. student, Michigan State University
    • Mr. Haoyu Han, Ph.D. student, Michigan State University
    • Mr. Yuxuan Wan, Ph.D. student, Michigan State University

2.2 Development of machine learning technology and its practical applications

  • Type of collaboration: Joint research
  • Researchers:
    • Professor Hidetoshi Shimodaira, Kyoto University
    • Professor Hisashi Kashima, Kyoto University
    • Professor Yasuaki Hiraoka, Kyoto University
    • Professor Shinichi Minato, Kyoto University
    • Associate Professor Makoto Yamada, OIST

3. Activities and Findings

Selected research results from our unit are summarized below.

3.1 Federated and distributed learning

3.1.1 Momentum Tracking method (TMLR 2023)

SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward way to use momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than DSGD when the data distributions are statistically heterogeneous. Several recent studies have addressed this issue and proposed momentum methods that are more robust to data heterogeneity than DSGDm, although their convergence rates still depend on data heterogeneity and deteriorate when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, a momentum method whose convergence rate is provably independent of data heterogeneity. Specifically, we analyze the convergence rate of Momentum Tracking in the setting where the objective function is non-convex and stochastic gradients are used, and we show that it is independent of data heterogeneity for any momentum coefficient β ∈ [0, 1). Through experiments, we demonstrate that Momentum Tracking is more robust to data heterogeneity than existing decentralized learning methods with momentum and consistently outperforms them when the data distributions are heterogeneous.
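
To make the mechanism concrete, below is a minimal NumPy sketch of decentralized SGD with momentum applied on top of a gradient-tracking variable, the general idea underlying Momentum Tracking. The toy quadratic objectives, the ring mixing matrix, and all variable names are our illustrative assumptions, not the paper's exact algorithm or notation.

    import numpy as np

    # Toy setup: each node i holds a heterogeneous quadratic objective
    # f_i(x) = 0.5 * ||x - a_i||^2, whose global optimum is the mean of the a_i.
    rng = np.random.default_rng(0)
    n_nodes, dim, beta, lr = 8, 5, 0.9, 0.05
    targets = rng.normal(size=(n_nodes, dim))

    # Doubly stochastic mixing matrix for a ring topology.
    W = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        W[i, i] = 0.5
        W[i, (i - 1) % n_nodes] = 0.25
        W[i, (i + 1) % n_nodes] = 0.25

    x = np.zeros((n_nodes, dim))   # local parameters
    m = np.zeros((n_nodes, dim))   # local momentum buffers
    g = x - targets                # local gradients of the quadratics
    c = g.copy()                   # tracking variable: estimates the average gradient

    for _ in range(300):
        m = beta * m + c           # momentum on the *tracked* gradient
        x = W @ (x - lr * m)       # local update followed by gossip averaging
        g_new = x - targets
        c = W @ c + (g_new - g)    # gradient-tracking correction
        g = g_new

    print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))

Because the momentum is driven by the tracked average gradient rather than each node's heterogeneous local gradient, the update is insensitive to how differently the a_i are distributed across nodes.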

3.1.2 Graph construction (NeurIPS 2023)

Decentralized learning has recently been attracting increasing attention for its applications in parallel computation and privacy preservation. Many recent studies have shown that an underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and higher accuracy for decentralized learning. However, a topology with a fast consensus rate, e.g., the exponential graph, generally has a large maximum degree, which incurs significant communication costs. Thus, seeking topologies with both a fast consensus rate and a small maximum degree is important. In this study, we propose a novel topology, the Base-(k+1) Graph, that combines a fast consensus rate with a small maximum degree. Unlike existing topologies, the Base-(k+1) Graph enables all nodes to reach exact consensus after a finite number of iterations for any number of nodes and any maximum degree k. Thanks to this favorable property, the Base-(k+1) Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and better communication efficiency than the exponential graph. We conducted experiments with various topologies, demonstrating that the Base-(k+1) Graph enables various decentralized learning methods to achieve higher accuracy with better communication efficiency than existing topologies.
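
The construction of the Base-(k+1) Graph is not reproduced here, but the following sketch illustrates how the consensus rate of a topology is commonly quantified: by the second-largest singular value of its mixing matrix (smaller means faster consensus). The ring and static exponential graph below are standard baselines, and the uniform weights are our assumptions.

    import numpy as np

    def ring_mixing(n):
        # Each node averages with its two neighbors on a ring.
        W = np.zeros((n, n))
        for i in range(n):
            W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
        return W

    def exponential_mixing(n):
        # Static exponential graph: node i sends to nodes i + 2^j (mod n).
        hops = [2 ** j for j in range(int(np.log2(n)) + 1) if 2 ** j < n]
        w = 1 / (len(hops) + 1)
        W = np.zeros((n, n))
        for i in range(n):
            W[i, i] = w
            for h in hops:
                W[i, (i + h) % n] = w
        return W

    def consensus_rate(W):
        # Second-largest singular value; 1 minus this is the spectral gap.
        return np.linalg.svd(W, compute_uv=False)[1]

    n = 32
    print("ring:       ", consensus_rate(ring_mixing(n)))         # slow consensus, degree 2
    print("exponential:", consensus_rate(exponential_mixing(n)))  # fast, but degree O(log n)

The exponential graph mixes far faster but requires each node to contact O(log n) peers per round; avoiding exactly this trade-off is what motivates the Base-(k+1) Graph.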

3.2 Optimal transport

3.2.1 Large-scale similarity search (EMNLP 2023)

The Wasserstein distance is a powerful tool for comparing probability distributions and is widely used for document classification and retrieval tasks in NLP, where it is known as the word mover's distance (WMD). WMD exhibits excellent performance on various NLP tasks; however, one of its limitations is its high computational cost, which makes it impractical for large-scale distribution comparisons. In this study, we propose a simple and effective nearest neighbor search based on the Wasserstein distance. Specifically, we employ an L1 embedding based on a tree-based Wasserstein approximation and subsequently use nearest neighbor search to efficiently find the k-nearest neighbors. Through benchmark experiments, we demonstrate that the proposed approximation achieves performance comparable to the vanilla Wasserstein distance while being three orders of magnitude faster to compute.
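
The following is a minimal sketch of the L1 embedding idea behind the tree-Wasserstein distance: once the supports live on a fixed tree, the optimal transport cost reduces to a weighted L1 distance between vectors of subtree masses, so off-the-shelf nearest neighbor search applies. The toy tree and edge weights below are our assumptions, not the construction used in the paper.

    import numpy as np

    # A small rooted tree: parent[i] is the parent of node i (root = node 0),
    # and edge_w[i] is the weight of the edge (i, parent[i]).
    parent = np.array([-1, 0, 0, 1, 1, 2, 2])
    edge_w = np.array([0.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5])

    def embed(mass):
        # Embedding entry for node i: edge_w[i] * total mass in the subtree
        # rooted at i. Children have larger indices, so iterate bottom-up.
        total = mass.astype(float)
        for i in range(len(parent) - 1, 0, -1):
            total[parent[i]] += total[i]
        return edge_w * total

    mu = np.array([0, 0, 0, 0.5, 0.5, 0, 0])   # distribution on leaves 3, 4
    nu = np.array([0, 0, 0, 0, 0, 0.5, 0.5])   # distribution on leaves 5, 6

    twd = np.abs(embed(mu) - embed(nu)).sum()  # L1 distance = tree-Wasserstein
    print(twd)                                 # 3.0 on this toy tree

With this embedding, finding the k-nearest documents reduces to plain L1 nearest neighbor search over the embedded vectors.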

3.2.2 A linear-time approximation of Wasserstein distance (EMNLP 2023)

The Wasserstein distance, which can be computed by solving the optimal transport problem, is a powerful method for measuring the dissimilarity between documents; in the NLP community, it is referred to as the word mover's distance (WMD). One of the key challenges of the Wasserstein distance is its computational cost, which is cubic in general. Although the Sinkhorn algorithm substantially speeds up this computation, it still requires quadratic time. Recently, linear-time approximations of the Wasserstein distance, including the sliced Wasserstein distance and the tree-Wasserstein distance (TWD), have been proposed. However, these linear-time approximations suffer when the dimensionality of the word vectors is high. In this study, we propose a method that combines feature selection with a tree approximation of the Wasserstein distance to handle high-dimensional problems. More specifically, we use multiple word embeddings and automatically select the useful ones within a tree approximation of the Wasserstein distance. To this end, we approximate the Wasserstein distance for each word embedding using a tree approximation technique and select the discriminative (i.e., large Wasserstein distance) word embeddings by solving an entropic regularized maximization problem. In our document classification experiments, the proposed method achieved high performance.
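
As a hedged illustration of the selection step, note that an entropic regularized maximization of a linear score over the probability simplex has a closed-form softmax solution. The distances below are made-up numbers and lam is a free temperature parameter; this is a sketch of the general principle, not the paper's exact objective.

    import numpy as np

    # Approximate (tree-)Wasserstein distance computed for each candidate
    # word embedding; larger means more discriminative (made-up numbers).
    d = np.array([0.2, 1.5, 0.9, 0.1])
    lam = 0.5  # entropic regularization strength

    # argmax_w <w, d> + lam * H(w) over the simplex  =>  w = softmax(d / lam)
    z = d / lam
    w = np.exp(z - z.max())  # max-subtraction for numerical stability
    w /= w.sum()
    print(w)  # larger weight on embeddings with larger distances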

3.3 Anomaly detection

3.3.1 Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models (ICLR 2023)

We address image-based novelty detection. Despite considerable progress, existing models either fail or suffer a dramatic drop in performance in the so-called "near-distribution" setting, where the differences between normal and anomalous samples are subtle. We first demonstrate that existing methods experience up to a 20% decrease in performance in the near-distribution setting. Next, we propose to exploit a score-based generative model to produce synthetic near-distribution anomalous data, and we fine-tune our model to distinguish such data from normal samples. We provide both quantitative and qualitative evaluations of this strategy and compare the results with a variety of GAN-based models. The effectiveness of our method for both near-distribution and standard novelty detection is assessed through extensive experiments on datasets from diverse applications such as medical imaging, object classification, and quality control. The results reveal that our method considerably improves over existing models and consistently reduces the gap between near-distribution and standard novelty detection performance.
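
A minimal sketch of the fine-tuning stage is given below: a small classifier is trained to separate normal samples from synthetic near-distribution anomalies. The function generate_near_anomalies is a hypothetical placeholder for sampling from a score-based generative model, and all shapes and hyperparameters are our assumptions.

    import torch
    import torch.nn as nn

    def generate_near_anomalies(x_normal):
        # Hypothetical placeholder: in the paper, synthetic near-distribution
        # anomalies come from a score-based generative model.
        return x_normal + 0.3 * torch.randn_like(x_normal)

    # Binary discriminator between normal data and synthetic anomalies.
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    x_normal = torch.randn(256, 64)  # stand-in for features of normal images
    for _ in range(100):
        x_fake = generate_near_anomalies(x_normal)
        x = torch.cat([x_normal, x_fake])
        y = torch.cat([torch.zeros(256, 1), torch.ones(256, 1)])
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()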

3.3.2 Mitigating Bias: Enhancing Image Classification by Improving Model Explanations (ACML 2023)

Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge for image classifiers, as the crucial elements of interest in images may be overshadowed. In this paper, we propose a novel approach to address this issue and improve the learning of main concepts by image classifiers. Our central idea is to concurrently guide the model's attention toward the foreground during the classification task. By emphasizing the foreground, which encapsulates the primary objects of interest, we aim to shift the model's focus away from the dominant influence of the background. To accomplish this, we introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. We investigate various strategies, including modifying the loss function and incorporating additional architectural components, to enable the classifier to effectively capture the primary concept within an image. Additionally, we explore the impact of different foreground attention mechanisms on model performance and provide insights into their effectiveness. Through extensive experimentation on benchmark datasets, we demonstrate the efficacy of our proposed approach in improving the classification accuracy of image classifiers. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images. The results of this study contribute to advancing the field of image classification and provide valuable insights for developing more robust and accurate deep learning models.
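
As a hedged sketch of the loss-based variant described above, the snippet below adds an auxiliary penalty on input-gradient saliency that falls outside a given foreground mask. The saliency definition, the mask source, and the weighting alpha are our assumptions for illustration, not the paper's exact mechanism.

    import torch
    import torch.nn.functional as F

    def foreground_attention_loss(model, images, labels, fg_mask, alpha=0.1):
        # fg_mask: (B, H, W) tensor that is 1 on foreground pixels, 0 elsewhere.
        images = images.requires_grad_(True)
        logits = model(images)
        cls_loss = F.cross_entropy(logits, labels)
        # Saliency: gradient of the true-class score w.r.t. the input pixels.
        score = logits.gather(1, labels[:, None]).sum()
        sal = torch.autograd.grad(score, images, create_graph=True)[0].abs().sum(1)
        # Penalize attention mass that falls on the background.
        bg_attn = (sal * (1 - fg_mask)).mean()
        return cls_loss + alpha * bg_attn

In practice, the foreground masks could come from segmentation annotations or an off-the-shelf saliency method.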

4. Publications

4.1 Journals

  1. Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, & Makoto Yamada. Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data. Transactions on Machine Learning Research. 2023.
  2. Yuki Takezawa, Kenta Niwa, & Makoto Yamada. Communication compression for decentralized learning with operator splitting methods. IEEE Transactions on Signal and Information Processing over Networks. 2023.
  3. Yanbin Liu, Girish Dwivedi, Farid Boussaid, Frank Sanfilippo, Makoto Yamada, & Mohammed Bennamoun. Inflating 2D Convolution Weights for Efficient Generation of 3D Medical Images. Computer Methods and Programs in Biomedicine, 107685. 2023.
  4. Yanbin Liu, Linchao Zhu, Xiaohan Wang, Makoto Yamada, & Yi Yang. Bilaterally-normalized Scale-consistent Sinkhorn Distance for Few-shot Image Classification. IEEE Transactions on Neural Networks and Learning Systems. 2023.
  5. Héctor Climente-González, Chloé-Agathe Azencott, & Makoto Yamada. A network-guided protocol to discover susceptibility genes in genome-wide association studies using stability selection. STAR Protocols, 4(1): 101998. 2023.

4.2 Conference proceedings (with review)

  1. Sho Otao, & Makoto Yamada. A linear time approximation of Wasserstein distance with word embedding selection. In EMNLP, 2023.
  2. Cléa Laouar, Yuki Takezawa, & Makoto Yamada. Large-scale similarity search with Optimal Transport. In EMNLP, 2023.
  3. Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, & Makoto Yamada. Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence. In NeurIPS, 2023.
  4. Raha Ahmadi, Mohammad Javad Rajabi, Mohammad Khalooei, & Mohammad Sabokrou. Mitigating Bias: Enhancing Image Classification by Improving Model Explanations. In ACML, 2023.
  5. Marco Fiorucci, Peter Naylor, & Makoto Yamada. Optimal Transport for Change Detection on LiDAR Point Clouds. In IGARSS, 2023.
  6. Dinesh Singh, Héctor Climente-González, Mathis Petrovich, Eiryo Kawakami, & Makoto Yamada. FsNet: Feature Selection Network on High-dimensional Biological Data. In IJCNN, 2023.
  7. Ryuichiro Hataya, & Makoto Yamada. Nyström Method for Accurate and Scalable Implicit Differentiation. In AISTATS, 2023.
  8. Weijie Liu, Jiahao Xie, Chao Zhang, Makoto Yamada, Nenggan Zheng, & Hui Qian. Robust Graph Dictionary Learning. In ICLR, 2023.
  9. Hossein Mirzaei, Mohammadreza Salehi, Sajjad Shahabi, Efstratios Gavves, Cees G. M. Snoek, Mohammad Sabokrou, & Mohammad Hossein Rohban. Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models. In ICLR, 2023.

4.3 Books and other one-time publications

Nothing to report

4.4 Oral and Poster Presentations

  1. Sho Otao, & Makoto Yamada. A linear time approximation of Wasserstein distance with word embedding selection. The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), December 6-10, Resorts World Convention Centre, Singapore, Poster Presentation.
  2. Cléa Laouar, Yuki Takezawa, & Makoto Yamada. Large-scale similarity search with Optimal Transport. The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), December 6-10, Resorts World Convention Centre, Singapore, Poster Presentation.
  3. Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, & Makoto Yamada. Beyond Exponential Graph: Communication-Efficient Topologies for Decentralized Learning via Finite-time Convergence. The Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS), December 10-16, Ernest N. Morial Convention Center, New Orleans, USA, Poster Presentation.
  4. Raha Ahmadi, Mohammad Javad Rajabi, Mohammad Khalooei, & Mohammad Sabokrou. Mitigating Bias: Enhancing Image Classification by Improving Model Explanations. The 15th Asian Conference on Machine Learning (ACML), November 11-14, Istanbul, Turkey, Poster Presentation.
  5. Hossein Mirzaei, Mohammadreza Salehi, Sajjad Shahabi, Efstratios Gavves, Cees G. M. Snoek, Mohammad Sabokrou, & Mohammad Hossein Rohban. Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models. The Eleventh International Conference on Learning Representations (ICLR), May 1-5, Kigali, Rwanda, Poster Presentation.

5. Intellectual Property Rights and Other Specific Achievements

Nothing to report

6. Meetings and Events

6.1 Seminars

1. [Online Machine Learning Insights and Innovations (MLII) Seminar Series] Unlocking the Potential of Federated Learning in Medical Imaging
•    Date: February 9, 2024
•    Venue: Zoom 
•    Speaker: Prof. Shadi Albarqouni, University of Bonn


2. [Online Machine Learning Insights and Innovations (MLII) Seminar Series] Self-Supervised Learning from Images and Videos using Optimal Transport
•    Date: January 24, 2024
•    Venue: Zoom 
•    Speaker:  Dr. Yuki M. Asano, Assistant Professor, QUVA Lab, University of Amsterdam


3. MLDS Seminar 2023-9
•    Date: December 14, 2023
•    Venue: Seminar Room L5D23 
•    Speaker 1: Ms. Kira Duesterwald, University College London, UK
•    Speaker 2: Ms. Cléa Mehnia Laouar, Ph.D. student, OIST


4. [Online Machine Learning Insights and Innovations (MLII) Seminar Series] How to Detect Out-of-Distribution Data in the Wild? Challenges, Research Progress, and Path Forward
•    Date: December 1, 2023
•    Venue: Zoom 
•    Speaker: Dr. Sharon Yixuan Li, Assistant Professor, University of Wisconsin-Madison


5. [Seminar] Domain-specific Large Language Models: Case Studies in Educational and Medical Fields
•    Date: November 13, 2023
•    Venue: Seminar Room L5D23  
•    Speaker: Dr. Irene Li, Assistant Professor, University of Tokyo


6. MLDS Seminar 2023-8
•    Date: September 28, 2023
•    Venue: Seminar Room L5D23
•    Speaker 1: Ms. Marianne Abemgnigni Njifon, Institute for Mathematical Stochastics, University of Goettingen
•    Speaker 2: Ms. Naghmeh Jamali, Islamic Azad University, Iran


7. MLDS Seminar 2023-7
•    Date: September 21, 2023
•    Venue: Seminar Room L5D23
•    Speaker 1: Dr. Deborah Sulem, Postdoctoral Researcher, Barcelona School of Economics / Universitat Pompeu Fabra, Spain
•    Speaker 2: Ms. Solaleh Mohammadi, Sharif University of Technology, Iran


8. MLDS Seminar 2023-6
•    Date: September 14, 2023
•    Venue: Seminar Room L5D23 
•    Speaker 1: Mr. Parsa Hosseini, Sharif University of Technology, Iran
•    Speaker 2: Ms. Laura Sudupe Medinilla, King Abdullah University of Science and Technology (KAUST)


9. [Seminar] Retrieval-based Language Models and Applications / Neural theorem proving
•    Date: August 30, 2023
•    Venue: Seminar Room C210
•    Speaker 1: Ms. Akari Asai, University of Washington
•    Speaker 2: Dr. Sean Welleck, Assistant Professor, Carnegie Mellon University


10. MLDS Seminar 2023-5
•    Date: July 13, 2023
•    Venue: Seminar Room L5D23 
•    Speaker 1: Dr. Makoto Yamada, Associate Professor, OIST
•    Speaker 2: Ms. Terezie Sedlinska, Ph.D. student, OIST


11. MLDS Seminar 2023-4
•    Date: July 6, 2023
•    Venue: Seminar Room L5D23 
•    Speaker 1: Mr. Tobias Freidling, Ph.D. student, University of Cambridge
•    Speaker 2: Dr. Mohammad Sabokrou, Staff Scientist, OIST


12. MLDS Seminar 2023-3
•    Date: June 22, 2023
•    Venue: Seminar Room L5D23 
•    Speaker 1: Mr. Guillaume Houry, Université Paris-Saclay, France
•    Speaker 2: Mr. Yuxuan Wan, Michigan State University


13. MLDS Seminar 2023-2
•    Date: June 15, 2023
•    Venue: Seminar Room L5D23 
•    Speaker 1: Mr. Haoyu Han, Michigan State University
•    Speaker 2: Mr. Weijie Liu, Zhejiang University


14. [Seminar] Metric Recovery from Unweighted k-NN Graphs
•    Date: June 12, 2023
•    Venue: Seminar Room L5D23 
•    Speaker: Mr. Ryoma Sato, Kyoto University


15. MLDS Seminar 2023-1
•    Date: June 8, 2023
•    Venue: Seminar Room L5D23 
•    Speaker 1: Mr. Pengfei He, Michigan State University
•    Speaker 2: Mr. Yuki Takezawa, Kyoto University


16. [Seminar] Proper Losses, Moduli of Convexity, and Surrogate Regret Bounds
•    Date: June 7, 2023
•    Venue: Lounge Space (screen), DE18, Lab5 
•    Speaker: Dr. Han Bao, Assistant Professor, Kyoto University


17. [Online Seminar] Expected Expressivity and Gradients of Maxout Networks 
•    Date: June 5, 2023
•    Venue: Zoom
•    Speaker: Ms. Hanna Tseran, Max Planck Institute for Mathematics in the Sciences 

6.2 Workshops and Other Events

The Machine Learning Summer School in Okinawa 2024 (MLSS2024)

  • Date: March 4-15, 2024
  • Venue: OIST Campus, Auditorium and Conference Center
  • Speakers:
  1. Dr. Tatsunori Hashimoto, Assistant Professor, Stanford University
  2. Dr. Francesco Orabona, Associate Professor, King Abdullah University of Science and Technology
  3. Prof. Pierre Alquier, Professor of Statistics, ESSEC Business School Asia-Pacific, Singapore
  4. Dr. Kun Yuan, Assistant Professor, Peking University
  5. Dr. Han Zhao, Assistant Professor, University of Illinois at Urbana-Champaign
  6. Dr. Diyi Yang, Assistant Professor, Stanford University
  7. Prof. Kenji Fukumizu, The Institute of Statistical Mathematics
  8. Dr. Amy Zhang, Assistant Professor, The University of Texas at Austin
  9. Prof. Taiji Suzuki, The University of Tokyo
  10. Dr. Marco Cuturi, Research Scientist, Apple ML Research, Paris
  11. Prof. Shai Ben-David, University of Waterloo, Canada
  12. Prof. Arthur Gretton, University College London
  13. Prof. Yu-Chiang Frank Wang, National Taiwan University
  14. Dr. Masaaki Imaizumi, Associate Professor, The University of Tokyo
  15. Dr. Shinji Ito, The University of Tokyo, NEC
  16. Mr. Ryoma Sato, Kyoto University
  17. Dr. Yao-Hung Tsai, OIST
  18. Prof. Kenji Doya, OIST

Joint Workshop IBISML, NC, BIO, MPS

  • Date: June 29-30 and July 1, 2023
  • Venue: OIST Campus, Conference Center

IPSJ SIG Mobile Computing and Smart Society System (SIG-MBL)

  • Date: May 18-19, 2023
  • Venue: OIST Campus B250

7. Other

Nothing to report.