FY2023 Annual Report
Machine Learning and Data Science Unit
Associate Professor Makoto Yamada
Abstract
This year, we have focused on establishing our machine learning and data science unit. To this end, we have dedicated time to accepting interns and organizing workshops to attract future members to our team. Specifically, we welcomed over 10 interns and hosted the Machine Learning Summer School 2024, which saw participation from more than 200 attendees. In addition to these efforts, we conducted several research projects on federated learning, optimal transport, and anomaly detection, resulting in the development of computationally efficient algorithms. Our research findings were published in top-tier machine learning conferences, including NeurIPS, ICLR, and EMNLP. Next year, we aim to further expand our team and produce even more significant research outcomes.
1. Staff
- Dr. Makoto Yamada, Associate Professor
- Dr. Mohammad Sabokrou, Staff Scientist
- Dr. Yao-Hung Hubert Tsai, External researcher
- Ms. Chikako Sugiyama, Research Unit Administrator
- Ms. Terezie Sedlinska, Ph.D. student (Rotation)
- Ms. Clea Laouar, Ph.D. student (Rotation)
- Mr. Bakhytzhan Akdavletov, Ph.D. student (Rotation)
- Mr. Made Benny Prasetya Wiranata, Ph.D. student (Rotation)
- Mr. Pengfei He, Ph.D. student, Michigan State University (VRS)
- Mr. Haoyu Han, Ph.D. student, Michigan State University (VRS)
- Mr. Yuxuan Wan, Ph.D. student, Michigan State University (VRS)
- Mr. Yuki Takezawa, Ph.D. student, Kyoto University (VRS)
- Mr. Weijie Liu, Ph.D. student, Zhejiang University (VRS)
- Ms. Laura Sudupe Medinilla, KAUST (VRS)
- Ms. Marianne Abemgnigni Njifon, Institute of Mathematical Stochastics, University of Goettingen (VRS)
- Ms. Kira Duesterwald, University College London (VRS)
- Mr. Guillaume Houry, Université Paris-Saclay (RI)
- Mr. Parsa Hosseini, Sharif University of Technology (RI)
- Ms. Seiede Solale Mohammadi, Sharif University of Technology (RI)
- Ms. Naghmeh Jamali, Islamic Azad University (RI)
- Mr. Satoki Ishikawa, Tokyo Institute of Technology (RI)
- Mr. Ayoub Rhim, École des Ponts ParisTech (RI)
2. Collaborations
2.1 Trustworthy AI
- Type of collaboration: Joint research
- Researchers:
- Professor Jiliang Tang, Michigan State University
- Mr. Pengfei He, Ph.D. student, Michigan State University
- Mr. Haoyu Han, Ph.D. student, Michigan State University
- Mr. Yuxuan Wan, Ph.D. student, Michigan State University
2.2 Development of machine learning technology and its practical applications
- Type of collaboration: Joint research
- Researchers:
- Professor Hidetoshi Shimodaira, Kyoto University
- Professor Hisashi Kashima, Kyoto University
- Professor Yasuaki Hiraoka, Kyoto University
- Professor Shin-ichi Minato, Kyoto University
- Associate Professor Makoto Yamada, OIST
3. Activities and Findings
Selected research results from our unit are summarized below.
3.1 Federated and distributed learning
3.1.1 Momentum Tracking method (TMLR 2023)
SGD with momentum is one of the key components for improving the performance of neural networks. For decentralized learning, a straightforward approach using momentum is Distributed SGD (DSGD) with momentum (DSGDm). However, DSGDm performs worse than plain DSGD when the data distributions are statistically heterogeneous. Several recent studies have addressed this issue and proposed momentum-based methods that are more robust to data heterogeneity than DSGDm, but their convergence rates still depend on data heterogeneity and deteriorate when the data distributions are heterogeneous. In this study, we propose Momentum Tracking, a momentum-based method whose convergence rate is provably independent of data heterogeneity. More specifically, we analyze the convergence rate of Momentum Tracking in the setting where the objective function is non-convex and stochastic gradients are used, and show that it is independent of data heterogeneity for any momentum coefficient β ∈ [0, 1). Through experiments, we demonstrate that Momentum Tracking is more robust to data heterogeneity than existing decentralized learning methods with momentum and consistently outperforms them when the data distributions are heterogeneous.
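As a rough illustration of the idea, the following NumPy sketch runs our simplified reading of Momentum Tracking (gradient tracking with momentum applied to the tracked direction) on a toy decentralized least-squares problem over a ring topology. The problem, step size, and variable names are ours, not the authors' reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, eta, beta = 8, 5, 0.002, 0.9

# Heterogeneous local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2,
# with node-dependent shifts in b_i to mimic statistical heterogeneity.
A = rng.normal(size=(n_nodes, 10, dim))
b = rng.normal(size=(n_nodes, 10)) + np.arange(n_nodes)[:, None]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing (gossip) matrix for a ring topology.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = W[i, (i - 1) % n_nodes] = W[i, (i + 1) % n_nodes] = 1 / 3

x = np.zeros((n_nodes, dim))                           # local parameters
c = np.stack([grad(i, x[i]) for i in range(n_nodes)])  # gradient trackers
u = np.zeros_like(x)                                   # momentum buffers

for _ in range(3000):
    u = beta * u + c          # momentum applied to the *tracked* gradient
    x_new = W @ x - eta * u   # gossip averaging + descent step
    c = W @ c + np.stack([grad(i, x_new[i]) - grad(i, x[i])
                          for i in range(n_nodes)])  # track the avg gradient
    x = x_new

x_bar = x.mean(axis=0)
print("consensus error:", np.linalg.norm(x - x_bar))
print("global grad norm:", np.linalg.norm(
    sum(grad(i, x_bar) for i in range(n_nodes)) / n_nodes))
```

The tracker c lets each node follow the network-wide average gradient rather than its own heterogeneous local gradient, which is the property behind the heterogeneity-independent rate.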
3.1.2 Graph construction (NeurIPS 2023)
Decentralized learning has recently attracted increasing attention for its applications in parallel computation and privacy preservation. Many recent studies have shown that an underlying network topology with a faster consensus rate (a.k.a. spectral gap) leads to a better convergence rate and higher accuracy in decentralized learning. However, a topology with a fast consensus rate, e.g., the exponential graph, generally has a large maximum degree, which incurs significant communication costs. Thus, seeking topologies with both a fast consensus rate and a small maximum degree is important. In this study, we propose a novel topology that combines a fast consensus rate with a small maximum degree, called the Base-(k+1) Graph. Unlike existing topologies, the Base-(k+1) Graph enables all nodes to reach exact consensus after a finite number of iterations, for any number of nodes and any maximum degree k. Thanks to this favorable property, the Base-(k+1) Graph endows Decentralized SGD (DSGD) with both a faster convergence rate and better communication efficiency than the exponential graph. We conducted experiments with various topologies, demonstrating that the Base-(k+1) Graph enables various decentralized learning methods to achieve higher accuracy with better communication efficiency than existing topologies.
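To make the consensus-rate notion concrete, the sketch below compares the spectral gap and maximum degree of a ring with those of an (undirected variant of the) exponential graph. The Base-(k+1) Graph construction itself is more involved and is not reproduced here; this only illustrates the quantity being traded off against degree.

```python
import numpy as np

def spectral_gap(W):
    """1 minus the second-largest eigenvalue magnitude of a mixing matrix W."""
    eigs = np.sort(np.abs(np.linalg.eigvals(W)))[::-1]
    return 1.0 - eigs[1]

def ring(n):
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
    return W

def exponential_graph(n):
    # Undirected variant: node i links to (i +/- 2^k) mod n for k = 0, 1, ...
    hops = [2 ** k for k in range(int(np.log2(n - 1)) + 1)]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = {i} | {(i + h) % n for h in hops} | {(i - h) % n for h in hops}
        for j in nbrs:            # uniform weights -> doubly stochastic here
            W[i, j] = 1 / len(nbrs)
    return W

n = 32
for name, build in [("ring", ring), ("exponential", exponential_graph)]:
    W = build(n)
    deg = int((W[0] > 0).sum()) - 1   # neighbors, excluding the self-loop
    print(f"{name:12s} max degree = {deg:2d}, spectral gap = {spectral_gap(W):.3f}")
```

The ring is cheap to communicate over but mixes slowly, while the exponential graph mixes fast at the price of a logarithmic degree; the Base-(k+1) Graph targets the best of both.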
3.2 Optimal transport
3.2.1 Large-scale similarity search (EMNLP 2023)
The Wasserstein distance is a powerful tool for comparing probability distributions and is widely used for document classification and retrieval tasks in NLP, where it is known as the word mover's distance (WMD). WMD exhibits excellent performance on various NLP tasks; however, its high computational cost makes it impractical for large-scale distribution comparisons. In this study, we propose a simple and effective nearest neighbor search based on the Wasserstein distance. Specifically, we employ an L1 embedding based on a tree-based approximation of the Wasserstein distance and then use an L1 nearest neighbor search to efficiently find the k-nearest neighbors. Through benchmark experiments, we demonstrate that the proposed approximation performs comparably to the vanilla Wasserstein distance while being three orders of magnitude faster to compute.
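The property underlying this approach is that the 1-Wasserstein distance on a tree metric equals a weighted L1 distance between vectors of cumulative subtree masses, so off-the-shelf L1 nearest neighbor search applies. Below is a minimal sketch on a hard-coded toy tree; the tree itself and the brute-force search are illustrative stand-ins for the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tree over 6 nodes; node 0 is the root. parent[v] is v's parent and
# weight[v] is the length of the edge (v, parent[v]).
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
weight = {1: 0.5, 2: 0.5, 3: 0.25, 4: 0.25, 5: 0.25}

def embed(mu):
    """Weighted L1 embedding: one coordinate per edge, equal to the edge
    weight times the total mass in the subtree below that edge."""
    sub = list(mu)                          # subtree masses, filled bottom-up
    for v in sorted(parent, reverse=True):  # children have larger ids here
        sub[parent[v]] += sub[v]
    return np.array([weight[v] * sub[v] for v in sorted(parent)])

# The tree-Wasserstein distance between two distributions over the nodes
# is exactly the L1 distance between their embeddings.
mu = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.0])
nu = np.array([0.0, 0.0, 0.5, 0.1, 0.1, 0.3])
print("TWD(mu, nu) =", np.abs(embed(mu) - embed(nu)).sum())

# k-nearest-neighbor search then reduces to plain L1 search over embeddings.
docs = np.stack([embed(rng.dirichlet(np.ones(6))) for _ in range(100)])
query = embed(rng.dirichlet(np.ones(6)))
print("5 nearest docs:", np.argsort(np.abs(docs - query).sum(axis=1))[:5])
```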
3.2.2 A Linear time approximation of Wasserstein distance (EMNLP 2023)
The Wasserstein distance, computed by solving the optimal transport problem, is a powerful method for measuring the dissimilarity between documents; in the NLP community, it is referred to as the word mover's distance (WMD). One of its key challenges is computational cost: the exact distance requires cubic time, and even the Sinkhorn algorithm, a powerful acceleration tool, still requires quadratic time. Recently, linear-time approximations of the Wasserstein distance, including the sliced Wasserstein distance and the tree-Wasserstein distance (TWD), have been proposed. However, these linear-time approximations suffer when the dimensionality of the word vectors is high. In this study, we propose a method that combines feature selection with a tree approximation of the Wasserstein distance to handle high-dimensional problems. More specifically, we use multiple word embeddings and automatically select the useful ones within the tree approximation. To this end, we approximate the Wasserstein distance for each word embedding using a tree approximation technique and then select the discriminative (i.e., large Wasserstein distance) word embeddings by solving an entropic regularized maximization problem. In document classification experiments, the proposed method achieved high performance.
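As a small illustration of the selection step: maximizing a weighted sum of per-embedding distances plus an entropy term over the probability simplex has a closed-form softmax solution, so embeddings with larger (more discriminative) tree-Wasserstein distances receive larger weights. The exact objective and the temperature lam below are our reading of the summary, not necessarily the paper's formulation.

```python
import numpy as np

def select_weights(d, lam=0.1):
    """Solve max_p <p, d> + lam * H(p) over the probability simplex,
    where H is the Shannon entropy. The optimum is p_k ∝ exp(d_k / lam)."""
    z = np.exp((d - d.max()) / lam)   # shift the exponent for stability
    return z / z.sum()

# Hypothetical average tree-Wasserstein distances for 4 word embeddings.
d = np.array([0.8, 0.3, 0.75, 0.1])
for lam in (0.05, 0.5, 5.0):
    print(f"lam={lam:4}: weights={np.round(select_weights(d, lam), 3)}")
```

A small lam concentrates the weight on the most discriminative embedding, while a large lam spreads it toward uniform, which is the usual role of the entropic regularizer.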
3.3 Anomaly detection
3.3.1 Fake It Till You Make It: Near-Distribution Novelty Detection by Score-Based Generative Models (ICLR 2023)
We address image-based novelty detection. Despite considerable progress, existing models either fail or suffer a dramatic performance drop under the so-called "near-distribution" setting, where the differences between normal and anomalous samples are subtle. We first demonstrate that existing methods experience up to a 20% decrease in performance in the near-distribution setting. Next, we propose exploiting a score-based generative model to produce synthetic near-distribution anomalous data; our model is then fine-tuned to distinguish such data from normal samples. We provide quantitative and qualitative evaluations of this strategy and compare the results with a variety of GAN-based models. The effectiveness of our method for both near-distribution and standard novelty detection is assessed through extensive experiments on datasets from diverse applications such as medical images, object classification, and quality control. The results reveal that our method considerably improves over existing models and consistently narrows the gap between near-distribution and standard novelty detection performance.
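The recipe can be mimicked on toy 2D data as follows. Here a simple perturbation of normal samples stands in for the score-based generative model, and a logistic regression trained from scratch stands in for the fine-tuned network, so this is only a structural sketch of the pipeline, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# "Normal" data: a 2D Gaussian blob.
normal = rng.normal(loc=0.0, scale=1.0, size=(2000, 2))

# Stand-in for score-model samples: slightly shifted copies of normal data.
# (The paper uses a score-based generative model; this perturbation is only
# a placeholder so the sketch stays self-contained.)
synthetic = normal[:1000] + rng.normal(scale=0.3, size=(1000, 2)) + np.array([0.8, 0.0])

# "Fine-tune" a classifier to separate normal from synthetic anomalies.
X = np.vstack([normal[:1000], synthetic])
y = np.concatenate([np.zeros(1000), np.ones(1000)])
clf = LogisticRegression().fit(X, y)

# Evaluate on held-out normal data vs. true near-distribution anomalies.
anomalies = rng.normal(loc=[1.0, 0.0], scale=1.0, size=(1000, 2))
X_test = np.vstack([normal[1000:], anomalies])
y_test = np.concatenate([np.zeros(1000), np.ones(1000)])
scores = clf.predict_proba(X_test)[:, 1]   # anomaly score
print("near-distribution AUROC:", round(roc_auc_score(y_test, scores), 3))
```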
3.3.2 Mitigating Bias: Enhancing Image Classification by Improving Model Explanations (ACML 2023)
Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images, rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge to image classifiers as the crucial elements of interest in images may be overshadowed. In this paper, we propose a novel approach to address this issue and improve the learning of main concepts by image classifiers. Our central idea revolves around concurrently guiding the model’s attention toward the foreground during the classification task. By emphasizing the foreground, which encapsulates the primary objects of interest, we aim to shift the focus of the model away from the dominant influence of the background. To accomplish this, we introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. We investigate various strategies, including modifying the loss function or incorporating additional architectural components, to enable the classifier to effectively capture the primary concept within an image. Additionally, we explore the impact of different foreground attention mechanisms on model performance and provide insights into their effectiveness. Through extensive experimentation on benchmark datasets, we demonstrate the efficacy of our proposed approach in improving the classification accuracy of image classifiers. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images. The results of this study contribute to advancing the field of image classification and provide valuable insights for developing more robust and accurate deep-learning models.
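One of the loss-modification strategies mentioned above can be sketched in PyTorch as follows: alongside the usual cross-entropy term, penalize the fraction of a spatial attention map that falls outside a foreground mask. The attention and mask sources, and the weight lam, are placeholders rather than the paper's exact design.

```python
import torch
import torch.nn.functional as F

def foreground_attention_loss(logits, labels, attn, fg_mask, lam=0.5):
    """Cross-entropy plus a penalty on attention mass outside the foreground.

    attn:    (B, H, W) nonnegative spatial attention map from the classifier
    fg_mask: (B, H, W) binary foreground mask (1 = object of interest)
    """
    ce = F.cross_entropy(logits, labels)
    # Normalize the attention map per image so the penalty is a fraction.
    attn = attn / attn.sum(dim=(1, 2), keepdim=True).clamp_min(1e-8)
    background_mass = (attn * (1.0 - fg_mask)).sum(dim=(1, 2)).mean()
    return ce + lam * background_mass

# Toy usage with random tensors standing in for a real model's outputs.
B, C, H, W = 4, 10, 7, 7
logits = torch.randn(B, C)
labels = torch.randint(0, C, (B,))
attn = torch.rand(B, H, W)
fg_mask = (torch.rand(B, H, W) > 0.5).float()
print(foreground_attention_loss(logits, labels, attn, fg_mask))
```

Minimizing the background mass term drives the classifier to allocate its attention budget to the foreground, which is the behavior the paragraph above argues for.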
4. Publications
4.1 Journals
- Transactions on Machine Learning Research, 2023.
- IEEE Transactions on Signal and Information Processing over Networks, 2023.
- Computer Methods and Programs in Biomedicine, 107685, 2023.
- IEEE Transactions on Neural Networks and Learning Systems, 2023.
- STAR Protocols, 4(1): 101998, 2023.
4.2 Conference proceedings (with review)
- In EMNLP, 2023.
- In EMNLP, 2023.
- In NeurIPS, 2023.
- In ACML, 2023.
- In IGARSS, 2023.
- In IJCNN, 2023.
- In AISTATS, 2023.
- In ICLR, 2023.
- In ICLR, 2023.
4.3 Books and other one-time publications
Nothing to report
4.4 Oral and Poster Presentations
- The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), December 6-10, Resorts World Convention Centre, Singapore, Poster Presentation.
- The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), December 6-10, Resorts World Convention Centre, Singapore, Poster Presentation.
- The Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS), December 10-16, Ernest N. Morial Convention Center, New Orleans, USA, Poster Presentation.
- The 15th Asian Conference on Machine Learning (ACML), November 11-14, Istanbul, Turkey, Poster Presentation.
- The Eleventh International Conference on Learning Representations (ICLR), May 1-5, Kigali, Rwanda, Poster Presentation.
5. Intellectual Property Rights and Other Specific Achievements
Nothing to report
6. Meetings and Events
6.1 Seminars
1. [Online Machine Learning Insights and Innovations (MLII) Seminar Series] Unlocking the Potential of Federated Learning in Medical Imaging
• Date: February 9, 2024
• Venue: Zoom
• Speaker: Prof. Shadi Albarqouni, University of Bonn
2. [Online Machine Learning Insights and Innovations (MLII) Seminar Series] Self-Supervised Learning from Images and Videos using Optimal Transport
• Date: January 24, 2024
• Venue: Zoom
• Speaker: Dr. Yuki M. Asano, Assistant Professor, QUVA Lab, University of Amsterdam
3. MLDS Seminar 2023-9
• Date: December 14, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Ms. Kira Duesterwald, University College London, UK
• Speaker 2: Ms. Clea Mehnia Laouar, PhD Student, OIST
4. [Online Machine Learning Insights and Innovations (MLII) Seminar Series] How to Detect Out-of-Distribution Data in the Wild? Challenges, Research Progress, and Path Forward
• Date: December 1, 2023
• Venue: Zoom
• Speaker: Dr. Sharon Yixuan Li, Assistant Professor, University of Wisconsin-Madison
5. [Seminar] Domain-specific Large Language Models: Case Studies in Educational and Medical Fields
• Date: November 13, 2023
• Venue: Seminar Room L5D23
• Speaker: Dr. Irene Li, Assistant Professor, University of Tokyo
6. MLDS Seminar 2023-8
• Date: September 28, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Ms. Marianne Abemgnigni Njifon, Institute of Mathematical Stochastics, University of Goettingen
• Speaker 2: Ms. Naghmeh Jamali, Islamic Azad University, Iran
7. MLDS Seminar 2023-7
• Date: September 21, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Dr. Deborah Sulem, Postdoctoral Researcher, Barcelona School of Economics / Universitat Pompeu Fabra, Spain
• Speaker 2: Ms. Solaleh Mohammadi, Sharif University of Technology, Iran
8. MLDS Seminar 2023-6
• Date: September 14, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Mr. Parsa Hosseini, Sharif University of Technology, Iran
• Speaker 2: Ms. Laura Sudupe Medinilla, King Abdullah University of Science and Technology (KAUST)
9. [Seminar] Retrieval-based Language Models and Applications / Neural theorem proving
• Date: August 30, 2023
• Venue: Seminar Room C210
• Speaker 1: Ms. Akari Asai, University of Washington
• Speaker 2: Dr. Sean Welleck, Assistant Professor, Carnegie Mellon University
10. MLDS Seminar 2023-5
• Date: July 13, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Dr. Makoto Yamada, Associate Professor, OIST
• Speaker 2: Ms. Terezie Sedlinska, PhD Student, OIST
11. MLDS Seminar 2023-4
• Date: July 6, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Mr. Tobias Freidling, Ph.D. student, University of Cambridge
• Speaker 2: Dr. Mohammad Sabokrou, Staff Scientist, OIST
12. MLDS Seminar 2023-3
• Date: June 22, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Mr. Guillaume Houry, Université Paris-Saclay, France
• Speaker 2: Mr. Yuxuan Wan, Michigan State University
13. MLDS Seminar 2023-2
• Date: June 15, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Mr. Haoyu Han, Michigan State University
• Speaker 2: Mr. Weijie Liu, Zhejiang University
14. [Seminar] Metric Recovery from Unweighted k-NN Graphs
• Date: June 12, 2023
• Venue: Seminar Room L5D23
• Speaker: Mr. Ryoma Sato, Kyoto University
15. MLDS Seminar 2023-1
• Date: June 8, 2023
• Venue: Seminar Room L5D23
• Speaker 1: Mr. Pengfei He, Michigan State University
• Speaker 2: Mr. Yuki Takezawa, Kyoto University
16. [Seminar] Proper Losses, Moduli of Convexity, and Surrogate Regret Bounds
• Date: June 7, 2023
• Venue: Lounge Space (screen), DE18, Lab5
• Speaker: Dr. Han Bao, Assistant Professor, Kyoto University
17. [Online Seminar] Expected Expressivity and Gradients of Maxout Networks
• Date: June 5, 2023
• Venue: Zoom
• Speaker: Ms. Hanna Tseran, Max Planck Institute for Mathematics in the Sciences
6.2 Workshops, etc.
The Machine Learning Summer School in Okinawa 2024 (MLSS2024)
- Date: March 4-15, 2024
- Venue: OIST Campus, Auditorium, Conference Center
- Speakers:
- Dr. Tatsunori Hashimoto, Assistant Professor, Stanford University
- Dr. Francesco Orabona, Associate Professor, King Abdullah University of Science and Technology
- Prof. Pierre Alquier, Professor of Statistics, ESSEC Business School Asia-Pacific, Singapore
- Dr. Kun Yuan, Assistant Professor, Peking University
- Dr. Han Zhao, Assistant Professor, University of Illinois at Urbana-Champaign
- Dr. Diyi Yang, Assistant Professor, Stanford University
- Prof. Kenji Fukumizu, The Institute of Statistical Mathematics
- Dr. Amy Zhang, Assistant Professor, The University of Texas at Austin
- Prof. Taiji Suzuki, The University of Tokyo
- Dr. Marco Cuturi, Research Scientist, Apple ML Research, Paris
- Prof. Shai Ben-David, University of Waterloo, Canada
- Prof. Arthur Gretton, University College London
- Prof. Yu-Chiang Frank Wang, National Taiwan University
- Dr. Masaaki Imaizumi, Associate Professor, The University of Tokyo
- Dr. Shinji Ito, The University of Tokyo, NEC
- Mr. Ryoma Sato, Kyoto University
- Dr. Yao-Hung Tsai, OIST
- Prof. Kenji Doya, OIST
Joint Workshop IBISML, NC, BIO, MPS
- Date: June 29-30 and July 1, 2023
- Venue: OIST Campus, Conference Center
IPSJ SIG Mobile computing and smart society system (SIG-MBL)
- Date: May 18-19, 2023
- Venue: OIST Campus B250
7. Other
Nothing to report.