[Seminar] MLDS Unit Seminar 2025-1 by Mr. Niklas Muennighoff, Stanford University

Date

Tuesday, May 13, 2025 - 10:00 to 11:00

Location

Online (Zoom only)

Description

Zoom Link: here

Speaker: Mr. Niklas Muennighoff (Stanford University)

Title: s1: Simple test-time scaling

Abstract: In this talk, we will discuss the s1 paper. Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 questions paired with reasoning traces relying on three criteria we validate through ablations: difficulty, diversity, and quality. Second, we develop budget forcing to control test-time compute by forcefully terminating the model's thinking process or lengthening it by appending "Wait" multiple times to the model's generation when it tries to end. This can lead the model to double-check its answer, often fixing incorrect reasoning steps. After supervised finetuning the Qwen2.5-32B-Instruct language model on s1K and equipping it with budget forcing, our model s1-32B exceeds o1-preview on competition math questions by up to 27% (MATH and AIME24). Further, scaling s1-32B with budget forcing allows extrapolating beyond its performance without test-time intervention: from 50% to 57% on AIME24.
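Illustration (not part of the abstract): the budget forcing idea described above can be sketched as a small decoding loop. This is a minimal sketch under stated assumptions, not the speaker's implementation. The `next_token` callable, the `END_THINK` delimiter string, the `toy_model` stand-in, and all parameter names are hypothetical and chosen only for illustration. A real setup would call an actual language model and use its own end-of-thinking delimiter.

```python
from typing import Callable, List

# Hypothetical end-of-thinking delimiter; a real model defines its own.
END_THINK = "<end_think>"


def budget_forced_thinking(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_tokens: int,
    num_waits: int = 0,
    wait_token: str = "Wait",
) -> List[str]:
    """Return a thinking trace shaped by two budget-forcing controls.

    - Upper bound: once `max_tokens` thinking tokens exist, thinking is
      terminated by appending the end-of-thinking delimiter.
    - Lengthening: the first `num_waits` times the model tries to stop,
      the delimiter is suppressed and `wait_token` is appended instead.
    """
    trace: List[str] = []
    waits_left = num_waits
    while len(trace) < max_tokens:
        token = next_token(prompt + trace)
        if token == END_THINK:
            if waits_left == 0:
                break  # let the model stop thinking
            waits_left -= 1
            token = wait_token  # nudge the model to keep reasoning
        trace.append(token)
    # Whether the model stopped on its own or hit the budget,
    # the trace is closed with the delimiter.
    return trace + [END_THINK]


def toy_model(context: List[str]) -> str:
    """Stand-in for a language model: emits three "step" tokens, then tries to stop."""
    trailing_steps = 0
    for token in reversed(context):
        if token != "step":
            break
        trailing_steps += 1
    return END_THINK if trailing_steps >= 3 else "step"


if __name__ == "__main__":
    print(budget_forced_thinking(toy_model, ["Q"], max_tokens=100))
    print(budget_forced_thinking(toy_model, ["Q"], max_tokens=100, num_waits=2))
    print(budget_forced_thinking(toy_model, ["Q"], max_tokens=2))
```

In this sketch, raising `num_waits` lengthens the trace and lowering `max_tokens` shortens it. These two settings correspond to the lengthening and forced-termination controls named in the abstract.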

Bio: Niklas Muennighoff is a PhD student at Stanford. His research focuses on improving large language models via works like s1, OLMoE, and MTEB. He has received a best paper award at ACL, a best paper runner-up award at NeurIPS, and JLPT N1 certification. He did his Bachelor's at Peking University.

All-OIST Category:

Research
