LLM Reasoning in 2025: Violent Delights Have Violent Ends
- Since DeepSeek R1, the field of mathematical reasoning for large language models has been busily overfitting to its own benchmarks
- Sure enough, every month brings new “breakthroughs”
- To boost mathematical reasoning performance by tens of points:
- Even a model one-tenth the size can achieve it
- A few hundred high-quality examples suffice
- Supervised fine-tuning (SFT) can do it
- Distillation can also do it
- A single data point can do it
- No data at all can do it
- Self-generated data can do it
- Randomly assigned rewards, or even deliberately incorrect rewards, can do it
- This teaches us not to test on the training set
- It also raises a question: if we do test on the training set, which tricks most effectively extract the already-ingested samples and produce the correct answers?
- Beyond the RLVR veneer, this is also an interesting direction for studying how large models write in and retrieve knowledge
- After the first half-year carnival, which advances will stand the test of time, and which will meet a violent end?
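The contamination worry above can be made concrete. A minimal sketch of a word-level n-gram overlap check, the kind of decontamination test that can flag benchmark items appearing verbatim in training data (all function names here are hypothetical, and real pipelines tokenize and normalize far more carefully):

```python
def ngrams(text: str, n: int = 13):
    """Return the set of word-level n-grams from whitespace-tokenized text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(test_item: str, train_docs, n: int = 13) -> bool:
    """Flag a benchmark item if any of its n-grams appears verbatim
    in any training document."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return False
    return any(test_grams & ngrams(doc, n) for doc in train_docs)
```

A check this crude misses paraphrases and translations, which is part of why "tested on the training set" results can slip through even audited benchmarks.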