LLM Reasoning in 2025: Violent Delights Have Violent Ends
- Since DeepSeek R1, the field of mathematical reasoning for large language models has been busily overfitting to its own benchmarks
- Sure enough, every month brings new “breakthroughs”
- To boost mathematical reasoning performance by tens of points:
- Even a model one-tenth the size can achieve it
- A few hundred high-quality examples suffice
- Supervised fine-tuning (SFT) can do it
- Distillation can also do it
- A single data point can do it
- No data at all can do it
- Self-generated data can do it
- Randomly assigned rewards, or even deliberately incorrect rewards, can do it
- This teaches us not to test on the training set
- It also raises a question: if we do test on the training set, which tricks most effectively extract the already-ingested samples and produce the correct answers?
- Beyond the RLVR veneer, this is also an interesting direction for studying how large models write in and retrieve knowledge
- After the first half-year carnival, which advances will stand the test of time, and which will meet a violent end?
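The contamination worry above can be made concrete. A minimal sketch of a word-level n-gram overlap check, the kind of decontamination test that can flag benchmark items appearing verbatim in training data (all function names here are hypothetical, and real pipelines tokenize and normalize far more carefully):

```python
def ngrams(text: str, n: int = 13):
    """Return the set of word-level n-grams from whitespace-tokenized text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(test_item: str, train_docs, n: int = 13) -> bool:
    """Flag a benchmark item if any of its n-grams appears verbatim
    in any training document."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return False
    return any(test_grams & ngrams(doc, n) for doc in train_docs)
```

A check this crude misses paraphrases and translations, which is part of why "tested on the training set" results can slip through even audited benchmarks.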