LLM Reasoning in 2025: Violent Delights Have Violent Ends


  • Since DeepSeek R1, the field of mathematical reasoning for large language models has been overfitting to its benchmarks
  • Every month seems to bring a new “breakthrough”
  • To boost mathematical reasoning performance by tens of points:
    • Even a model one-tenth the size can achieve it
    • A few hundred high-quality examples suffice
    • Supervised fine-tuning (SFT) can do it
    • Distillation can also do it
    • A single data point can do it
    • No data at all can do it
    • Self-generated data can do it
    • Randomly assigned rewards, or even deliberately incorrect rewards, can do it
  • This teaches us not to test on the training set
  • It also raises a question: if we are, in effect, testing on the training set, which tricks most effectively extract the already-ingested samples and produce the correct answers?
  • Beyond the RLVR veneer, this points to an interesting direction: how large models write in and retrieve knowledge
  • After half a year of carnival, which advances will stand the test of time, and which will meet a violent end?
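The contamination concern above can be made concrete. Below is a minimal sketch of one common heuristic for checking whether an eval item was, in effect, part of the training set: word-level n-gram overlap between the eval text and the training corpus. The function names, the 8-gram window, and the toy strings are illustrative assumptions, not any specific benchmark's decontamination pipeline.

```python
def ngrams(text, n=8):
    # Word-level n-grams; an 8-word window is a common heuristic choice.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_score(eval_text, train_texts, n=8):
    # Fraction of the eval item's n-grams that appear verbatim
    # somewhere in the training corpus (1.0 = fully contained).
    eval_grams = ngrams(eval_text, n)
    if not eval_grams:
        return 0.0
    train_grams = set()
    for t in train_texts:
        train_grams |= ngrams(t, n)
    return len(eval_grams & train_grams) / len(eval_grams)

# Toy illustration: the probe is a verbatim substring of the training text,
# so every one of its 8-grams is found in the corpus.
train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
probe = "the quick brown fox jumps over the lazy dog near the river"
print(contamination_score(probe, train))
```

A score near 1.0 on a benchmark item is a strong hint that reported gains measure memorization and extraction rather than reasoning, which is exactly the failure mode the bullets above describe.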