(Welcome) to the Era of Wild
Connecting the dots, (welcome) to the Era of Wild.
Three quarters of 2025 have already passed, and many exciting new developments have occurred in AI this year. Looking back, I feel that many dots have once again been connected:
- DeepSeek R1
- Claude Code/Gemini CLI/Codex
- Era of Experience
- The Second Half
- RLVR
- Agentic RL
- HealthBench
- GDPval
- Sora 2 App
- OpenAI Dev Day
The line connecting them is that AI will no longer be optimized merely toward datasets and metrics such as accuracy, but directly toward the deep and complex goals of real-world human social activity. We put AI in the Wild, hoping it can participate directly in human social and productive activity, learn from feedback from the real world, and have that feedback reach the model directly, all the way down to every next-token prediction, rather than being diluted by layer upon layer of external mechanisms.
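In its simplest form, "feedback reaching every next-token prediction" is a policy-gradient update. The sketch below is my own illustration, not a recipe from this post: it assumes a Hugging Face-style causal LM and a hypothetical `real_world_feedback()` function that returns a single scalar outcome (a retained user, a sale, a clinical result), and uses plain REINFORCE so that this one non-differentiable number weights the log-probability of every sampled token.

```python
# Minimal sketch. Assumptions: a Hugging Face-style causal LM `model` with its
# `tokenizer`; `real_world_feedback` is hypothetical and returns a float.
import torch.nn.functional as F

def reinforce_step(model, tokenizer, prompt, optimizer, real_world_feedback):
    # Sample a response from the current policy.
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    gen = model.generate(**inputs, max_new_tokens=128, do_sample=True)

    # One non-differentiable scalar comes back from the world.
    reward = real_world_feedback(tokenizer.decode(gen[0, prompt_len:]))

    # Recompute log-probs of the sampled tokens so the reward reaches
    # every next-token prediction through the policy gradient.
    logits = model(gen).logits[:, :-1, :]              # position t predicts gen[:, t+1]
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, gen[:, 1:].unsqueeze(-1)).squeeze(-1)
    response_logp = token_logp[:, prompt_len - 1:]     # keep only the response tokens

    loss = -(reward * response_logp.sum())              # REINFORCE: maximize E[R * log pi]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

Real systems add baselines, KL penalties toward a reference model, and far more careful credit assignment, but the core point stands: a scalar from the world, not from a dataset, ends up shaping each token's probability.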
Evolution of Tasks
From the perspective of NLP, we have been moving tasks from the simulated toward the real:
- In the NLP era, models classified a sentiment label for each text fragment or annotated dependency relations within a sentence, tasks that humans (except linguists and NLP researchers) generally do not do and do not need to do
- In the LLM era, models handle all NLP tasks recast in the form of prompts, and humans do the same thing when they communicate through language
- In the Agentic era, models operate tools and produce artifacts, which is precisely what many human professions do
- In the Wild era, models and humans will jointly form a society, and tasks will no longer be divided into real and simulated
On the surface, Wild AI looks like building applications, but because the nature of the task has changed, it differs from traditional vertical-domain AI: it is not about collecting NLP/LLM-formatted data from a domain to continue training models, but about deploying AI into the real world and optimizing directly for the domain's ultimate goals.
From a technical perspective, this can be seen as applications powered by RL, but the exploration of RL for LLMs still has a long way to go; accurately feeding the feedback produced by those ultimate objectives back into the model may require changes in human-machine interaction patterns, changes in model frameworks, and even paradigms beyond RL.
In the most naive sense, this is still RLHF rather than a completely new path. It is precisely this path that helped OpenAI launch ChatGPT and let large models truly reach human users; the effects of real users' preference feedback later became widely known, making large models truly understood by the world and opening a new LLM era. Now everyone will race down this path at unprecedented speed.
Deep Goals
What are the deep and complex goals and feedback in human social activities? Possible (but not exhaustive) examples include:
- GDP
- employment rate
- average life expectancy
- crime rate
- annual profit
- cutoff score
- box office
- scientific discovery
- h-index
- global temperature
- …
Why Does Wild AI Optimize for These Goals?
- One of reinforcement learning's most practical advantages is that it can optimize non-differentiable objectives, and there are countless non-differentiable gaps between the model and real-world utility
- People have discovered the potential of reinforcement learning combined with strong priors, which may make it possible to optimize for these goals
- Scaling needs a new story: having moved from training to inference, the next stage of scaling requires new data and new dimensions, and the infinitely many environments in the real world, extremely long chains, and goals that currently seem fantastical all provide fresh soil for that story
- The First Half of chasing dataset SOTA has become ineffective
- Large companies have limited patience for long-term investment in AI
Recommendation systems are a field that entered the Wild Era long ago: they optimize for GMV, have profoundly transformed our lives, have triggered countless discussions about issues such as information cocoons, and together with the mobile internet have shaped today's landscape of tech giants, while the research directions of academia and industry have gradually diverged. LLMs/AI entering the Wild Era will go further than recommendation systems in every one of these respects (or, to put it another way, more severely).
Welcome to the Era of Wild?
I use the word Welcome cautiously. Goodhart's law tells us that once a metric becomes a target, it ceases to be a good metric. The indicators currently defined for measuring human society already have many problems; even without AI's participation, people's pursuit of these indicators has produced plenty of pathological behavior, and AI's involvement may further accelerate this kind of hacking. From another perspective, certain individuals or organizations might also pursue their own objectives through Feedback Hijacking.
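A toy example of why this worries me (my own illustration, not from the post): give an optimizer a fixed budget to split between genuine quality and gaming the metric, and let the measured indicator reward gaming more than quality. Optimizing the proxy then sacrifices the real goal entirely.

```python
# Toy Goodhart's-law illustration: the proxy metric looks best exactly when
# the true goal has been fully sacrificed. All numbers here are made up.
candidates = [i / 100 for i in range(101)]   # fraction of budget spent on gaming

def true_utility(gaming):                    # the real goal rewards only genuine quality
    return 1.0 - gaming

def proxy_metric(gaming):                    # the measured indicator is inflated by gaming
    return (1.0 - gaming) + 3.0 * gaming

best = max(candidates, key=proxy_metric)
print(f"proxy-optimal gaming fraction: {best:.2f}")            # 1.00
print(f"proxy score: {proxy_metric(best):.2f}, "               # 3.00
      f"true utility: {true_utility(best):.2f}")               # 0.00
```

A stronger optimizer only reaches this corner faster, which is exactly the acceleration concern above.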
On the other hand, people used to be divided over whether AI should be developed first or governed first. By now the develop-first technical path has become somewhat clearer, and the companies developing first will not stop: AI's transformation of the world will not pause for the debate. Under this premise, the technology for governing AI may actually need to evolve even faster.
The Future
In the new Wild Era, what technologies are still needed to develop AI?
- Before moving into real society, we first need truly reliable AI decision mechanisms, rather than continuing to rely on probabilities alone
- A new cycle of the four major components: model/algorithm/data/infrastructure. Today, people doing Agentic AI are still working inside well-tended datasets and environments. Once we reach real environments, current model architectures may not suit extremely long chains, algorithms cannot fully exploit the feedback, and the infrastructure cannot support large-scale, efficient training and inference. And once model/data/infrastructure are optimized to saturation, people will push further into the real environment to mine the next trove of data, bringing messier data situations, which will in turn drive the evolution of model/algorithm/infrastructure, forming a cycle
- Simulators: letting AI transform the real world directly is still somewhat radical, so realistically simulating feedback and the impact AI causes is indispensable. Simulators can serve as rehearsals before real deployment on the one hand and scale up training on the other, much as reward models do today (see the sketch after this list)
- New feedback curation standards: just as with pretraining data, where people need elaborate strategies to ensure quality and keep bad signals out of next-token prediction, feedback signals in the real world are even more complex and require even more systematic vetting
- Hard to Verify: our current progress sits almost entirely on the Easy to Verify side of the Generator-Verifier Asymmetry; moving into the real world flips us to the other side, and how to mine and exploit feedback in the Hard to Verify regime is another major topic
- Collective intelligence: once AI enters human society, countless AI-paired entities will emerge, mirroring the whole of human society as an AI society. Will such bottom-up intelligent collectives give rise to new intelligence and new social mechanisms? Will this in turn require new evaluation systems?
- When optimizing for deep goals in real production activities, should evaluation and feedback be safely separated to avoid falling into the Goodhart trap?
- Bidirectional optimization: maybe we will not only optimize AI for society, but also optimize society for AI
- Mechanism-driven alignment: today's alignment signals, such as human preferences, still rely on humans dictating the direction of alignment; flip the preferences and the same process teaches the model to behave badly. In the future, can we design environmental mechanisms such that, as models interact, gain experience in society, and learn from feedback, no single party can dominate the alignment signals, and the mechanisms themselves drive toward common goals?
- World Sensor: a feedback collector designed for human social activities, bridging the environment and the model
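To make the Simulators and World Sensor items above a little more concrete, here is a hypothetical sketch: neither `WorldSensor` nor `SimulatedWorldSensor` comes from any existing system, and every field name is a placeholder. The only point it tries to capture is that the model should see the same kind of scalar signal whether it comes from the real world or from a simulator that rehearses and scales up training, much as a reward model does today.

```python
# Hypothetical interface sketch; none of these names exist in any library.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Feedback:
    reward: float        # the scalar that eventually reaches the model
    source: str          # e.g. "user_retention", "clinic_outcome", "box_office"
    delay_days: float    # real-world feedback is often slow and noisy

class WorldSensor(Protocol):
    """A feedback collector bridging human social activity and the model."""
    def observe(self, action_trace: str) -> Feedback:
        ...

class SimulatedWorldSensor:
    """A simulator standing in for the real world: rehearsal plus scaled-up training."""
    def __init__(self, world_model):
        self.world_model = world_model   # assumed: a learned model of the environment

    def observe(self, action_trace: str) -> Feedback:
        predicted_outcome = self.world_model.predict(action_trace)
        return Feedback(reward=predicted_outcome, source="simulator", delay_days=0.0)

def training_signal(sensor: WorldSensor, action_trace: str) -> float:
    # Training code does not care which sensor it talks to; only the reward flows back.
    return sensor.observe(action_trace).reward
```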
This is an era in which confusion, bubbles, temptation, risk, and opportunity coexist.
(Welcome) to the Era of Wild.
Citation
If you found the topics in this blog post interesting and would like to cite it, you may use the following BibTeX entry:
@article{wild_era_202510,