
What are Recursive Language Models (RLMs)?
Large language models excel at processing text, but their performance often degrades as input length increases. This issue, known as context rot, causes models to lose track of details in long prompts, even within their specified context windows.
A recent paper from MIT CSAIL researchers Alex L. Zhang, Tim Kraska, and Omar Khattab introduces Recursive Language Models (RLMs) as a solution. Published as a preprint in late 2025 and updated in January 2026, RLMs enable models to handle inputs far beyond standard context limits, often with better accuracy and comparable or lower cost.
This article explains RLMs for both technical and non-technical audiences, highlights community reactions from recent discussions, and covers benchmark results.
The Challenge with Long Context
Most large language models have fixed context windows, typically ranging from thousands to millions of tokens. Beyond a certain length, models struggle with dense or multi-hop tasks: early information gets diluted, and recall suffers.
Non-technical analogy: Imagine reading a very long book and answering detailed questions about it. You might remember the beginning and end clearly, but details from the middle fade.
Technical perspective: Standard approaches like retrieval-augmented generation (RAG) or summarization help, but they can lose fidelity or require pre-processing. Direct long-context models still degrade on complex queries.
How Recursive Language Models Work
RLMs change the paradigm by treating the input prompt as an external object in a programmable environment, usually a Python REPL.

Core process:

1. The full input is loaded as a variable in the REPL, not directly into the model’s context.
2. The model generates code to inspect, search, or slice the input (e.g., using regex or chunking).
3. It recursively calls itself on subsections, storing results symbolically in variables.
4. The root model aggregates outcomes without ever loading the entire input at once.
Non-technical analogy: Instead of reading an entire library, you program a system to search for keywords, pull relevant sections, and have assistants summarize them before combining insights.
Technical details: RLMs enable symbolic recursion, where sub-calls return values to variables rather than bloating the context. This supports near-unbounded inputs (demonstrated up to 10M+ tokens) and avoids autoregressive output limits.
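Symbolic recursion of this kind can be sketched in a few lines of plain Python. Everything below (the `llm` stub, the chunk size, the prompt wording) is illustrative, not the paper’s actual implementation:

```python
def llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., an API request)."""
    return f"summary({len(prompt)} chars)"

def rlm(query: str, context: str, chunk_size: int = 1000) -> str:
    # The full context lives in a Python variable, never in the
    # model's own window.
    if len(context) <= chunk_size:
        # Base case: small enough to hand to the model directly.
        return llm(f"{query}\n\n{context}")
    # Recursive case: slice the context and recurse on each piece,
    # storing the results symbolically in a list variable.
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]
    partials = [rlm(query, c, chunk_size) for c in chunks]
    # The root call aggregates sub-results without ever loading
    # the whole context at once.
    return llm(f"{query}\n\nPartial answers:\n" + "\n".join(partials))

print(rlm("What is discussed?", "x" * 2500))
```

Note that this sketch hard-codes one fixed strategy (uniform chunking), whereas in an actual RLM the model writes the inspection and recursion code itself, choosing how to slice and search at inference time.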
As Alex Zhang describes: “We propose Recursive Language Models, or RLMs, a general inference strategy where language models can decompose and recursively interact with their input context as a variable.”
Community Discussions and Early Adoption
Since the paper’s release, discussions have focused on RLMs’ distinctions from existing methods.
Developer advocate Leonie provided a clear breakdown:
“The text exists only as a variable in a Python environment (REPL). The model never sees the text unless it explicitly writes code to print a snippet of it.”
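Concretely, the environment might look like the following sketch, where only what the model’s emitted code prints ever enters its context (the variable name `P` follows the paper’s notation, but the toy document and inspection code are hypothetical):

```python
import re

# The harness loads the full prompt into a variable; the model never
# sees it directly. Here a toy string stands in for a huge document.
P = "Chapter 1. " + "lorem ipsum " * 10_000

# Code the model might emit to inspect P without printing all of it:
print(len(P))                           # how large is the context?
print(P[:40])                           # peek at the opening characters
print(len(re.findall(r"Chapter", P)))   # search instead of reading
```

The rest of `P` stays in the environment as data, which is exactly the property the quote above describes.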
Debates compare RLMs to coding agents.
In a February 7, 2026 post, co-author Omar Khattab clarified what is not standard in a coding agent: “The user prompt P itself (not just external data) is a symbolic object in the environment… recursion must happen during code execution… All sub-calls and tool calls return values into symbolic variables.” He added that the model is not allowed to grep or read long snippets from P directly; it has to write recursive code that calls LMs to understand it.
Alex Zhang released an open-source implementation, installable via `pip install rlms`.
In a February 8, 2026 thread, Zhang offered some intuition: “I think the RLM idea is super simple but elegant (I’m biased obviously). The paper argues that future ‘language models’ 1) do not need to think about context…”
Prime Intellect described RLMs as a potential paradigm for 2026, emphasizing their role in long-horizon agent tasks.
Early users report mixed results, with some noting efficiency on legal analysis but risks of excessive recursion.
Benchmark Performance
Results show RLMs outperforming base models and alternatives:
| Task | Context Length | Base GPT-5 | RLM (GPT-5) | Notes |
|---|---|---|---|---|
| CodeQA | 23K–4.2M tokens | 24.0% | 62.0% | Multi-choice code understanding |
| BrowseComp+ | 6M–11M tokens | 0.0% | 91.3% | Information aggregation |
| OOLONG | 131K tokens | 44.0% | 56.5% | Semantic tasks |
| OOLONG-Pairs | 32K tokens | 0.1% | 58.0% | Quadratic complexity pairing |
RLMs often achieve these gains at similar or lower cost, with smaller models (e.g., GPT-5-mini) outperforming larger base versions.
A fine-tuned 8B model, RLM-Qwen3-8B, improved 28.3% on average over its base.
Implications of RLMs
RLMs represent an inference-time scaling approach, aligning with the trend toward spending more compute at test time for better results. They handle dense, long inputs effectively, with applications in code analysis, document reasoning, and aggregation tasks.

Challenges remain, including high variance in sub-call usage and potential inefficiencies in weaker models. Proposed optimizations include asynchronous sub-calls and reinforcement learning fine-tuning.
Code is available at https://github.com/alexzhang13/rlm.
Sources:
- arXiv:2512.24601
- Alex Zhang’s blog post
- [Prime Intellect analysis](https://www.primeintellect.ai/blog/rlm)
- Community discussions on X
