By Ayoub Tabout

What are Recursive Language Models (RLMs)?

Large language models excel at processing text, but their performance often degrades as input length increases. This issue, known as context rot, causes models to lose track of details in long prompts, even within their specified context windows.

A recent paper from MIT CSAIL researchers Alex L. Zhang, Tim Kraska, and Omar Khattab introduces Recursive Language Models (RLMs) as a solution. Published as a preprint in late 2025 and updated in January 2026, RLMs enable models to handle inputs far beyond standard context limits, often with better accuracy and comparable or lower cost.

This article explains RLMs for both technical and non-technical audiences, highlights community reactions from recent discussions, and covers benchmark results.

The Challenge with Long Context

Most large language models have fixed context windows, typically ranging from thousands to millions of tokens. Beyond a certain length, models struggle with dense or multi-hop tasks: early information gets diluted, and recall suffers.

Non-technical analogy: Imagine reading a very long book and answering detailed questions about it. You might remember the beginning and end clearly, but details from the middle fade.

Technical perspective: Standard approaches like retrieval-augmented generation (RAG) or summarization help, but they can lose fidelity or require pre-processing. Direct long-context models still degrade on complex queries.

How Recursive Language Models Work

RLMs change the paradigm by treating the input prompt as an external object in a programmable environment, usually a Python REPL.

Core process:

1. The full input is loaded as a variable in the REPL, not directly into the model's context.
2. The model generates code to inspect, search, or slice the input (e.g., using regex or chunking).
3. It recursively calls itself on subsections, storing results symbolically in variables.
4. The root model aggregates outcomes without ever loading the entire input at once.
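The steps above can be sketched in a few lines of Python. This is an illustrative simplification, not the paper's implementation: `llm` is a hypothetical stand-in for a model call, and in a real RLM the model itself writes and runs this kind of code inside the REPL rather than following a fixed chunking loop.

```python
def llm(prompt: str) -> str:
    """Placeholder for a language-model call (an assumption, not a real API)."""
    return f"summary({len(prompt)} chars)"

def rlm_answer(query: str, context: str, chunk_size: int = 10_000) -> str:
    # The full context lives in a Python variable; it is never sent to
    # the model in one piece.
    # Slice it into manageable pieces (a real RLM lets the model decide
    # how to inspect or slice, e.g. via regex search).
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]

    # Recursive sub-calls: each chunk gets its own model call, and the
    # result is stored symbolically in a Python list, not in the
    # root model's context window.
    partials = [llm(f"{query}\n\nExcerpt:\n{chunk}") for chunk in chunks]

    # The root call aggregates the symbolic results, never the raw text.
    return llm(f"{query}\n\nPartial findings:\n" + "\n".join(partials))
```

Because only the short `partials` strings reach the final call, the total prompt size at any single step stays small even when `context` is millions of tokens long.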

Non-technical analogy: Instead of reading an entire library, you program a system to search for keywords, pull relevant sections, and have assistants summarize them before combining insights.

Technical details: RLMs enable symbolic recursion, where sub-calls return values to variables rather than bloating the context. This supports near-unbounded inputs (demonstrated up to 10M+ tokens) and avoids autoregressive output limits.

As Alex Zhang describes: "We propose Recursive Language Models, or RLMs, a general inference strategy where language models can decompose and recursively interact with their input context as a variable."

Community Discussions and Early Adoption

Since the paper’s release, discussions have focused on RLMs’ distinctions from existing methods.

Developer advocate Leonie provided a clear breakdown:

“The text exists only as a variable in a Python environment (REPL). The model never sees the text unless it explicitly writes code to print a snippet of it.”
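A toy REPL snippet makes this concrete. The variable names and the regex are illustrative assumptions; the point is that the model operates on `context` through code, and only the short printed window ever enters a prompt.

```python
import re

# The full input exists only as a Python variable in the REPL.
context = ("filler text " * 500) + "The launch code is 4417. " + ("more filler " * 500)

# Instead of reading everything, the model writes code to search the variable:
hits = [m.start() for m in re.finditer(r"launch code", context)]

# Only a small printed window around a hit ever enters a model prompt:
snippet = context[hits[0]: hits[0] + 40]
print(snippet)  # a 40-character window containing the answer
```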

Debates compare RLMs to coding agents.

Co-author Omar Khattab clarified key differences: “The user prompt P itself (not just external data) is a symbolic object in the environment… recursion must happen during code execution… All sub-calls and tool calls return values into symbolic variables.”

Alex Zhang released an open-source implementation, installable with pip install rlms.

Prime Intellect described RLMs as a potential paradigm for 2026, emphasizing their role in long-horizon agent tasks.

Early users report mixed results, with some noting efficiency on legal analysis but risks of excessive recursion.

Benchmark Performance

Results show RLMs outperforming base models and alternatives:

| Task | Context Length | Base GPT-5 | RLM (GPT-5) | Notes |
| --- | --- | --- | --- | --- |
| CodeQA | 23K–4.2M tokens | 24.0% | 62.0% | Multiple-choice code understanding |
| BrowseComp+ | 6M–11M tokens | 0.0% | 91.3% | Information aggregation |
| OOLONG | 131K tokens | 44.0% | 56.5% | Semantic tasks |
| OOLONG-Pairs | 32K tokens | 0.1% | 58.0% | Quadratic-complexity pairing |

RLMs often achieve these gains at similar or lower cost, with smaller models (e.g., GPT-5-mini) outperforming larger base versions.

A fine-tuned 8B model, RLM-Qwen3-8B, improved by 28.3% on average over its base model.

Implications of RLMs

RLMs represent an inference-time scaling approach, aligning with the trend toward spending more compute at test time for better results. They handle dense, long inputs effectively, with applications in code analysis, document reasoning, and aggregation tasks.

Challenges remain, including high variance in sub-call usage and potential inefficiencies in weaker models. Optimizations such as asynchronous sub-calls or reinforcement-learning fine-tuning have been proposed.

Code is available at https://github.com/alexzhang13/rlm.
