By Ayoub Tabout

What are Recursive Language Models (RLMs)?

Large language models excel at processing text, but their performance often degrades as input length increases. This issue, known as context rot, causes models to lose track of details in long prompts, even within their specified context windows.

A recent paper from MIT CSAIL researchers Alex L. Zhang, Tim Kraska, and Omar Khattab introduces Recursive Language Models (RLMs) as a solution. Published as a preprint in late 2025 and updated in January 2026, RLMs enable models to handle inputs far beyond standard context limits, often with better accuracy and comparable or lower cost.

This article explains RLMs for both technical and non-technical audiences, highlights community reactions from recent discussions, and covers benchmark results.

The Challenge with Long Context

Most large language models have fixed context windows, typically ranging from thousands to millions of tokens. Beyond a certain length, models struggle with dense or multi-hop tasks: early information gets diluted, and recall suffers.

Non-technical analogy: Imagine reading a very long book and answering detailed questions about it. You might remember the beginning and end clearly, but details from the middle fade.

Technical perspective: Standard approaches like retrieval-augmented generation (RAG) or summarization help, but they can lose fidelity or require pre-processing. Direct long-context models still degrade on complex queries.

How Recursive Language Models Work

RLMs change the paradigm by treating the input prompt as an external object in a programmable environment, usually a Python REPL.

Core process:

1. The full input is loaded as a variable in the REPL, not directly into the model's context.
2. The model generates code to inspect, search, or slice the input (e.g., using regex or chunking).
3. It recursively calls itself on subsections, storing results symbolically in variables.
4. The root model aggregates outcomes without ever loading the entire input at once.
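The steps above can be sketched in a few lines of Python. This is an illustrative simplification, not the paper's implementation: `llm` is a hypothetical stand-in for a model call, and in a real RLM the model itself writes and runs this kind of code inside the REPL rather than following a fixed chunking loop.

```python
def llm(prompt: str) -> str:
    """Placeholder for a language-model call (an assumption, not a real API)."""
    return f"summary({len(prompt)} chars)"

def rlm_answer(query: str, context: str, chunk_size: int = 10_000) -> str:
    # The full context lives in a Python variable; it is never sent to
    # the model in one piece.
    # Slice it into manageable pieces (a real RLM lets the model decide
    # how to inspect or slice, e.g. via regex search).
    chunks = [context[i:i + chunk_size]
              for i in range(0, len(context), chunk_size)]

    # Recursive sub-calls: each chunk gets its own model call, and the
    # result is stored symbolically in a Python list, not in the
    # root model's context window.
    partials = [llm(f"{query}\n\nExcerpt:\n{chunk}") for chunk in chunks]

    # The root call aggregates the symbolic results, never the raw text.
    return llm(f"{query}\n\nPartial findings:\n" + "\n".join(partials))
```

Because only the short `partials` strings reach the final call, the total prompt size at any single step stays small even when `context` is millions of tokens long.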

Non-technical analogy: Instead of reading an entire library, you program a system to search for keywords, pull relevant sections, and have assistants summarize them before combining insights.

Technical details: RLMs enable symbolic recursion, where sub-calls return values to variables rather than bloating the context. This supports near-unbounded inputs (demonstrated up to 10M+ tokens) and avoids autoregressive output limits.

As Alex Zhang describes: "We propose Recursive Language Models, or RLMs, a general inference strategy where language models can decompose and recursively interact with their input context as a variable."

Community Discussions and Early Adoption

Since the paper’s release, discussions have focused on RLMs’ distinctions from existing methods.

Developer advocate Leonie provided a clear breakdown:

“The text exists only as a variable in a Python environment (REPL). The model never sees the text unless it explicitly writes code to print a snippet of it.”
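A toy REPL snippet makes this concrete. The variable names and the regex are illustrative assumptions; the point is that the model operates on `context` through code, and only the short printed window ever enters a prompt.

```python
import re

# The full input exists only as a Python variable in the REPL.
context = ("filler text " * 500) + "The launch code is 4417. " + ("more filler " * 500)

# Instead of reading everything, the model writes code to search the variable:
hits = [m.start() for m in re.finditer(r"launch code", context)]

# Only a small printed window around a hit ever enters a model prompt:
snippet = context[hits[0]: hits[0] + 40]
print(snippet)  # a 40-character window containing the answer
```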

Debates compare RLMs to coding agents.

Co-author Omar Khattab clarified key differences: “The user prompt P itself (not just external data) is a symbolic object in the environment… recursion must happen during code execution… All sub-calls and tool calls return values into symbolic variables.”

Alex Zhang released an open-source implementation, installable with pip install rlms.

Prime Intellect described RLMs as a potential paradigm for 2026, emphasizing their role in long-horizon agent tasks.

Early users report mixed results, with some noting efficiency on legal analysis but risks of excessive recursion.

Benchmark Performance

Results show RLMs outperforming base models and alternatives:

| Task | Context Length | Base GPT-5 | RLM (GPT-5) | Notes |
| --- | --- | --- | --- | --- |
| CodeQA | 23K–4.2M tokens | 24.0% | 62.0% | Multiple-choice code understanding |
| BrowseComp+ | 6M–11M tokens | 0.0% | 91.3% | Information aggregation |
| OOLONG | 131K tokens | 44.0% | 56.5% | Semantic tasks |
| OOLONG-Pairs | 32K tokens | 0.1% | 58.0% | Quadratic-complexity pairing |

RLMs often achieve these gains at similar or lower cost, with smaller models (e.g., GPT-5-mini) outperforming larger base versions.

A fine-tuned 8B model, RLM-Qwen3-8B, improved by 28.3% on average over its base model.

Implications of RLMs

RLMs represent an inference-time scaling approach, aligning with the trend toward spending more compute at test time for better results. They handle dense, long inputs effectively, with applications in code analysis, document reasoning, and aggregation tasks.

Challenges remain, including high variance in sub-call usage and potential inefficiencies in weaker models. Optimizations such as asynchronous sub-calls or reinforcement-learning fine-tuning have been proposed.

Code is available at https://github.com/alexzhang13/rlm.
