The RAG Plateau
Retrieval-Augmented Generation has been the dominant paradigm for extending LLM capabilities beyond their training data. The formula is straightforward: embed your documents, retrieve relevant chunks at query time, inject them into the prompt, generate a response.
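In code, the entire pipeline is a few lines. This is a sketch only: vectorStore and llm are stand-ins for whatever embedding store and model client you use, not part of any specific library.

// Canonical RAG: retrieve top-k chunks, inject them into the prompt, generate
const chunks = await vectorStore.similaritySearch(query, 5);
const prompt = "Context:\n" + chunks.join("\n\n") + "\n\nQuestion: " + query;
const answer = await llm.generate(prompt);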
It works. Until it doesn't.
RAG hits fundamental limitations when tasks require:
- Multi-hop reasoning across disparate document sections
- Iterative exploration where each answer informs the next question
- Synthesis of information scattered across hundreds of pages
- Contextual understanding that emerges only from seeing the whole picture
These aren't edge cases. They're the interesting problems.
Enter Recursive Language Models
The shift from RAG to RLM isn't an incremental improvement. It's a different paradigm entirely.
Where RAG retrieves and injects, RLM explores and reasons. The model doesn't receive pre-selected chunks. It writes code to navigate the information space, decides what to examine, and recursively breaks complex queries into sub-problems.
Think of the difference between being handed a stack of highlighted paragraphs versus being given access to a library with a skilled research assistant who can follow leads.
Why Recursion Changes Everything
Adaptive Depth
RAG systems have fixed retrieval: top-k chunks, done. RLM adapts its exploration depth to the problem. Simple queries get quick answers. Complex queries spawn recursive sub-calls that each handle a piece of the puzzle.
// RLM automatically adjusts exploration depth
const result = await rlm.completion(
  "Compare the pricing models across all competitor analyses in this document set",
  competitorReports // 50 documents, 2000+ pages
);
// Model recursively processes each report, extracts pricing, synthesises comparison

Reasoning Chains
RAG concatenates retrieved text. RLM builds reasoning chains. Each step can examine different parts of the context, and the model maintains state across the chain.
// RLM can reason across steps
"First, identify all API endpoints.
Then, for each endpoint, find its authentication requirements.
Finally, flag any that allow unauthenticated access."

This isn't prompt engineering. The model actually executes these steps, examining different parts of the codebase at each stage.
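Under the hood, the executed chain might look something like this generated code. A sketch only: it reuses the sandbox's search helper (shown in the self-correction example below), and the endpoint.text result field is illustrative, not a documented shape.

// Step 1: find endpoint definitions
const endpoints = search(context, "API endpoint OR route definition");
const flagged = [];
for (const endpoint of endpoints) {
  // Step 2: look up authentication requirements for this endpoint
  const auth = search(endpoint.text, "authentication OR authorisation middleware");
  // Step 3: flag endpoints where no authentication was found
  if (auth.length === 0) flagged.push(endpoint);
}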
Self-Correction
When a RAG system retrieves the wrong chunks, you get a confidently wrong answer. RLM can recognise when initial exploration didn't find what it needed and try different approaches.
The model writes code like:
let results = search(context, "authentication");
if (results.length === 0) {
  results = search(context, "auth OR login OR session");
}

This adaptability emerges naturally from the recursive architecture.
Head-to-Head: RAG vs RLM
| Aspect | RAG | RLM |
|---|---|---|
| Context handling | Fixed retrieval window | Unlimited via exploration |
| Reasoning depth | Single-pass | Recursive multi-step |
| Adaptability | Static pipeline | Dynamic based on task |
| Complex queries | Degrades with complexity | Scales with recursion |
| Failure mode | Wrong chunks = wrong answer | Can self-correct |
| Latency | Fast (one retrieval) | Variable (task-dependent) |
| Cost | Predictable | Proportional to complexity |
When to Use Each
RAG remains excellent for:
- Simple factual queries
- High-volume, low-complexity requests
- Latency-critical applications
- Well-structured knowledge bases
RLM shines when:
- Questions require reasoning across multiple sources
- The "right" information isn't known until you start exploring
- Tasks involve analysis, comparison, or synthesis
- Documents are complex or poorly structured
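The two can coexist behind a simple router. The sketch below is purely illustrative: the regex heuristic, llm.generate, and vectorStore are assumptions for the example, not part of the RLM API.

// Hypothetical router: cheap, latency-critical queries take the RAG path;
// exploratory or synthesis-heavy queries take the RLM path
async function answer(query, documents) {
  const exploratory = /compare|across|trace|synthesise|why/i.test(query);
  if (!exploratory) {
    const chunks = await vectorStore.similaritySearch(query, 5);
    return llm.generate(query, chunks);
  }
  return rlm.completion(query, documents);
}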
The Technical Shift
Implementing this shift requires rethinking your architecture:
From Embeddings to Exploration
RAG lives and dies by embedding quality. RLM shifts the burden to the model's ability to write effective exploration code.
// RAG: hope your embeddings capture semantic similarity
const chunks = await vectorStore.similaritySearch(query, 5);
// RLM: model decides how to explore
const result = await rlm.completion(query, fullDocument);
// Model might search, filter, aggregate, or recursively decompose

From Retrieval to Reasoning
The RLM sandbox provides tools the model can use:
- search(context, query) - semantic search within context
- filter(context, predicate) - extract matching sections
- chunk(context, size) - split for recursive processing
- rlm.subQuery(question, context) - spawn recursive calls
The model combines these as needed, building reasoning strategies on the fly.
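One strategy it might build, for example, is a recursive map-reduce over the context. A sketch, assuming chunk sizes are measured in characters and that subQuery resolves to a string:

// Map: answer the question independently over each chunk via recursive sub-calls
const parts = chunk(context, 50000);
const partials = await Promise.all(parts.map((part) => rlm.subQuery(question, part)));
// Reduce: synthesise the partial answers in one final recursive call
const answer = await rlm.subQuery(
  "Synthesise these partial answers into one response:\n" + partials.join("\n"),
  context
);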
From Pipeline to Agent
RAG is a pipeline: embed → retrieve → generate.
RLM is closer to an agent: observe → plan → act → repeat.
This agent-like behaviour means RLM can handle queries that would require multiple RAG calls with manual orchestration.
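Stripped to a skeleton, the loop might look like this. A sketch only: the DONE stopping convention and the plan-as-search-query step are assumed for illustration.

// observe → plan → act, repeated until the model decides it has seen enough
const findings = [];
while (true) {
  // Plan: a recursive sub-call proposes the next thing to examine
  const plan = await rlm.subQuery(
    "Given these findings, what should be searched next? Reply DONE when finished.\n" + findings.join("\n"),
    context
  );
  if (plan.trim() === "DONE") break;
  // Act: run the planned search, record the observations, loop back
  findings.push(...search(context, plan));
}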
Real Results
We've seen RLM outperform RAG significantly on complex tasks:
Codebase Analysis: "Find all places where user input reaches a database query without sanitisation"
- RAG: Retrieved files mentioning "database" and "input". Missed indirect paths.
- RLM: Traced data flow across files, identified 3 direct and 2 indirect vulnerabilities.
Legal Document Review: "Compare indemnification clauses across these 12 contracts"
- RAG: Retrieved sections containing "indemnification". Couldn't synthesise comparison.
- RLM: Extracted each clause, normalised terminology, produced structured comparison.
Research Synthesis: "What methodological criticisms have been raised against studies using this technique?"
- RAG: Found papers mentioning the technique. Couldn't identify which were critical.
- RLM: Classified papers by stance, extracted specific methodological concerns, cited sources.
The Path Forward
RAG isn't going away. It's fast, predictable, and sufficient for many use cases. But the ceiling is visible.
Recursive approaches like RLM represent the next step: models that don't just retrieve information but actively reason over it. As models become more capable at code generation and self-reflection, this gap will widen.
The question isn't whether to adopt recursive architectures, but when.
Try It Yourself
RLM is open source and ready to use. If you've hit the limits of RAG, it's worth exploring what recursion can do.
git clone https://github.com/hampton-io/RLM.git
cd RLM && npm install && npm run build

The context window barrier is dissolving. The question is what you'll build when it's gone.