What Problem Is Traditional RAG Solving?

RAG solves a specific technical problem: it enables models trained on broad public text to answer questions about specialized content without retraining. But solving a technical problem doesn't automatically make RAG the right business solution.

A useful way to reason about search-system design is to model its net utility:

\begin{eqnarray*}
\text{Net Utility} & = & \text{Value from correct answers} \\
& & - \text{Cost of errors} \\
& & - \text{Cost of computation} \\
& & - \text{Cost of latency} \\
& & - \text{Cost of human effort}
\end{eqnarray*}

The terms interact. Increased computation can reduce error while raising latency. Latency reduces value through abandonment, user frustration, etc.
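To make the tradeoff concrete, here is a toy back-of-the-envelope comparison of two hypothetical configurations. Every number below is an illustrative assumption, not a measurement.

```python
# Toy net-utility comparison of two hypothetical search configurations.
# All numbers are illustrative assumptions, not measurements.

def net_utility(correct_rate, value_per_correct, error_cost,
                compute_cost, latency_cost, human_effort_cost, queries):
    """Net utility per the decomposition above, summed over a query volume."""
    value = correct_rate * value_per_correct * queries
    errors = (1 - correct_rate) * error_cost * queries
    return value - errors - (compute_cost + latency_cost + human_effort_cost) * queries

# Hypothetical per-query numbers in arbitrary currency units.
keyword_search = net_utility(correct_rate=0.70, value_per_correct=1.00,
                             error_cost=0.20, compute_cost=0.001,
                             latency_cost=0.01, human_effort_cost=0.05,
                             queries=100_000)
rag_pipeline = net_utility(correct_rate=0.85, value_per_correct=1.00,
                           error_cost=0.20, compute_cost=0.02,
                           latency_cost=0.05, human_effort_cost=0.01,
                           queries=100_000)

print(f"keyword search: {keyword_search:,.0f}")
print(f"RAG pipeline:   {rag_pipeline:,.0f}")
```

Swapping in your own estimates for accuracy, costs, and volume is usually enough to see which terms dominate in your setting.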

Different domains sit in different places on this tradeoff surface. High value and high error cost settings, such as medical or legal work, can justify slower and more expensive pipelines. High volume and error tolerant settings, such as intent routing or content recommendation, require speed and low cost.

Classical information retrieval optimized this by selecting evidence. Formally: given a corpus $C$, a query $q$, and a budget $B$ that includes time, money, and context length, select an evidence set $E \subset C$ that is sufficient for a correct decision while keeping cost and error low. Modern systems also optimize coverage without redundancy so that each additional passage adds new information inside a fixed window.
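A minimal sketch of that selection problem, assuming placeholder `relevance` and `similarity` functions (for example, a reranker score and embedding cosine similarity) and a crude word-count token proxy for the budget:

```python
# Minimal sketch: select an evidence set E from candidates under a token budget B,
# greedily adding the passage with the highest marginal (non-redundant) relevance.
# `relevance(query, passage)` and `similarity(p1, p2)` are placeholders you supply.

def select_evidence(query, candidates, budget_tokens, relevance, similarity,
                    redundancy_weight=0.5):
    selected, used = [], 0
    remaining = list(candidates)
    while remaining:
        def marginal(p):
            # Relevance minus overlap with what is already selected.
            overlap = max((similarity(p, s) for s in selected), default=0.0)
            return relevance(query, p) - redundancy_weight * overlap
        best = max(remaining, key=marginal)
        cost = len(best.split())  # crude token proxy
        if used + cost > budget_tokens or marginal(best) <= 0:
            break
        selected.append(best)
        used += cost
        remaining.remove(best)
    return selected
```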

RAG systems add a generation layer that transforms retrieved evidence. Instead of just returning $E$, the system computes $T(E)$: a synthesized, contextualized response generated from the evidence. This fundamentally changes the utility calculation. Generation can increase both computational cost and latency while potentially increasing correctness and reducing user effort.
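Schematically, the pipeline becomes retrieve-then-transform. A minimal sketch, where `retrieve` and `llm_complete` stand in for whatever retriever and LLM client you use:

```python
# Schematic T(E): retrieve evidence, then generate a grounded answer from it.
# `retrieve` and `llm_complete` are stand-ins for your retriever and LLM client.

def answer(query, retrieve, llm_complete, k=5):
    evidence = retrieve(query, k=k)  # E: top-k passages
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(evidence))
    prompt = (
        "Answer the question using only the numbered passages below. "
        "Cite passage numbers for each claim.\n\n"
        f"Passages:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm_complete(prompt)  # T(E): synthesized, cited response
```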

Deciding whether to use RAG involves two levels of analysis. First, feasibility: can any retrieval-plus-generation pipeline solve your problem at all? Second, optimization: which specific methods maximize net utility once feasibility is established? Traditional RAG—semantic retrieval via dense embeddings + LLM generation—is one point in a design space spanning from simple (classifiers, keyword search) to sophisticated (LLM-based retrieval).

Scope Conditions: When Retrieval via Dense Embeddings Works

Semantic retrieval succeeds when these conditions hold:

  1. Embedding quality. The model maps related ideas near one another and handles paraphrase and lexical variation.
  2. Chunk coherence. Chunks carry self‑contained meaning yet remain focused so embeddings are not diluted by unrelated topics.
  3. Query specificity. The query is informative enough to point at the right region of the corpus.
  4. Low dispersion. The answer resides in a small number of places rather than being scattered across many documents.
  5. Minimal dependence on non-semantic features. Semantic similarity is a good proxy for relevance. Where non-semantic constraints such as permissions, versioning, or jurisdiction do matter, they can be handled with metadata filters rather than embeddings (see the sketch after this list).
  6. Prose‑based knowledge. The information lives mainly in paragraphs rather than structured tables or code.
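A minimal sketch of condition 5, assuming each document carries `permitted_users`, `version`, `latest_version`, and a precomputed `embedding`, and that `embed` is your embedding function. Non-semantic constraints are applied as hard filters; embeddings only rank what survives.

```python
# Minimal sketch: combine metadata filters (permissions, versioning) with
# semantic ranking. Field names and `embed` are assumptions about your schema.

import numpy as np

def filtered_semantic_search(query, docs, embed, user, top_k=5):
    # Hard, non-semantic constraints first.
    allowed = [d for d in docs
               if user in d["permitted_users"]
               and d["version"] == d["latest_version"]]
    if not allowed:
        return []
    # Semantic ranking (cosine similarity) over the filtered candidates only.
    q = embed(query)
    vecs = np.stack([d["embedding"] for d in allowed])
    scores = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    order = np.argsort(-scores)[:top_k]
    return [allowed[i] for i in order]
```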

Scope Conditions: When Generation Works

Retrieved chunks are comprehensible to the LLM. Chunks must provide enough context for the LLM to understand their meaning. Text fragments comprehensible to humans familiar with the domain may be incomprehensible to LLMs without broader context. Technical jargon, domain-specific abbreviations, or references to unstated context can make chunks unusable even when retrieval finds the right content.
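One common remedy is to attach document-level context to each chunk before generation. A minimal sketch, assuming a chunk schema with `doc_title`, `section`, and `text` fields:

```python
# Minimal sketch: prepend document-level context so a chunk is interpretable
# on its own. Field names ("doc_title", "section", "text") are assumptions
# about your chunk schema.

def contextualize(chunk):
    return (
        f"Document: {chunk['doc_title']}\n"
        f"Section: {chunk['section']}\n"
        f"Excerpt: {chunk['text']}"
    )

# An excerpt like "set it to 30s before the cutover" is opaque in isolation;
# with the document title and section attached, the model can resolve the referents.
```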

Chunks provide reasoning-enabling context. Beyond basic comprehension, chunks must contain information that enables the reasoning task. If the task requires understanding relationships between entities and those relationships are only implicit, or if conditional logic spans multiple sources and chunks capture individual rules but not their interactions, the LLM cannot perform its function even when it understands individual chunks.

The transformation task is within LLM capabilities. The reasoning or synthesis required must be something LLMs can actually do. Reformatting, conditional logic assembly, and synthesis across sources are within current capabilities. Complex multi-step reasoning chains, precise numerical calculations, or tasks requiring external knowledge LLMs don't have may not be.

Cost-Benefit Analysis: Which Methods Win?

Even when RAG satisfies all scope conditions and can deliver value, it may impose unacceptable costs. The computation and latency terms in our utility equation often favor simpler alternatives.

Do You Need Retrieval?

Before comparing retrieval methods, establish whether retrieval is necessary at all. Many problems are better solved without a retrieval architecture. For a class of problems, a classifier is the right solution: if embeddings are expressive enough that nearest-neighbor search finds the right content, those same embeddings are expressive enough to support a thin classifier trained on frozen vectors. A softmax head over frozen embeddings is cheap to train, easy to evaluate, and fast to run, and it brings lower computational cost, lower latency, simpler infrastructure, and easier evaluation.
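A minimal sketch of that alternative, assuming an `embed` function that returns a fixed-size vector per text and using scikit-learn's logistic regression as the softmax head:

```python
# Minimal sketch: an intent classifier as a softmax head over frozen embeddings.
# `embed` is an assumed function returning a fixed-size vector per text.

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_intent_classifier(texts, labels, embed):
    X = np.stack([embed(t) for t in texts])  # frozen embeddings, no fine-tuning
    clf = LogisticRegression(max_iter=1000)  # softmax over intent classes
    clf.fit(X, labels)
    return clf

def route(query, clf, embed):
    return clf.predict(np.stack([embed(query)]))[0]
```

Adding a new intent is then a matter of collecting a few hundred labeled examples and refitting the head, not retraining the embedding model.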

When new intents emerge, adding a class and collecting examples may be comparable to or easier than engineering new retrieval patterns and tuning generation prompts. The "no training data" advantage of RAG often overstates the difficulty of obtaining hundreds of examples per intent.

The case for retrieval over classification rests on the long tail argument: retrieval theoretically handles unlimited query variations without predefined categories. But this advantage only materializes when query traffic is genuinely diverse and unpredictable. In practice, traffic patterns in most production systems are heavily concentrated.

Retrieval Method Selection

Match the retrieval method to the problem: sparse (keyword) retrieval when exact terms and identifiers matter, hybrid retrieval when both lexical and semantic matching contribute, and LLM-based retrieval when judging relevance itself requires reasoning.
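As one concrete point in that space, here is a minimal sketch of hybrid retrieval via reciprocal rank fusion, assuming `sparse_rank` and `dense_rank` each return document ids ordered by relevance:

```python
# Minimal sketch: hybrid retrieval via reciprocal rank fusion (RRF) of a sparse
# (keyword) ranker and a dense (embedding) ranker. `sparse_rank` and `dense_rank`
# are assumed functions returning doc ids ordered by relevance.

def hybrid_retrieve(query, sparse_rank, dense_rank, k=10, rrf_k=60):
    fused = {}
    for ranking in (sparse_rank(query), dense_rank(query)):
        for rank, doc_id in enumerate(ranking):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (rrf_k + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)[:k]
```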

Generation Method Selection

LLM-based generation adds value beyond simple retrieval when:

  1. Semantic integration. Multiple passages state the same fact in different language. Generation produces one coherent statement with citations.
  2. Conditional assembly. Rules and exceptions are distributed across documents. Generation assembles them into explicit logic.
  3. Contradiction surfacing. Sources disagree. Retrieval can list them. Generation adds value by normalizing terminology, aligning time frames, and presenting the disagreement in a common frame with citations (a prompt sketch follows this list). The goal is to make the conflict visible and comparable, not to decide truth.
  4. Justified conclusions. Some workflows require an explanation tied to the user’s context. For example, “Version 2.1.1 falls within the affected range in CVE‑2024‑XXXX,” with links to the supporting evidence.
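For case 3, a minimal sketch of a prompt that surfaces disagreement without adjudicating it; the `sources` structure (id, date, text) is an assumption about your data:

```python
# Minimal sketch: a prompt that asks the model to present disagreement between
# sources in a common frame with citations, without deciding which is correct.

def contradiction_prompt(question, sources):
    body = "\n\n".join(f"[{s['id']}] ({s['date']}) {s['text']}" for s in sources)
    return (
        "The sources below may disagree. Normalize terminology, align time frames, "
        "and present each position side by side with citations to source ids. "
        "Do not decide which position is correct.\n\n"
        f"Sources:\n{body}\n\nQuestion: {question}"
    )
```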
