Techniques Overview

Forge V5 integrates 14 retrieval-augmented generation techniques into a single pipeline. Each one solves a specific failure mode that simpler RAG systems hit.

The Full Technique Table

| # | Technique | What It Does | Quality Impact | Latency Impact | Stage |
|---|-----------|--------------|----------------|----------------|-------|
| 1 | Agentic RAG | LLM-driven retrieval loop with tool selection | Very High | +3-8s | Query |
| 2 | CRAG Quality Gate | Cross-encoder filters irrelevant retrievals | High | +200ms | Query |
| 3 | Multi-Hop Reasoning | Decomposes complex queries into sub-queries | High | +2-5s | Query |
| 4 | Contextual Retrieval | LLM-generated context prefix per chunk | Very High | +0ms (query) | Ingestion |
| 5 | Proposition Indexing | Atomic claims extracted and indexed | High | +0ms (query) | Ingestion |
| 6 | Hierarchical 4-Level | L0-L3 document hierarchy with parent expansion | High | +0ms (query) | Ingestion |
| 7 | BGE-M3 Tri-Modal | Dense + sparse + ColBERT vectors from one model | Very High | ~50ms | Both |
| 8 | ColBERT Reranking | Token-level MaxSim reranking | High | +100ms | Query |
| 9 | Knowledge Graph | Entity/relationship extraction and traversal | Medium-High | +100ms | Both |
| 10 | Self-Verification | Claim-by-claim audit against sources | High | +500ms | Query |
| 11 | Confidence Scoring | Weighted retrieval confidence signals | Medium | +0ms | Query |
| 12 | Query Decomposition | Complex questions split into atomic sub-queries | High | +200ms | Query |
| 13 | HyDE | Hypothetical document embeddings for better recall | Medium | +300ms | Query |
| 14 | Parent Expansion | Return parent section when child chunk matches | Medium | +0ms | Query |
Ingestion vs. Query Cost

Techniques 4, 5, 6, and 9 add cost at ingestion time (when documents are uploaded), not at query time. This is a deliberate design choice: invest compute once during ingestion so every subsequent query is faster and more accurate.

Why 14 Techniques?

Each technique addresses a specific failure mode:

| Failure Mode | What Goes Wrong | Technique That Fixes It |
|--------------|-----------------|-------------------------|
| Semantic gap | Query and relevant chunk use different words | HyDE, BGE-M3 sparse vectors |
| Lost in the middle | Relevant info buried in a long chunk | Proposition Indexing |
| No context | Chunk is ambiguous without surrounding text | Contextual Retrieval, Parent Expansion |
| Wrong granularity | Query needs a section but search returns a sentence | Hierarchical 4-Level |
| Irrelevant retrieval | Top-k results don't actually answer the question | CRAG Quality Gate, ColBERT Reranking |
| Single-hop limit | Answer requires combining info from multiple places | Multi-Hop Reasoning, Agentic RAG |
| Hallucination | LLM generates claims not in the sources | Self-Verification |
| Keyword miss | Dense vectors miss exact terms and names | BGE-M3 sparse vectors |
| Relationship queries | "Who authorizes X?" needs graph structure | Knowledge Graph |
| Static pipeline | One-size-fits-all retrieval strategy | Agentic RAG (adaptive tool selection) |

No single technique fixes all of these. That’s why Forge combines them.
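This mapping is easy to operationalize when deciding which techniques to enable for a given corpus. A minimal sketch in Python (the dict mirrors the failure-mode table; `techniques_for` is illustrative, not part of Forge's API):

```python
# Map each retrieval failure mode to the techniques that mitigate it.
# Mirrors the table above; this helper is illustrative, not Forge's API.
FIXES = {
    "semantic_gap": ["HyDE", "BGE-M3 sparse vectors"],
    "lost_in_the_middle": ["Proposition Indexing"],
    "no_context": ["Contextual Retrieval", "Parent Expansion"],
    "wrong_granularity": ["Hierarchical 4-Level"],
    "irrelevant_retrieval": ["CRAG Quality Gate", "ColBERT Reranking"],
    "single_hop_limit": ["Multi-Hop Reasoning", "Agentic RAG"],
    "hallucination": ["Self-Verification"],
    "keyword_miss": ["BGE-M3 sparse vectors"],
    "relationship_queries": ["Knowledge Graph"],
    "static_pipeline": ["Agentic RAG"],
}

def techniques_for(*observed_failures: str) -> set[str]:
    """Union of techniques that address the observed failure modes."""
    return {t for f in observed_failures for t in FIXES[f]}
```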

How They Compose: The Pipeline

Ingestion Pipeline (Document Upload)

Document Upload
       │
       ▼
  ┌──────────┐
  │  Parse   │  PDF/DOCX/TXT → raw text
  └────┬─────┘
       │
       ▼
  ┌──────────────────┐
  │  Hierarchical    │  Split into L0 (doc) → L1 (section) → L2 (chunk)
  │  Chunking        │
  └────┬─────────────┘
       │
       ├──────────────────────────────┐
       ▼                              ▼
  ┌──────────────────┐    ┌───────────────────────┐
  │  Contextual      │    │  Proposition          │
  │  Enrichment      │    │  Extraction (L3)      │
  │  (per chunk)     │    │  (Dense-X)            │
  └────┬─────────────┘    └────┬──────────────────┘
       │                       │
       ├───────────────────────┘
       │
       ├──────────────────────────────┐
       ▼                              ▼
  ┌──────────────────┐    ┌───────────────────────┐
  │  Knowledge Graph │    │  BGE-M3 Embedding     │
  │  Extraction      │    │  (dense + sparse +    │
  │  (entities +     │    │   ColBERT vectors)    │
  │   relationships) │    └────┬──────────────────┘
  └────┬─────────────┘         │
       │                       │
       ▼                       ▼
  ┌──────────┐         ┌──────────┐
  │  Redis   │         │  Qdrant  │
  │  (graph  │         │  (all    │
  │  adj.)   │         │  vectors)│
  └──────────┘         └──────────┘
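Stripped of the model calls, the ingestion flow above is a linear pass with two fan-outs. A hedged sketch in plain Python, with toy stand-ins for the LLM- and model-backed stages (all function names are illustrative, not Forge's actual modules):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    level: int                     # 0=doc, 1=section, 2=chunk, 3=proposition
    context: str = ""              # contextual-enrichment prefix
    vectors: dict = field(default_factory=dict)

# Toy stand-ins for the real stages, which are LLM/model-backed in Forge:
def parse(raw: bytes) -> str:                        # PDF/DOCX/TXT -> raw text
    return raw.decode()

def hierarchical_chunk(text: str) -> list[Chunk]:    # L0 -> L1 -> L2
    sections = text.split("\n\n")
    return ([Chunk(text, 0)]
            + [Chunk(s, 1) for s in sections]
            + [Chunk(p, 2) for s in sections for p in s.split(". ") if p])

def contextualize(chunk: Chunk, doc: str) -> str:    # LLM context prefix
    return f"[excerpt of a {len(doc)}-char document] "

def extract_propositions(chunks: list[Chunk]) -> list[Chunk]:  # Dense-X L3
    return [Chunk(c.text, 3) for c in chunks if c.level == 2]

def embed_bge_m3(text: str) -> dict:                 # tri-modal vector stub
    return {"dense": [0.0], "sparse": {}, "colbert": [[0.0]]}

def ingest(raw: bytes) -> list[Chunk]:
    doc = parse(raw)
    chunks = hierarchical_chunk(doc)
    for c in chunks:
        c.context = contextualize(c, doc)            # enrich every chunk
    chunks += extract_propositions(chunks)           # fan-out: L3 claims
    for c in chunks:
        c.vectors = embed_bge_m3(c.context + c.text) # fan-out: embeddings
    return chunks   # Forge would now write graph -> Redis, vectors -> Qdrant
```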

Query Pipeline (Agentic Mode)

User Query
       │
       ▼
  ┌─────────────────┐
  │ Query Analysis  │  Classify complexity, detect multi-hop needs
  └────┬────────────┘
       │
       ▼
  ┌─────────────────────────────────────────┐
  │            LangGraph Agent Loop         │
  │                                         │
  │   Iteration 1:                          │
  │   ┌──────────────┐                      │
  │   │ Select Tool  │ → semantic_search    │
  │   └──────┬───────┘                      │
  │          ▼                              │
  │   ┌──────────────┐                      │
  │   │ Execute Tool │ → 8 chunks found     │
  │   └──────┬───────┘                      │
  │          ▼                              │
  │   ┌──────────────┐                      │
  │   │ CRAG Gate    │ → 5 correct,         │
  │   │              │   2 ambiguous,       │
  │   │              │   1 incorrect        │
  │   └──────┬───────┘                      │
  │          ▼                              │
  │   ┌──────────────┐                      │
  │   │ ColBERT      │ → Rerank to top 5    │
  │   │ Rerank       │                      │
  │   └──────┬───────┘                      │
  │          ▼                              │
  │   ┌──────────────┐                      │
  │   │ Reflect      │ → "Need more info    │
  │   │              │    on methodology"   │
  │   └──────┬───────┘                      │
  │          │                              │
  │   Iteration 2:                          │
  │   ┌──────────────┐                      │
  │   │ Select Tool  │ → proposition_search │
  │   └──────┬───────┘                      │
  │          ▼                              │
  │         ...                             │
  │                                         │
  │   Final: "Evidence sufficient"          │
  └────┬────────────────────────────────────┘
       │
       ▼
  ┌─────────────────┐
  │ Generate Answer │  LLM synthesizes from all gathered evidence
  └────┬────────────┘
       │
       ▼
  ┌─────────────────┐
  │ Self-Verify     │  Claim-by-claim audit against source chunks
  └────┬────────────┘
       │
       ▼
  ┌─────────────────┐
  │ Stream to UI    │  SSE events: tokens, sources, confidence
  └─────────────────┘

Direct Mode (Simplified)

Direct mode skips the agent loop for faster responses:

Query → BGE-M3 Embed → Qdrant Search → CRAG Gate → ColBERT Rerank → Generate → Stream

No iteration, no tool selection, no reflection. Useful for simple factual queries where a single retrieval pass is sufficient.
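That arrow chain is a straight function composition. A sketch, assuming injected `search` and `generate` callables and an illustrative 0.35 relevance threshold (not Forge's actual value):

```python
def direct_answer(query: str, search, generate, top_k: int = 5) -> str:
    """Single pass: search -> threshold gate -> rerank -> generate."""
    hits = search(query, top_k * 2)                      # Qdrant over BGE-M3
    kept = [h for h in hits if h["score"] >= 0.35]       # CRAG-style gate
    kept.sort(key=lambda h: h["rerank"], reverse=True)   # ColBERT-style order
    return generate(query, [h["text"] for h in kept[:top_k]])
```

In practice `search` would embed the query with BGE-M3 and query Qdrant, and `generate` would call the LLM with the retained chunks as context.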

Technique Dependencies

Some techniques depend on others:

Contextual Retrieval  →  requires LLM (during ingestion)
Proposition Indexing  →  requires LLM (during ingestion)
Knowledge Graph       →  requires LLM (during ingestion)
ColBERT Reranking     →  requires BGE-M3 ColBERT vectors
Parent Expansion      →  requires Hierarchical Indexing
Agentic RAG           →  orchestrates all other query-time techniques
CRAG Quality Gate     →  requires cross-encoder model (separate from LLM)
Self-Verification     →  requires LLM (during query)
HyDE                  →  requires LLM + BGE-M3 (during query)

Disabling a technique in config.yml automatically disables anything that depends on it. The system degrades gracefully — you can run Forge with just BGE-M3 + direct mode and still get good results, then enable techniques incrementally.
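The cascade can be modeled as a transitive walk over a reverse-dependency map. A sketch (the dict mirrors part of the list above; this is not Forge's actual config loader):

```python
# requirement -> techniques that depend on it (mirrors part of the list above)
DEPENDENTS = {
    "bge_m3": ["colbert_reranking", "hyde"],
    "hierarchical_indexing": ["parent_expansion"],
    "colbert_reranking": [],
    "parent_expansion": [],
    "hyde": [],
}

def disable(technique: str, enabled: set[str]) -> set[str]:
    """Disable a technique and, transitively, everything that needs it."""
    for dep in DEPENDENTS.get(technique, []):
        if dep in enabled:
            enabled = disable(dep, enabled)
    return enabled - {technique}
```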

Performance Impact

Benchmarked on a 500-page technical manual on an RTX 4080 (16GB):

| Configuration | Query Latency (p50) | Answer Quality (human eval) |
|---------------|---------------------|-----------------------------|
| BGE-M3 only (no techniques) | 1.2s | 6.2/10 |
| + CRAG + ColBERT | 1.8s | 7.5/10 |
| + Contextual + Propositions | 1.9s | 8.3/10 |
| + Hierarchical + Graph | 2.1s | 8.7/10 |
| Full agentic (all 14) | 7.4s | 9.4/10 |

The agentic mode is slower but dramatically more accurate, especially for complex questions that require evidence from multiple document sections.

Next Steps

Dive into individual techniques: