Techniques Overview
Forge V5 integrates 14 retrieval-augmented generation techniques into a single pipeline. Each one solves a specific failure mode that simpler RAG systems hit.
The Full Technique Table
| # | Technique | What It Does | Quality Impact | Latency Impact | Stage |
|---|---|---|---|---|---|
| 1 | Agentic RAG | LLM-driven retrieval loop with tool selection | Very High | +3-8s | Query |
| 2 | CRAG Quality Gate | Cross-encoder filters irrelevant retrievals | High | +200ms | Query |
| 3 | Multi-Hop Reasoning | Decomposes complex queries into sub-queries | High | +2-5s | Query |
| 4 | Contextual Retrieval | LLM-generated context prefix per chunk | Very High | +0ms (query) | Ingestion |
| 5 | Proposition Indexing | Atomic claims extracted and indexed | High | +0ms (query) | Ingestion |
| 6 | Hierarchical 4-Level | L0-L3 document hierarchy with parent expansion | High | +0ms (query) | Ingestion |
| 7 | BGE-M3 Tri-Modal | Dense + sparse + ColBERT vectors from one model | Very High | ~50ms | Both |
| 8 | ColBERT Reranking | Token-level MaxSim reranking | High | +100ms | Query |
| 9 | Knowledge Graph | Entity/relationship extraction and traversal | Medium-High | +100ms | Both |
| 10 | Self-Verification | Claim-by-claim audit against sources | High | +500ms | Query |
| 11 | Confidence Scoring | Weighted retrieval confidence signals | Medium | +0ms | Query |
| 12 | Query Decomposition | Complex questions split into atomic sub-queries | High | +200ms | Query |
| 13 | HyDE | Hypothetical document embeddings for better recall | Medium | +300ms | Query |
| 14 | Parent Expansion | Return parent section when child chunk matches | Medium | +0ms | Query |
Techniques 4, 5, 6, and 9 add cost at ingestion time (when documents are uploaded), not at query time. This is a deliberate design choice: invest compute once during ingestion so every subsequent query is faster and more accurate.
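To make one of the query-time techniques concrete: ColBERT reranking (technique 8) scores a candidate with token-level MaxSim — for each query token, take its best similarity against any document token, then sum. A minimal pure-Python sketch with toy 2-D vectors (real BGE-M3 ColBERT vectors are high-dimensional, and production code would use batched matrix ops):

```python
import math

def maxsim(query_vecs, doc_vecs):
    """ColBERT late interaction: for each query token vector, take the best
    cosine similarity against any document token vector, then sum."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return sum(max(cos(q, d) for d in doc_vecs) for q in query_vecs)

# Toy example: doc_a contains an exact match for the first query token,
# doc_b aligns poorly with both, so doc_a reranks higher.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.7, 0.7]]
doc_b = [[-1.0, 0.0], [0.5, -0.5]]
assert maxsim(query, doc_a) > maxsim(query, doc_b)
```

Because every query token gets to pick its own best-matching document token, MaxSim rewards documents that cover all parts of the query, not just its overall gist — which is why it catches cases a single dense vector misses.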
Why 14 Techniques?
Each technique addresses a specific failure mode:
| Failure Mode | What Goes Wrong | Technique That Fixes It |
|---|---|---|
| Semantic gap | Query and relevant chunk use different words | HyDE, BGE-M3 sparse vectors |
| Lost in the middle | Relevant info buried in a long chunk | Proposition Indexing |
| No context | Chunk is ambiguous without surrounding text | Contextual Retrieval, Parent Expansion |
| Wrong granularity | Query needs a section but search returns a sentence | Hierarchical 4-Level |
| Irrelevant retrieval | Top-k results don’t actually answer the question | CRAG Quality Gate, ColBERT Reranking |
| Single-hop limit | Answer requires combining info from multiple places | Multi-Hop Reasoning, Agentic RAG |
| Hallucination | LLM generates claims not in the sources | Self-Verification |
| Keyword miss | Dense vectors miss exact terms and names | BGE-M3 sparse vectors |
| Relationship queries | “Who authorizes X?” needs graph structure | Knowledge Graph |
| Static pipeline | One-size-fits-all retrieval strategy | Agentic RAG (adaptive tool selection) |
No single technique fixes all of these. That’s why Forge combines them.
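The CRAG Quality Gate from the table is, at its core, a three-way threshold on a cross-encoder relevance score. A minimal sketch — the threshold values and the three bucket names are illustrative assumptions, and the real system scores chunks with a cross-encoder model rather than taking precomputed scores:

```python
def crag_gate(scored_chunks, t_correct=0.7, t_incorrect=0.3):
    """Bucket retrievals by relevance score: keep 'correct' chunks,
    hold 'ambiguous' ones for reranking, drop 'incorrect' ones."""
    buckets = {"correct": [], "ambiguous": [], "incorrect": []}
    for chunk, score in scored_chunks:
        if score >= t_correct:
            buckets["correct"].append(chunk)
        elif score <= t_incorrect:
            buckets["incorrect"].append(chunk)
        else:
            buckets["ambiguous"].append(chunk)
    return buckets

# c1/c2 pass, c3 is held for reranking, c4 is dropped before generation.
b = crag_gate([("c1", 0.91), ("c2", 0.85), ("c3", 0.55), ("c4", 0.12)])
```

The three-bucket split (rather than a single cutoff) is what lets the pipeline treat borderline chunks differently from clear failures: ambiguous chunks get a second chance via reranking instead of being discarded outright.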
How They Compose: The Pipeline
Ingestion Pipeline (Document Upload)
Document Upload
│
▼
┌──────────┐
│ Parse │ PDF/DOCX/TXT → raw text
└────┬─────┘
│
▼
┌──────────────────┐
│ Hierarchical │ Split into L0 (doc) → L1 (section) → L2 (chunk)
│ Chunking │
└────┬─────────────┘
│
├──────────────────────────────┐
▼ ▼
┌──────────────────┐ ┌───────────────────────┐
│ Contextual │ │ Proposition │
│ Enrichment │ │ Extraction (L3) │
│ (per chunk) │ │ (Dense-X) │
└────┬─────────────┘ └────┬──────────────────┘
│ │
├───────────────────────┘
│
├──────────────────────────────┐
▼ ▼
┌──────────────────┐ ┌───────────────────────┐
│ Knowledge Graph │ │ BGE-M3 Embedding │
│ Extraction │ │ (dense + sparse + │
│ (entities + │ │ ColBERT vectors) │
│ relationships) │ └────┬──────────────────┘
└────┬─────────────┘ │
│ │
▼ ▼
┌──────────┐ ┌──────────┐
│ Redis │ │ Qdrant │
│ (graph │ │ (all │
│ adj.) │ │ vectors)│
└──────────┘  └──────────┘

Query Pipeline (Agentic Mode)
User Query
│
▼
┌─────────────────┐
│ Query Analysis │ Classify complexity, detect multi-hop needs
└────┬────────────┘
│
▼
┌─────────────────────────────────────────┐
│ LangGraph Agent Loop │
│ │
│ Iteration 1: │
│ ┌──────────────┐ │
│ │ Select Tool │ → semantic_search │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Execute Tool │ → 8 chunks found │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ CRAG Gate │ → 5 correct, │
│ │ │ 2 ambiguous, │
│ │ │ 1 incorrect │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ ColBERT │ → Rerank to top 5 │
│ │ Rerank │ │
│ └──────┬───────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Reflect │ → "Need more info │
│ │ │ on methodology" │
│ └──────┬───────┘ │
│ │ │
│ Iteration 2: │
│ ┌──────────────┐ │
│ │ Select Tool │ → proposition_search│
│ └──────┬───────┘ │
│ ▼ │
│ ... │
│ │
│ Final: "Evidence sufficient" │
└────┬────────────────────────────────────┘
│
▼
┌─────────────────┐
│ Generate Answer │ LLM synthesizes from all gathered evidence
└────┬────────────┘
│
▼
┌─────────────────┐
│ Self-Verify │ Claim-by-claim audit against source chunks
└────┬────────────┘
│
▼
┌─────────────────┐
│ Stream to UI │ SSE events: tokens, sources, confidence
└─────────────────┘Direct Mode (Simplified)
Direct mode skips the agent loop for faster responses:
Query → BGE-M3 Embed → Qdrant Search → CRAG Gate → ColBERT Rerank → Generate → Stream

No iteration, no tool selection, no reflection. Useful for simple factual queries where a single retrieval pass is sufficient.
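Direct mode is essentially a straight function composition of the query-time stages. A sketch with stubbed stages to show the data flow — the stage signatures and the top-5 cutoff are illustrative assumptions; real implementations call BGE-M3, Qdrant, the cross-encoder, and the LLM:

```python
def direct_mode(query, embed, search, gate, rerank, generate):
    """Single-pass pipeline: each stage feeds the next, no agent loop."""
    vectors = embed(query)
    candidates = search(vectors)
    kept = gate(query, candidates)
    top = rerank(query, kept)[:5]
    return generate(query, top)

# Stubbed stages — each lambda stands in for a real component.
answer = direct_mode(
    "What is the warranty period?",
    embed=lambda q: [0.1, 0.2],                      # BGE-M3
    search=lambda v: ["chunk-a", "chunk-b", "chunk-c"],  # Qdrant
    gate=lambda q, cs: cs[:2],                       # CRAG drops low-relevance
    rerank=lambda q, cs: list(reversed(cs)),         # ColBERT MaxSim order
    generate=lambda q, cs: f"answer from {len(cs)} chunks",  # LLM
)
```

Keeping the stages as plain functions is what makes the agentic mode possible: the agent loop reuses the same stages as tools, just invoked iteratively instead of once.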
Technique Dependencies
Some techniques depend on others:
Contextual Retrieval → requires LLM (during ingestion)
Proposition Indexing → requires LLM (during ingestion)
Knowledge Graph → requires LLM (during ingestion)
ColBERT Reranking → requires BGE-M3 ColBERT vectors
Parent Expansion → requires Hierarchical Indexing
Agentic RAG → orchestrates all other query-time techniques
CRAG Quality Gate → requires cross-encoder model (separate from LLM)
Self-Verification → requires LLM (during query)
HyDE → requires LLM + BGE-M3 (during query)

Disabling a technique in config.yml automatically disables anything that depends on it. The system degrades gracefully — you can run Forge with just BGE-M3 + direct mode and still get good results, then enable techniques incrementally.
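The cascade — disabling a technique disables everything that transitively requires it — is a fixed-point computation over the dependency edges. A sketch with an abbreviated, illustrative edge set (the keys are shorthand for the techniques listed above, not Forge's actual config identifiers):

```python
# Each technique maps to the set of techniques it requires.
DEPENDS_ON = {
    "colbert_rerank": {"bge_m3"},
    "parent_expansion": {"hierarchical"},
    "agentic_rag": {"colbert_rerank", "crag_gate"},
}

def effective_disabled(disabled):
    """Expand a disabled set with everything that transitively depends on it."""
    disabled = set(disabled)
    changed = True
    while changed:
        changed = False
        for tech, deps in DEPENDS_ON.items():
            if tech not in disabled and deps & disabled:
                disabled.add(tech)
                changed = True
    return disabled

# Turning off BGE-M3 also knocks out ColBERT reranking and agentic mode,
# while parent expansion (which only needs hierarchical indexing) survives.
```

Running the loop to a fixed point handles chains of any depth, so a single config toggle can never leave a technique enabled whose prerequisites are gone.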
Performance Impact
Benchmarked against a 500-page technical manual on an RTX 4080 (16GB):
| Configuration | Query Latency (p50) | Answer Quality (human eval) |
|---|---|---|
| BGE-M3 only (no techniques) | 1.2s | 6.2/10 |
| + CRAG + ColBERT | 1.8s | 7.5/10 |
| + Contextual + Propositions | 1.9s | 8.3/10 |
| + Hierarchical + Graph | 2.1s | 8.7/10 |
| Full agentic (all 14) | 7.4s | 9.4/10 |
The agentic mode is slower but dramatically more accurate, especially for complex questions that require evidence from multiple document sections.
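One stage contributing to agentic mode's quality edge is Self-Verification: every generated claim is audited against the retrieved sources. A minimal sketch in which a trivial word-overlap heuristic stands in for the real LLM-based entailment check (the heuristic and its threshold are illustrative only):

```python
def self_verify(claims, sources, check):
    """Audit each generated claim; mark it supported only if some source passes."""
    return {c: any(check(c, s) for s in sources) for c in claims}

# Stand-in check: crude word overlap. The real system asks an LLM whether
# the source chunk actually entails the claim.
def overlap(claim, source):
    words = set(claim.lower().split())
    return len(words & set(source.lower().split())) >= len(words) // 2

report = self_verify(
    ["the warranty lasts two years", "refunds take ninety days"],
    ["The product warranty lasts two years from purchase."],
    overlap,
)
# The first claim is supported; the second is flagged as unsupported.
```

Flagged claims can then be dropped, rewritten, or surfaced to the user with a lowered confidence score, which is how the audit translates into the hallucination reduction measured above.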
Next Steps
Dive into individual techniques:
- Contextual Retrieval — Anthropic’s breakthrough for chunk contextualization
- Agentic RAG — The LangGraph agent that ties everything together
- CRAG Quality Gate — How irrelevant retrievals get caught before generation
- BGE-M3 Vectors — The tri-modal embedding model that powers search