Forge
V5.0 — Agentic RAG Engine

The most advanced RAG system you can run on a single GPU.

- 14 RAG techniques
- <10s agentic queries
- <2s cached responses
- 10/10 tests passing
Live Demo

Watch the agent think.

A real agentic query: tool selection, CRAG evaluation, ColBERT reranking, token streaming, and claim verification — all visible in real-time.

forge query --mode agentic --stream
"What is the military leave accrual rate for active duty members?"


14 Techniques

Every failure mode, handled.

Each technique solves a specific retrieval failure. Together they form the most comprehensive RAG pipeline assembled for a single-GPU system.

Retrieval
01

Agentic RAG

LangGraph ReAct loop — the LLM autonomously decides which retrieval tools to invoke, iterating until it has sufficient evidence to answer.
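The loop's shape can be sketched in a few lines of plain Python. This is a minimal stand-in, not Forge's actual LangGraph graph: the `llm_decide` and `llm_answer` callables and the tool names are placeholders for the real model calls.

```python
def agentic_answer(question, tools, llm_decide, llm_answer, max_steps=4):
    """Minimal ReAct-style loop: the model picks a retrieval tool, observes
    the results, and repeats until it decides the evidence is sufficient."""
    evidence = []
    for _ in range(max_steps):
        # llm_decide returns e.g. {"tool": "dense_search"} or {"done": True}
        action = llm_decide(question, evidence)
        if action.get("done"):
            break
        evidence.extend(tools[action["tool"]](question))
    return llm_answer(question, evidence)
```

The `max_steps` cap bounds latency: the agent iterates at most four times before it must answer with whatever evidence it has.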

Indexing
02

Contextual Retrieval

Anthropic's technique: an LLM prepends chunk-specific context to each chunk before embedding. Result: 49% fewer retrieval failures, 67% fewer when combined with reranking.
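The enrichment step itself is simple; a sketch, where `describe` stands in for the LLM call that writes the chunk-specific context:

```python
def contextualize(chunk, document, describe):
    """Prepend LLM-written, chunk-specific context so the embedding carries
    document-level meaning instead of an isolated fragment."""
    context = describe(document, chunk)  # the LLM call, stubbed here
    return f"{context}\n\n{chunk}"
```

The enriched string, not the raw chunk, is what gets embedded and indexed.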

Retrieval
03

CRAG Quality Gate

Cross-encoder evaluates every retrieved document as CORRECT, AMBIGUOUS, or INCORRECT before it reaches generation. Re-retrieves on failure.
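The gate's logic reduces to thresholding a relevance score; the thresholds below are illustrative, not Forge's tuned values, and `relevance` stands in for the cross-encoder:

```python
def crag_gate(docs, relevance, upper=0.7, lower=0.3):
    """Grade each retrieved doc from a cross-encoder relevance score.
    Returns (labels, needs_reretrieval): re-retrieve when nothing is CORRECT."""
    labels = [
        "CORRECT" if relevance(d) >= upper
        else "INCORRECT" if relevance(d) < lower
        else "AMBIGUOUS"
        for d in docs
    ]
    return labels, "CORRECT" not in labels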

Search
04

ColBERT Late Interaction

Token-level MaxSim scoring catches specific facts that dense vectors miss.

Indexing
05

Proposition Indexing

Atomic factual claims indexed as standalone searchable units for precision retrieval.

Indexing
06

Hierarchical 4-Level

L0 doc summaries → L1 sections → L2 semantic chunks → L3 propositions.

Search
07

BGE-M3 Tri-Modal

One model, three vector types: dense + sparse + ColBERT multi-vector.

Search
08

Knowledge Graph

Entity relationships for structural queries via graph traversal.

Quality
09

Self-Verification

Post-generation claim-by-claim audit against source documents.

Quality
10

6-Signal Confidence

Multi-dimensional reliability scoring across retrieval, CRAG, and verification.

Quality
11

Query Decomposition

Complex questions split into targeted sub-queries for parallel retrieval.

Quality
12

HyDE

Generate a hypothetical ideal answer, embed it, search for real matches.

Retrieval
13

Multi-Hop Reasoning

Follow cross-references iteratively across document boundaries.

Quality
14

Parent Expansion

Match a chunk, return the parent section for full context fidelity.

Comparison

Standard RAG vs Forge

Standard RAG
Forge V5
Retrieval
Single-shot retrieve-then-read
Agentic multi-step with tool use
Chunking
Fixed-size character splits
Hierarchical 4-level (doc/section/chunk/proposition)
Embeddings
Single dense vector
BGE-M3 tri-modal (dense + sparse + ColBERT)
Quality Gate
None — trust whatever retrieves
CRAG cross-encoder evaluation with re-retrieval
Verification
None — hope for the best
Claim-by-claim source verification
Context
Raw chunks, no surrounding info
Contextual enrichment (49% fewer failures)
Precision
Chunk-level granularity
Proposition-level atomic claims
Relationships
Vector similarity only
Knowledge graph with entity traversal

Ready to forge?

14 techniques. One GPU. No open-source system combines all of these.

Built by hollowed_eyes · Forge V5 is a portfolio project demonstrating world-class RAG engineering.