# Hierarchical Indexing
Forge organizes every document into a 4-level hierarchy: document summaries, section summaries, semantic chunks, and atomic propositions. Different queries need different granularity, and the hierarchical index ensures the right level is always available.
## The 4 Levels
```
┌─────────────────────────────────────────────────┐
│ L0: Document Summary                            │
│ "This is a Q3 2024 earnings report for..."      │
│ Use: "What is this document about?"             │
├─────────────────────────────────────────────────┤
│ L1: Section Summaries                           │
│ ┌────────────┐ ┌───────────┐ ┌───────────┐      │
│ │ Financial  │ │ Strategy  │ │   Risk    │      │
│ │ Highlights │ │ & Outlook │ │  Factors  │      │
│ └────────────┘ └───────────┘ └───────────┘      │
│ Use: "What does Section 3 discuss?"             │
├─────────────────────────────────────────────────┤
│ L2: Semantic Chunks (~512 tokens each)          │
│ ┌────┐┌────┐┌────┐┌────┐┌────┐┌────┐┌────┐      │
│ │ C1 ││ C2 ││ C3 ││ C4 ││ C5 ││ C6 ││ C7 │      │
│ └────┘└────┘└────┘└────┘└────┘└────┘└────┘      │
│ Use: "What specific method was used?"           │
├─────────────────────────────────────────────────┤
│ L3: Propositions (atomic facts)                 │
│ ┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌───┐      │
│ │P1││P2││P3││P4││P5││P6││P7││P8││P9││P10│      │
│ └──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘└───┘      │
│ Use: "What exact value was reported?"           │
└─────────────────────────────────────────────────┘
```

## Level Details
| Level | Content | Typical Size | Count per 100-page Doc | Use Case |
|---|---|---|---|---|
| L0 | Full document summary | 300-500 tokens | 1 | Document overview, routing |
| L1 | Section summaries | 200-300 tokens | 10-30 | Section-level questions |
| L2 | Semantic chunks | ~512 tokens | 300-500 | Standard fact retrieval |
| L3 | Propositions | 20-80 tokens | 1,500-3,000 | Precise factual lookup |
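The routing in the "Use Case" column can be made concrete with a small sketch. Everything here is illustrative (a plain list stands in for the vector store, and substring match stands in for embedding similarity); Forge's real retrieval goes through Qdrant:

```python
# Illustrative sketch: selecting the right granularity by filtering on `level`.
points = [
    {"level": "L0", "text": "Q3 2024 earnings report for Acme Corp."},
    {"level": "L1", "text": "Financial Highlights: revenue and margin trends."},
    {"level": "L2", "text": "Revenue grew 12% YoY, driven by the cloud segment..."},
    {"level": "L3", "text": "Q3 2024 revenue was $4.2B."},
]

def retrieve(level: str, query: str) -> list[dict]:
    """Return points at one granularity level that mention a query term."""
    return [
        p for p in points
        if p["level"] == level and query.lower() in p["text"].lower()
    ]

# A broad "what is this about?" question targets L0; an exact-value
# question targets L3.
overview = retrieve("L0", "earnings")
exact = retrieve("L3", "revenue")
```

The same query term lands on different points depending on the level requested, which is exactly what the hierarchy is for.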
## How Each Level Is Created
### L0: Document Summary
Generated by the LLM after the full document is parsed:
```python
# forge/ingestion/hierarchy.py
class HierarchyBuilder:
    async def build_l0(self, document: ParsedDocument) -> L0Summary:
        """Generate a document-level summary."""
        # Use first and last sections + any abstract/introduction
        summary_input = self._extract_summary_context(document)
        summary = await self.llm.generate(
            L0_PROMPT.format(text=summary_input),
            max_tokens=500,
        )
        return L0Summary(
            text=summary,
            document_id=document.id,
            level="L0",
        )
```

### L1: Section Summaries
Documents are split at heading boundaries, and each section gets a summary:
```python
async def build_l1(self, document: ParsedDocument) -> list[L1Section]:
    """Split document into sections and summarize each."""
    sections = self._split_by_headings(document.text)
    l1_sections = []
    for section in sections:
        summary = await self.llm.generate(
            L1_PROMPT.format(
                document_summary=document.l0_summary,
                section_title=section.heading,
                section_text=section.text,
            ),
            max_tokens=300,
        )
        l1_sections.append(L1Section(
            text=summary,
            heading=section.heading,
            full_text=section.text,
            document_id=document.id,
            level="L1",
        ))
    return l1_sections
```

### L2: Semantic Chunks
Sections are further split into semantic chunks using embedding-based boundary detection:
```python
async def build_l2(self, sections: list[L1Section]) -> list[L2Chunk]:
    """Split sections into semantic chunks."""
    chunks = []
    for section in sections:
        if self.config.method == "semantic":
            section_chunks = await self._semantic_chunk(
                section.full_text,
                target_size=self.config.chunk_size,  # 512
                overlap=self.config.chunk_overlap,   # 50
            )
        elif self.config.method == "fixed":
            section_chunks = self._fixed_chunk(
                section.full_text,
                size=self.config.chunk_size,
                overlap=self.config.chunk_overlap,
            )
        else:  # sentence
            section_chunks = self._sentence_chunk(
                section.full_text,
                target_size=self.config.chunk_size,
            )
        for chunk_text in section_chunks:
            chunks.append(L2Chunk(
                text=chunk_text,
                parent_section_id=section.id,
                document_id=section.document_id,
                level="L2",
            ))
    return chunks
```

#### Semantic Chunking
The semantic method (default) uses embedding similarity to find natural topic boundaries:
1. Split the text into sentences.
2. Embed each sentence with BGE-M3.
3. Compute cosine similarity between consecutive sentences.
4. Split at points where similarity drops below a threshold (a topic shift).
5. Merge small chunks to reach the target size (~512 tokens).
This produces more coherent chunks than fixed-size splitting because each chunk covers a single topic.
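The boundary-detection part of this process can be sketched as follows. This is a simplified stand-in for illustration, not Forge's actual `_semantic_chunk`: it takes precomputed sentence embeddings (BGE-M3 in Forge, toy 2-D vectors here) and opens a new chunk wherever consecutive-sentence similarity drops below a threshold:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def split_on_topic_shift(
    sentences: list[str],
    embeddings: list[list[float]],
    threshold: float = 0.7,
) -> list[list[str]]:
    """Start a new chunk wherever similarity to the previous sentence drops."""
    chunks = [[sentences[0]]]
    for prev, cur, sent in zip(embeddings, embeddings[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append([sent])    # topic shift: open a new chunk
        else:
            chunks[-1].append(sent)  # same topic: extend current chunk
    return chunks

# Toy 2-D embeddings: the third sentence points in a new direction.
sents = ["Revenue grew 12%.", "Margins also improved.", "Litigation risk remains."]
embs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]
chunks = split_on_topic_shift(sents, embs)
# chunks → [["Revenue grew 12%.", "Margins also improved."], ["Litigation risk remains."]]
```

The final merge-to-target-size pass is omitted here for brevity; in the real pipeline it folds undersized chunks together until they approach ~512 tokens.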
### L3: Propositions
See Proposition Indexing for the full breakdown. L3 points are extracted from L2 chunks.
## Parent Expansion
One of the hierarchy's most powerful features: retrieve at a lower level, then expand to a higher level for context.
### How It Works
When the agent retrieves an L2 chunk or L3 proposition that’s highly relevant but might lack context, it can expand to the parent:
```python
async def expand_to_parent(chunk: ScoredChunk) -> ScoredChunk:
    """Retrieve the parent record for additional context."""
    if chunk.level == "L3":
        # L3 → L2 (get the source chunk)
        return await qdrant.get(chunk.parent_chunk_id)
    if chunk.level == "L2":
        # L2 → L1 (get the full section)
        return await qdrant.get(chunk.parent_section_id)
    # L0 and L1 have no expandable parent
    return chunk
```

### When It's Used
- **CRAG Ambiguous classification** — An L2 chunk scores as AMBIGUOUS. The parent L1 section is fetched and re-scored; often the broader context makes the relevance clear.
- **Agent decides** — The agentic mode can explicitly request parent expansion when it determines a chunk is relevant but insufficient.
- **Generation context** — Even when L3 propositions are used for retrieval matching, the L2 parent chunk (or L1 section) is included in the generation prompt for full context.
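The generation-context case can be sketched like this. The dict-based store and field names are illustrative assumptions, not Forge's actual types; the point is that matched propositions are deduplicated by parent so each L2 chunk enters the prompt once:

```python
# Hypothetical sketch: build generation context from matched L3 propositions.
# A dict stands in for the Qdrant lookup; field names are assumptions.
l2_store = {
    "c1": "Revenue grew 12% YoY to $4.2B, driven by the cloud segment.",
    "c2": "The board approved a $500M buyback program.",
}

matched_propositions = [
    {"text": "Q3 2024 revenue was $4.2B.", "parent_chunk_id": "c1"},
    {"text": "Revenue grew 12% YoY.", "parent_chunk_id": "c1"},
    {"text": "A $500M buyback was approved.", "parent_chunk_id": "c2"},
]

def build_context(props: list[dict]) -> str:
    """Include each parent L2 chunk once, in first-match order."""
    seen: list[str] = []
    for p in props:
        pid = p["parent_chunk_id"]
        if pid not in seen:
            seen.append(pid)
    return "\n\n".join(l2_store[pid] for pid in seen)

context = build_context(matched_propositions)
# Both chunks appear, with "c1" included only once despite two matches.
```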
## RAPTOR Inspiration
The hierarchical structure is inspired by the RAPTOR technique (Sarthi et al., 2024), which uses recursive summarization to build a tree of abstractions:
```
RAPTOR: Leaf nodes → Cluster → Summarize → Cluster → Summarize → Root
Forge:  L3 props → L2 chunks → L1 sections → L0 summary
```

The key difference is that Forge's hierarchy follows the document's natural structure (headings, sections) rather than purely clustering embeddings. This makes the hierarchy interpretable and aligned with how humans organize documents.
## Configuration
```yaml
hierarchy:
  levels:
    L0:
      enabled: true
      max_summary_length: 500
    L1:
      enabled: true
      chunking: "heading"   # Split by headings
      max_summary_length: 300
    L2:
      enabled: true
      chunk_size: 512       # Target chunk size in tokens
      chunk_overlap: 50     # Overlap between chunks
      method: "semantic"    # semantic | fixed | sentence
    L3:
      enabled: true         # Controlled by propositions.enabled
```

512 tokens is the sweet spot for most use cases. Smaller chunks (256) improve precision but lose context; larger chunks (1024) retain more context but dilute the embedding with multiple topics. If you're working with highly structured documents (legal contracts, standards), consider 256 tokens plus aggressive parent expansion.
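The number of L2 points a document produces follows from the effective stride (chunk_size - chunk_overlap). The helper below is not part of Forge; it is a back-of-the-envelope estimate for fixed-size chunking (semantic chunking lands near the same counts when chunks average the target size):

```python
import math

def estimate_chunk_count(total_tokens: int, chunk_size: int, overlap: int) -> int:
    """Rough count of overlapping chunks covering a token stream."""
    stride = chunk_size - overlap  # each new chunk advances by this many tokens
    return max(1, math.ceil(max(total_tokens - overlap, 1) / stride))

# For a dense ~200k-token document (the token count is an assumption):
n_512 = estimate_chunk_count(200_000, chunk_size=512, overlap=50)  # 433
n_256 = estimate_chunk_count(200_000, chunk_size=256, overlap=50)  # 971
```

Halving the chunk size roughly doubles the point count, and with it embedding cost and vector storage — the trade-off behind the 256-token suggestion above.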
## Storage Overhead
For a 100-page technical document:
| Level | Points | Avg Size | Vector Storage |
|---|---|---|---|
| L0 | 1 | 500 tokens | ~4KB |
| L1 | 20 | 300 tokens | ~80KB |
| L2 | 400 | 512 tokens | ~1.6MB |
| L3 | 2,000 | 50 tokens | ~8MB |
| **Total** | **~2,421** | | **~9.7MB** |
That's per document; Qdrant handles this efficiently even on modest hardware.
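The figures above are consistent with about 4KB of vector data per point, i.e. a 1024-dimensional float32 embedding (BGE-M3's dense output size). The arithmetic below is a sanity check only, ignoring payload text and index overhead:

```python
# Sanity-check the storage table: 1024-dim float32 vectors => 4KB per point.
DIM = 1024                      # BGE-M3 dense embedding dimension
KB_PER_POINT = DIM * 4 / 1024   # 4 bytes per float32 -> 4.0 KB

counts = {"L0": 1, "L1": 20, "L2": 400, "L3": 2_000}
per_level_kb = {level: n * KB_PER_POINT for level, n in counts.items()}

total_points = sum(counts.values())             # 2,421
total_mb = total_points * KB_PER_POINT / 1000   # ≈ 9.7 (decimal MB, as in the table)
```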
## References

- Sarthi et al., "RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval" (2024)
- Forge implementation: `forge/ingestion/hierarchy.py`
- Chunking methods: `forge/ingestion/chunker.py`