Hierarchical Indexing

Forge organizes every document into a 4-level hierarchy: document summaries, section summaries, semantic chunks, and atomic propositions. Different queries need different granularity, and the hierarchical index ensures the right level is always available.

The 4 Levels

┌─────────────────────────────────────────────────┐
│  L0: Document Summary                           │
│  "This is a Q3 2024 earnings report for..."     │
│  Use: "What is this document about?"             │
├─────────────────────────────────────────────────┤
│  L1: Section Summaries                           │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐     │
│  │ Financial  │ │ Strategy  │ │ Risk      │     │
│  │ Highlights │ │ & Outlook │ │ Factors   │     │
│  └───────────┘ └───────────┘ └───────────┘     │
│  Use: "What does Section 3 discuss?"             │
├─────────────────────────────────────────────────┤
│  L2: Semantic Chunks (~512 tokens each)          │
│  ┌────┐┌────┐┌────┐┌────┐┌────┐┌────┐┌────┐   │
│  │ C1 ││ C2 ││ C3 ││ C4 ││ C5 ││ C6 ││ C7 │   │
│  └────┘└────┘└────┘└────┘└────┘└────┘└────┘   │
│  Use: "What specific method was used?"           │
├─────────────────────────────────────────────────┤
│  L3: Propositions (atomic facts)                 │
│  ┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐   │
│  │P1││P2││P3││P4││P5││P6││P7││P8││P9││P10│   │
│  └──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘   │
│  Use: "What exact value was reported?"           │
└─────────────────────────────────────────────────┘

Level Details

| Level | Content | Typical Size | Count per 100-pg Doc | Use Case |
|-------|---------|--------------|----------------------|----------|
| L0 | Full document summary | 300-500 tokens | 1 | Document overview, routing |
| L1 | Section summaries | 200-300 tokens | 10-30 | Section-level questions |
| L2 | Semantic chunks | ~512 tokens | 300-500 | Standard fact retrieval |
| L3 | Propositions | 20-80 tokens | 1,500-3,000 | Precise factual lookup |
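
The table implies a routing decision: pick the coarsest level that can still answer the query. A hypothetical keyword heuristic sketches the idea (the rules are illustrative only, not Forge's actual router):

```python
def route_level(query: str) -> str:
    """Pick the coarsest hierarchy level that can answer the query
    (illustrative heuristic, not Forge's router)."""
    q = query.lower()
    if any(k in q for k in ("about", "overview", "summarize")):
        return "L0"  # document-level summary
    if "section" in q:
        return "L1"  # section summaries
    if any(k in q for k in ("exact", "value", "number", "how much")):
        return "L3"  # atomic propositions
    return "L2"      # default: semantic chunks

print(route_level("What is this document about?"))    # → L0
print(route_level("What exact value was reported?"))  # → L3
```

In practice, routing would more likely be done by an LLM or a trained classifier, but the principle is the same: broad questions hit summaries, precise lookups hit propositions.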

How Each Level Is Created

L0: Document Summary

Generated by the LLM after the full document is parsed:

# forge/ingestion/hierarchy.py
class HierarchyBuilder:
    async def build_l0(self, document: ParsedDocument) -> L0Summary:
        """Generate a document-level summary."""
        # Use first and last sections + any abstract/introduction
        summary_input = self._extract_summary_context(document)
        summary = await self.llm.generate(
            L0_PROMPT.format(text=summary_input),
            max_tokens=500,
        )
        return L0Summary(
            text=summary,
            document_id=document.id,
            level="L0",
        )
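
`_extract_summary_context` is referenced but not shown; a plausible standalone sketch of the strategy described in the comment, with the section representation (heading/text pairs) and the character budget as assumptions:

```python
def extract_summary_context(sections: list[tuple[str, str]], budget: int = 6000) -> str:
    """Gather any abstract/introduction plus the first and last sections,
    truncated to a character budget (illustrative, not Forge's code)."""
    picked = []
    for heading, text in sections:
        if heading.lower() in ("abstract", "introduction"):
            picked.append(text)
    if sections:
        picked.append(sections[0][1])        # first section
        if len(sections) > 1:
            picked.append(sections[-1][1])   # last section
    # Deduplicate while preserving order, then enforce the budget.
    seen, parts = set(), []
    for part in picked:
        if part not in seen:
            seen.add(part)
            parts.append(part)
    return "\n\n".join(parts)[:budget]
```

First and last sections are a cheap proxy for "what the document is and what it concludes", which is usually enough signal for a routing-grade summary.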

L1: Section Summaries

Documents are split at heading boundaries, and each section gets a summary:

    async def build_l1(self, document: ParsedDocument) -> list[L1Section]:
        """Split document into sections and summarize each."""
        sections = self._split_by_headings(document.text)
        l1_sections = []
        for section in sections:
            summary = await self.llm.generate(
                L1_PROMPT.format(
                    document_summary=document.l0_summary,
                    section_title=section.heading,
                    section_text=section.text,
                ),
                max_tokens=300,
            )
            l1_sections.append(L1Section(
                text=summary,
                heading=section.heading,
                full_text=section.text,
                document_id=document.id,
                level="L1",
            ))
        return l1_sections
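
`_split_by_headings` is not shown above; a simplified standalone sketch, assuming the parsed text carries Markdown-style headings (Forge's parser may instead use structural metadata from the source format):

```python
import re
from dataclasses import dataclass

@dataclass
class Section:
    heading: str
    text: str

def split_by_headings(text: str) -> list[Section]:
    """Split on Markdown-style headings (simplified sketch)."""
    sections, heading, buf = [], "Preamble", []
    for line in text.splitlines():
        m = re.match(r"^#{1,6}\s+(.*)", line)
        if m:
            if buf:  # close out the previous section
                sections.append(Section(heading, "\n".join(buf).strip()))
            heading, buf = m.group(1).strip(), []
        else:
            buf.append(line)
    if buf:
        sections.append(Section(heading, "\n".join(buf).strip()))
    return sections
```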

L2: Semantic Chunks

Sections are further split into semantic chunks using embedding-based boundary detection:

    async def build_l2(self, sections: list[L1Section]) -> list[L2Chunk]:
        """Split sections into semantic chunks."""
        chunks = []
        for section in sections:
            if self.config.method == "semantic":
                section_chunks = await self._semantic_chunk(
                    section.full_text,
                    target_size=self.config.chunk_size,  # 512
                    overlap=self.config.chunk_overlap,    # 50
                )
            elif self.config.method == "fixed":
                section_chunks = self._fixed_chunk(
                    section.full_text,
                    size=self.config.chunk_size,
                    overlap=self.config.chunk_overlap,
                )
            else:  # sentence
                section_chunks = self._sentence_chunk(
                    section.full_text,
                    target_size=self.config.chunk_size,
                )
 
            for chunk_text in section_chunks:
                chunks.append(L2Chunk(
                    text=chunk_text,
                    parent_section_id=section.id,
                    document_id=section.document_id,
                    level="L2",
                ))
        return chunks
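
The `_fixed_chunk` fallback referenced above is not shown; a minimal standalone sketch, using whitespace-separated words as a stand-in for real token counting:

```python
def fixed_chunk(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Fixed-size windows with overlap, counted in whitespace tokens
    (illustrative; Forge presumably counts model tokens)."""
    words = text.split()
    if not words:
        return []
    step = max(size - overlap, 1)  # advance by size minus overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks
```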

Semantic Chunking

The semantic method (default) uses embedding similarity to find natural topic boundaries:

  1. Split text into sentences
  2. Embed each sentence with BGE-M3
  3. Compute cosine similarity between consecutive sentences
  4. Split at points where similarity drops below a threshold (topic shift)
  5. Merge small chunks to reach target size (~512 tokens)

This produces more coherent chunks than fixed-size splitting because each chunk covers a single topic.
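
The boundary-detection loop can be sketched in a few lines (the `embed` callable stands in for BGE-M3, and the merge step from point 5 is omitted for brevity):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunk(sentences: list[str], embed, threshold: float = 0.6) -> list[str]:
    """Start a new chunk wherever similarity between consecutive
    sentences drops below the threshold (topic shift). Merging small
    chunks up to the ~512-token target is omitted."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:  # topic boundary detected
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

The threshold is the main tuning knob: lower values yield fewer, longer chunks; higher values split aggressively and lean harder on the merge step.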

L3: Propositions

See Proposition Indexing for the full breakdown. L3 points are extracted from L2 chunks.

Parent Expansion

One of the most powerful features of the hierarchy: retrieve at a lower level, expand to a higher level for context.

How It Works

When the agent retrieves an L2 chunk or L3 proposition that’s highly relevant but might lack context, it can expand to the parent:

async def expand_to_parent(chunk: ScoredChunk) -> ScoredChunk:
    """Retrieve the parent record for additional context."""
    if chunk.level == "L3":
        # L3 → L2 (get the source chunk)
        return await qdrant.get(chunk.parent_chunk_id)
    if chunk.level == "L2":
        # L2 → L1 (get the full section)
        return await qdrant.get(chunk.parent_section_id)
    # L0/L1 have no parent; return the chunk unchanged
    return chunk

When It’s Used

  1. CRAG Ambiguous classification — An L2 chunk scores as AMBIGUOUS. The parent L1 section is fetched and re-scored. Often the broader context makes the relevance clear.

  2. Agent decides — The agentic mode can explicitly request parent expansion when it determines a chunk is relevant but insufficient.

  3. Generation context — Even when L3 propositions are used for retrieval matching, the L2 parent chunk (or L1 section) is included in the generation prompt for full context.
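
Case 1 can be sketched as a small re-scoring loop (`score`, `fetch_parent`, and the threshold are placeholders, not Forge's actual API):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Scored:
    text: str
    level: str

async def rescore_ambiguous(chunk, fetch_parent, score, threshold: float = 0.7):
    """CRAG-style handling of an AMBIGUOUS L2 chunk: re-score the parent
    L1 section and keep whichever passes (illustrative sketch)."""
    if score(chunk.text) >= threshold:
        return chunk                    # clearly relevant on its own
    parent = await fetch_parent(chunk)  # L2 → L1 expansion
    if score(parent.text) >= threshold:
        return parent                   # broader context resolves the ambiguity
    return None                         # still weak after expansion: discard
```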

RAPTOR Inspiration

The hierarchical structure is inspired by the RAPTOR technique (Sarthi et al., 2024), which uses recursive summarization to build a tree of abstractions:

RAPTOR:       Leaf nodes → Cluster → Summarize → Cluster → Summarize → Root
Forge:        L3 props → L2 chunks → L1 sections → L0 summary

The key difference is that Forge’s hierarchy follows the document’s natural structure (headings, sections) rather than purely clustering embeddings. This makes the hierarchy interpretable and aligned with how humans organize documents.

Configuration

hierarchy:
  levels:
    L0:
      enabled: true
      max_summary_length: 500
    L1:
      enabled: true
      chunking: "heading"       # Split by headings
      max_summary_length: 300
    L2:
      enabled: true
      chunk_size: 512           # Target chunk size in tokens
      chunk_overlap: 50         # Overlap between chunks
      method: "semantic"        # semantic | fixed | sentence
    L3:
      enabled: true             # Controlled by propositions.enabled

Chunk size tuning

512 tokens is the sweet spot for most use cases. Smaller chunks (256) improve precision but lose context. Larger chunks (1024) retain more context but dilute the embedding with multiple topics. If you’re working with highly structured documents (legal contracts, standards), consider 256 tokens + aggressive parent expansion.

Storage Overhead

For a 100-page technical document:

| Level | Points | Avg Size | Vector Storage |
|-------|--------|----------|----------------|
| L0 | 1 | 500 tokens | ~4 KB |
| L1 | 20 | 300 tokens | ~80 KB |
| L2 | 400 | 512 tokens | ~1.6 MB |
| L3 | 2,000 | 50 tokens | ~8 MB |
| Total | ~2,421 | | ~9.7 MB |

That overhead is per document; Qdrant handles it efficiently even on modest hardware.
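
The per-point figure follows from BGE-M3's 1024-dimensional float32 dense vectors (4 KB each); a quick arithmetic check of the table (vector storage alone lands near ~9.5 MB, with payload text and index overhead making up the rest):

```python
DIM, FLOAT_BYTES = 1024, 4                # BGE-M3 dense vectors, float32
vector_kb = DIM * FLOAT_BYTES / 1024      # 4.0 KB per stored vector

points = {"L0": 1, "L1": 20, "L2": 400, "L3": 2_000}
for level, n in points.items():
    print(f"{level}: {n * vector_kb:,.0f} KB")
total_mb = sum(points.values()) * vector_kb / 1024
print(f"total: {sum(points.values()):,} points, ~{total_mb:.1f} MB of vectors")
```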

References

  • Sarthi et al., “RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval” (2024)
  • Forge implementation: forge/ingestion/hierarchy.py
  • Chunking methods: forge/ingestion/chunker.py