Hierarchical Indexing

Forge organizes every document into a 4-level hierarchy: document summaries, section summaries, semantic chunks, and atomic propositions. Different queries need different granularity, and the hierarchical index ensures the right level is always available.

The 4 Levels

┌─────────────────────────────────────────────────┐
│  L0: Document Summary                           │
│  "This is a Q3 2024 earnings report for..."     │
│  Use: "What is this document about?"             │
├─────────────────────────────────────────────────┤
│  L1: Section Summaries                           │
│  ┌───────────┐ ┌───────────┐ ┌───────────┐     │
│  │ Financial  │ │ Strategy  │ │ Risk      │     │
│  │ Highlights │ │ & Outlook │ │ Factors   │     │
│  └───────────┘ └───────────┘ └───────────┘     │
│  Use: "What does Section 3 discuss?"             │
├─────────────────────────────────────────────────┤
│  L2: Semantic Chunks (~512 tokens each)          │
│  ┌────┐┌────┐┌────┐┌────┐┌────┐┌────┐┌────┐   │
│  │ C1 ││ C2 ││ C3 ││ C4 ││ C5 ││ C6 ││ C7 │   │
│  └────┘└────┘└────┘└────┘└────┘└────┘└────┘   │
│  Use: "What specific method was used?"           │
├─────────────────────────────────────────────────┤
│  L3: Propositions (atomic facts)                 │
│  ┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐┌──┐   │
│  │P1││P2││P3││P4││P5││P6││P7││P8││P9││P10│   │
│  └──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘└──┘   │
│  Use: "What exact value was reported?"           │
└─────────────────────────────────────────────────┘

Level Details

| Level | Content | Typical Size | Count per 100-pg Doc | Use Case |
|-------|---------|--------------|----------------------|----------|
| L0 | Full document summary | 300-500 tokens | 1 | Document overview, routing |
| L1 | Section summaries | 200-300 tokens | 10-30 | Section-level questions |
| L2 | Semantic chunks | ~512 tokens | 300-500 | Standard fact retrieval |
| L3 | Propositions | 20-80 tokens | 1,500-3,000 | Precise factual lookup |
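
The table implies a routing decision: pick the coarsest level that can still answer the query. A hypothetical keyword heuristic sketches the idea (the rules are illustrative only, not Forge's actual router):

```python
def route_level(query: str) -> str:
    """Pick the coarsest hierarchy level that can answer the query
    (illustrative heuristic, not Forge's router)."""
    q = query.lower()
    if any(k in q for k in ("about", "overview", "summarize")):
        return "L0"  # document-level summary
    if "section" in q:
        return "L1"  # section summaries
    if any(k in q for k in ("exact", "value", "number", "how much")):
        return "L3"  # atomic propositions
    return "L2"      # default: semantic chunks

print(route_level("What is this document about?"))    # → L0
print(route_level("What exact value was reported?"))  # → L3
```

In practice, routing would more likely be done by an LLM or a trained classifier, but the principle is the same: broad questions hit summaries, precise lookups hit propositions.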

How Each Level Is Created

L0: Document Summary

Generated by the LLM after the full document is parsed:

# forge/ingestion/hierarchy.py
class HierarchyBuilder:
    async def build_l0(self, document: ParsedDocument) -> L0Summary:
        """Generate a document-level summary."""
        # Use first and last sections + any abstract/introduction
        summary_input = self._extract_summary_context(document)
        summary = await self.llm.generate(
            L0_PROMPT.format(text=summary_input),
            max_tokens=500,
        )
        return L0Summary(
            text=summary,
            document_id=document.id,
            level="L0",
        )
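
`_extract_summary_context` is referenced but not shown; a plausible standalone sketch of the strategy described in the comment, with the section representation (heading/text pairs) and the character budget as assumptions:

```python
def extract_summary_context(sections: list[tuple[str, str]], budget: int = 6000) -> str:
    """Gather any abstract/introduction plus the first and last sections,
    truncated to a character budget (illustrative, not Forge's code)."""
    picked = []
    for heading, text in sections:
        if heading.lower() in ("abstract", "introduction"):
            picked.append(text)
    if sections:
        picked.append(sections[0][1])        # first section
        if len(sections) > 1:
            picked.append(sections[-1][1])   # last section
    # Deduplicate while preserving order, then enforce the budget.
    seen, parts = set(), []
    for part in picked:
        if part not in seen:
            seen.add(part)
            parts.append(part)
    return "\n\n".join(parts)[:budget]
```

First and last sections are a cheap proxy for "what the document is and what it concludes", which is usually enough signal for a routing-grade summary.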

L1: Section Summaries

Documents are split at heading boundaries, and each section gets a summary:

    async def build_l1(self, document: ParsedDocument) -> list[L1Section]:
        """Split document into sections and summarize each."""
        sections = self._split_by_headings(document.text)
        l1_sections = []
        for section in sections:
            summary = await self.llm.generate(
                L1_PROMPT.format(
                    document_summary=document.l0_summary,
                    section_title=section.heading,
                    section_text=section.text,
                ),
                max_tokens=300,
            )
            l1_sections.append(L1Section(
                text=summary,
                heading=section.heading,
                full_text=section.text,
                document_id=document.id,
                level="L1",
            ))
        return l1_sections
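
`_split_by_headings` is not shown above; a simplified standalone sketch, assuming the parsed text carries Markdown-style headings (Forge's parser may instead use structural metadata from the source format):

```python
import re
from dataclasses import dataclass

@dataclass
class Section:
    heading: str
    text: str

def split_by_headings(text: str) -> list[Section]:
    """Split on Markdown-style headings (simplified sketch)."""
    sections, heading, buf = [], "Preamble", []
    for line in text.splitlines():
        m = re.match(r"^#{1,6}\s+(.*)", line)
        if m:
            if buf:  # close out the previous section
                sections.append(Section(heading, "\n".join(buf).strip()))
            heading, buf = m.group(1).strip(), []
        else:
            buf.append(line)
    if buf:
        sections.append(Section(heading, "\n".join(buf).strip()))
    return sections
```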

L2: Semantic Chunks

Sections are further split into semantic chunks using embedding-based boundary detection:

    async def build_l2(self, sections: list[L1Section]) -> list[L2Chunk]:
        """Split sections into semantic chunks."""
        chunks = []
        for section in sections:
            if self.config.method == "semantic":
                section_chunks = await self._semantic_chunk(
                    section.full_text,
                    target_size=self.config.chunk_size,  # 512
                    overlap=self.config.chunk_overlap,    # 50
                )
            elif self.config.method == "fixed":
                section_chunks = self._fixed_chunk(
                    section.full_text,
                    size=self.config.chunk_size,
                    overlap=self.config.chunk_overlap,
                )
            else:  # sentence
                section_chunks = self._sentence_chunk(
                    section.full_text,
                    target_size=self.config.chunk_size,
                )
 
            for chunk_text in section_chunks:
                chunks.append(L2Chunk(
                    text=chunk_text,
                    parent_section_id=section.id,
                    document_id=section.document_id,
                    level="L2",
                ))
        return chunks
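
The `_fixed_chunk` fallback referenced above is not shown; a minimal standalone sketch, using whitespace-separated words as a stand-in for real token counting:

```python
def fixed_chunk(text: str, size: int = 512, overlap: int = 50) -> list[str]:
    """Fixed-size windows with overlap, counted in whitespace tokens
    (illustrative; Forge presumably counts model tokens)."""
    words = text.split()
    if not words:
        return []
    step = max(size - overlap, 1)  # advance by size minus overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # last window already covers the tail
    return chunks
```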

Semantic Chunking

The semantic method (default) uses embedding similarity to find natural topic boundaries:

  1. Split text into sentences
  2. Embed each sentence with BGE-M3
  3. Compute cosine similarity between consecutive sentences
  4. Split at points where similarity drops below a threshold (topic shift)
  5. Merge small chunks to reach target size (~512 tokens)

This produces more coherent chunks than fixed-size splitting because each chunk covers a single topic.
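
The boundary-detection loop can be sketched in a few lines (the `embed` callable stands in for BGE-M3, and the merge step from point 5 is omitted for brevity):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunk(sentences: list[str], embed, threshold: float = 0.6) -> list[str]:
    """Start a new chunk wherever similarity between consecutive
    sentences drops below the threshold (topic shift). Merging small
    chunks up to the ~512-token target is omitted."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, sentence in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:  # topic boundary detected
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks
```

The threshold is the main tuning knob: lower values yield fewer, longer chunks; higher values split aggressively and lean harder on the merge step.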

L3: Propositions

See Proposition Indexing for the full breakdown. L3 points are extracted from L2 chunks.

Parent Expansion

One of the most powerful features of the hierarchy: retrieve at a lower level, expand to a higher level for context.

How It Works

When the agent retrieves an L2 chunk or L3 proposition that’s highly relevant but might lack context, it can expand to the parent:

async def expand_to_parent(chunk: ScoredChunk) -> ScoredChunk:
    """Retrieve the parent record for additional context."""
    if chunk.level == "L3":
        # L3 → L2 (get the source chunk)
        return await qdrant.get(chunk.parent_chunk_id)
    if chunk.level == "L2":
        # L2 → L1 (get the full section)
        return await qdrant.get(chunk.parent_section_id)
    # L0/L1 have no parent; return the chunk unchanged
    return chunk

When It’s Used

  1. CRAG Ambiguous classification — An L2 chunk scores as AMBIGUOUS. The parent L1 section is fetched and re-scored. Often the broader context makes the relevance clear.

  2. Agent decides — The agentic mode can explicitly request parent expansion when it determines a chunk is relevant but insufficient.

  3. Generation context — Even when L3 propositions are used for retrieval matching, the L2 parent chunk (or L1 section) is included in the generation prompt for full context.
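
Case 1 can be sketched as a small re-scoring loop (`score`, `fetch_parent`, and the threshold are placeholders, not Forge's actual API):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Scored:
    text: str
    level: str

async def rescore_ambiguous(chunk, fetch_parent, score, threshold: float = 0.7):
    """CRAG-style handling of an AMBIGUOUS L2 chunk: re-score the parent
    L1 section and keep whichever passes (illustrative sketch)."""
    if score(chunk.text) >= threshold:
        return chunk                    # clearly relevant on its own
    parent = await fetch_parent(chunk)  # L2 → L1 expansion
    if score(parent.text) >= threshold:
        return parent                   # broader context resolves the ambiguity
    return None                         # still weak after expansion: discard
```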

RAPTOR Inspiration

The hierarchical structure is inspired by the RAPTOR technique (Sarthi et al., 2024), which uses recursive summarization to build a tree of abstractions:

RAPTOR:       Leaf nodes → Cluster → Summarize → Cluster → Summarize → Root
Forge:        L3 props → L2 chunks → L1 sections → L0 summary

The key difference is that Forge’s hierarchy follows the document’s natural structure (headings, sections) rather than purely clustering embeddings. This makes the hierarchy interpretable and aligned with how humans organize documents.

Configuration

hierarchy:
  levels:
    L0:
      enabled: true
      max_summary_length: 500
    L1:
      enabled: true
      chunking: "heading"       # Split by headings
      max_summary_length: 300
    L2:
      enabled: true
      chunk_size: 512           # Target chunk size in tokens
      chunk_overlap: 50         # Overlap between chunks
      method: "semantic"        # semantic | fixed | sentence
    L3:
      enabled: true             # Controlled by propositions.enabled

Chunk size tuning

512 tokens is the sweet spot for most use cases. Smaller chunks (256) improve precision but lose context. Larger chunks (1024) retain more context but dilute the embedding with multiple topics. If you’re working with highly structured documents (legal contracts, standards), consider 256 tokens + aggressive parent expansion.

Storage Overhead

For a 100-page technical document:

| Level | Points | Avg Size | Vector Storage |
|-------|--------|----------|----------------|
| L0 | 1 | 500 tokens | ~4 KB |
| L1 | 20 | 300 tokens | ~80 KB |
| L2 | 400 | 512 tokens | ~1.6 MB |
| L3 | 2,000 | 50 tokens | ~8 MB |
| Total | ~2,421 | | ~9.7 MB |

That overhead is per document; Qdrant handles it efficiently even on modest hardware.
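
The per-point figure follows from BGE-M3's 1024-dimensional float32 dense vectors (4 KB each); a quick arithmetic check of the table (vector storage alone lands near ~9.5 MB, with payload text and index overhead making up the rest):

```python
DIM, FLOAT_BYTES = 1024, 4                # BGE-M3 dense vectors, float32
vector_kb = DIM * FLOAT_BYTES / 1024      # 4.0 KB per stored vector

points = {"L0": 1, "L1": 20, "L2": 400, "L3": 2_000}
for level, n in points.items():
    print(f"{level}: {n * vector_kb:,.0f} KB")
total_mb = sum(points.values()) * vector_kb / 1024
print(f"total: {sum(points.values()):,} points, ~{total_mb:.1f} MB of vectors")
```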

References

  • Sarthi et al., “RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval” (2024)
  • Forge implementation: forge/ingestion/hierarchy.py
  • Chunking methods: forge/ingestion/chunker.py