# API Reference
Forge V5 exposes a REST API via FastAPI on port 8000. All endpoints accept and return JSON unless otherwise noted. The API is documented automatically via OpenAPI at http://localhost:8000/docs.
## Base URL

```
http://localhost:8000
```

## Endpoints Overview
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/query | Direct (non-streaming) query |
| POST | /api/query/stream | Streaming query via SSE |
| POST | /api/documents/upload | Upload a document |
| GET | /api/documents | List all documents |
| DELETE | /api/documents/{id} | Delete a document |
| POST | /api/ingest | Trigger manual ingestion |
| GET | /api/ingest/status | Check ingestion status |
| GET | /api/settings | Get current settings |
| PUT | /api/settings | Update settings |
| GET | /api/models | List available models |
| POST | /api/models/load | Load/switch LLM model |
| GET | /api/health | Health check |
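For orientation, these endpoints can be driven from Python with only the standard library. The sketch below targets the non-streaming query endpoint; the helper names `build_query_payload` and `run_query` are illustrative, not part of Forge:

```python
import json
from urllib import request as urlrequest


def build_query_payload(query, mode="agentic", top_k=5, document_ids=None):
    """Assemble a /api/query request body; optional filters are omitted when unused."""
    payload = {"query": query, "mode": mode, "top_k": top_k}
    if document_ids:
        payload["filters"] = {"document_ids": document_ids}
    return payload


def run_query(base_url="http://localhost:8000", **fields):
    """POST the payload to /api/query and return the decoded JSON response."""
    body = json.dumps(build_query_payload(**fields)).encode()
    req = urlrequest.Request(
        f"{base_url}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires a running Forge server with documents indexed.
    result = run_query(query="What are the key findings?")
    print(result["answer"])
```

The request and response schemas for each endpoint follow below.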
## Query
### POST /api/query
Run a query and return the complete response (non-streaming).
Request:
```json
{
  "query": "What are the key findings of the study?",
  "mode": "agentic",
  "top_k": 5,
  "filters": {
    "document_ids": ["doc_abc123"],
    "levels": ["L2", "L3"]
  }
}
```

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | — | The question to answer |
| mode | string | No | "agentic" | "agentic" or "direct" |
| top_k | number | No | 5 | Number of source chunks to use |
| filters.document_ids | string[] | No | all | Restrict to specific documents |
| filters.levels | string[] | No | all | Restrict to hierarchy levels |
Response (200):
```json
{
  "answer": "The study identifies three key findings: (1) the correlation between...",
  "sources": [
    {
      "chunk_id": "c_1a2b3c",
      "document_id": "doc_abc123",
      "text": "Our analysis reveals a significant correlation...",
      "level": "L2",
      "score": 0.87,
      "page_numbers": [12],
      "heading": "4.1 Results"
    }
  ],
  "confidence": 0.92,
  "metadata": {
    "mode": "agentic",
    "iterations": 4,
    "tools_used": ["semantic_search", "rerank_colbert", "generate_answer"],
    "total_time_ms": 7240,
    "tokens_generated": 312,
    "cached": false
  },
  "verification": {
    "claims_checked": 5,
    "claims_supported": 5,
    "claims_unsupported": 0,
    "confidence": 0.92
  }
}
```

curl example:
```bash
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key findings?",
    "mode": "agentic"
  }'
```

### POST /api/query/stream
Run a query with Server-Sent Events streaming. Same request schema as /api/query.
Request:
```json
{
  "query": "What are the key findings of the study?",
  "mode": "agentic"
}
```

Response: text/event-stream with SSE events.
See Streaming Protocol for the complete event schema.
curl example:
```bash
curl -N http://localhost:8000/api/query/stream \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key findings?",
    "mode": "agentic"
  }'
```

The -N flag disables curl’s output buffering so SSE events appear in real time.
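Events on the stream use standard SSE framing: one or more `data:` lines terminated by a blank line. As an illustrative sketch, the frames can be decoded like this (the event payload fields are placeholders; the real event schema is defined in the Streaming Protocol document):

```python
import json


def parse_sse_lines(lines):
    """Collect `data:` payloads from an SSE stream into decoded JSON events.

    A blank line terminates each event, per SSE framing rules; multi-line
    `data:` fields are joined with newlines before decoding.
    """
    events, buf = [], []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line == "" and buf:
            events.append(json.loads("\n".join(buf)))
            buf = []
    return events
```

Against a live server, the same logic would be fed line by line from the open HTTP response rather than from a list.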
## Documents

### POST /api/documents/upload
Upload a document for ingestion. Accepts multipart/form-data.
Request:
```bash
curl -X POST http://localhost:8000/api/documents/upload \
  -F "file=@research-paper.pdf" \
  -F "metadata={\"tags\": [\"research\", \"2024\"]}"
```

| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | PDF, DOCX, or TXT file |
| metadata | JSON string | No | Optional tags and metadata |
Supported formats: `.pdf`, `.docx`, `.doc`, `.txt`
Maximum file size: 100MB (configurable)
Response (201):
```json
{
  "document_id": "doc_a1b2c3d4",
  "filename": "research-paper.pdf",
  "file_size_bytes": 2457600,
  "pages": 42,
  "status": "queued",
  "created_at": "2024-12-15T10:30:00Z"
}
```

Ingestion begins automatically after upload. Check progress with `GET /api/ingest/status`.
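The same upload can be done from Python. The stdlib-only sketch below assembles the multipart body by hand; in practice a client library such as `requests` handles this for you, and the helpers `encode_multipart` and `upload` are illustrative, not part of Forge:

```python
import json
import uuid
from urllib import request as urlrequest


def encode_multipart(filename, file_bytes, metadata=None):
    """Build a multipart/form-data body with `file` and optional `metadata` parts."""
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    ]
    if metadata is not None:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="metadata"\r\n\r\n'
            f'{json.dumps(metadata)}\r\n'.encode()
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return boundary, b"".join(parts)


def upload(path, base_url="http://localhost:8000", metadata=None):
    """POST a local file to /api/documents/upload and return the decoded response."""
    with open(path, "rb") as f:
        boundary, body = encode_multipart(path.rsplit("/", 1)[-1], f.read(), metadata)
    req = urlrequest.Request(
        f"{base_url}/api/documents/upload",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)
```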
### GET /api/documents
List all indexed documents.
Response (200):
```json
{
  "documents": [
    {
      "document_id": "doc_a1b2c3d4",
      "filename": "research-paper.pdf",
      "file_size_bytes": 2457600,
      "pages": 42,
      "status": "indexed",
      "chunks": 197,
      "propositions": 834,
      "entities": 156,
      "created_at": "2024-12-15T10:30:00Z",
      "indexed_at": "2024-12-15T10:35:42Z"
    },
    {
      "document_id": "doc_e5f6g7h8",
      "filename": "policy-manual.docx",
      "file_size_bytes": 1024000,
      "pages": 28,
      "status": "indexing",
      "progress": 0.72,
      "created_at": "2024-12-15T11:00:00Z"
    }
  ],
  "total": 2
}
```

curl example:

```bash
curl http://localhost:8000/api/documents
```

### DELETE /api/documents/{id}
Delete a document and all its indexed data (vectors, graph, cache).
Response (200):
```json
{
  "document_id": "doc_a1b2c3d4",
  "deleted": true,
  "points_removed": 2421,
  "graph_edges_removed": 312
}
```

curl example:

```bash
curl -X DELETE http://localhost:8000/api/documents/doc_a1b2c3d4
```

## Ingestion
### POST /api/ingest
Trigger manual re-ingestion of a document (e.g., after config changes).
Request:
```json
{
  "document_id": "doc_a1b2c3d4",
  "force": true,
  "stages": ["contextual", "propositions", "graph", "embed"]
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| document_id | string | Yes | Document to re-ingest |
| force | boolean | No | Re-ingest even if already indexed |
| stages | string[] | No | Specific stages to re-run (default: all) |
Response (202):
```json
{
  "document_id": "doc_a1b2c3d4",
  "status": "queued",
  "stages": ["contextual", "propositions", "graph", "embed"]
}
```

### GET /api/ingest/status
Check the status of all ingestion jobs.
Response (200):
```json
{
  "active": [
    {
      "document_id": "doc_a1b2c3d4",
      "stage": "contextual_enrichment",
      "progress": 0.65,
      "chunks_processed": 128,
      "chunks_total": 197,
      "started_at": "2024-12-15T10:30:00Z",
      "estimated_remaining_seconds": 120
    }
  ],
  "completed": [
    {
      "document_id": "doc_e5f6g7h8",
      "completed_at": "2024-12-15T10:28:00Z",
      "duration_seconds": 342,
      "points_created": 1856
    }
  ],
  "failed": []
}
```

curl example:

```bash
curl http://localhost:8000/api/ingest/status
```

## Settings
### GET /api/settings
Get the current configuration.
Response (200):
```json
{
  "query": {
    "default_mode": "agentic",
    "max_iterations": 8,
    "timeout_seconds": 30
  },
  "llm": {
    "model_path": "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    "context_size": 8192,
    "temperature": 0.1,
    "gpu_layers": -1
  },
  "crag": {
    "enabled": true,
    "threshold_correct": 0.7,
    "threshold_ambiguous": 0.4
  },
  "colbert": {
    "enabled": true,
    "top_k": 20,
    "final_k": 5
  },
  "propositions": { "enabled": true },
  "graph": { "enabled": true },
  "verification": { "enabled": true },
  "cache": { "enabled": true, "ttl": 3600 }
}
```

### PUT /api/settings
Update configuration at runtime. Only included fields are updated; omitted fields retain their current values.
Request:
```json
{
  "crag": {
    "threshold_correct": 0.8
  },
  "query": {
    "default_mode": "direct"
  }
}
```

Response (200):
```json
{
  "updated": true,
  "changes": {
    "crag.threshold_correct": { "old": 0.7, "new": 0.8 },
    "query.default_mode": { "old": "agentic", "new": "direct" }
  }
}
```

Changing `llm.model_path`, `llm.gpu_layers`, or `llm.context_size` requires an LLM reload. Use `POST /api/models/load` to apply LLM changes without restarting the server.
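Conceptually, a partial update behaves like a recursive merge of the patch into the current settings tree. An illustrative sketch of that semantics (the server's actual merge logic may differ):

```python
def merge_settings(current, patch):
    """Recursively overlay `patch` onto `current`; omitted keys keep their values."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged
```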
curl example:
```bash
curl -X PUT http://localhost:8000/api/settings \
  -H "Content-Type: application/json" \
  -d '{"crag": {"threshold_correct": 0.8}}'
```

## Models
### GET /api/models
List available models and the currently loaded model.
Response (200):
```json
{
  "current": {
    "name": "mistral-7b-instruct-v0.2.Q4_K_M",
    "path": "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    "parameters": "7B",
    "quantization": "Q4_K_M",
    "vram_usage_gb": 4.4,
    "context_size": 8192
  },
  "available": [
    {
      "name": "mistral-7b-instruct-v0.2.Q4_K_M",
      "path": "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
      "size_gb": 4.4,
      "quantization": "Q4_K_M"
    },
    {
      "name": "llama-3.1-8b-instruct.Q4_K_M",
      "path": "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
      "size_gb": 4.9,
      "quantization": "Q4_K_M"
    }
  ]
}
```

### POST /api/models/load
Load or switch to a different LLM model. This unloads the current model and loads the new one.
Request:
```json
{
  "model_path": "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
  "gpu_layers": -1,
  "context_size": 8192
}
```

Response (200):
```json
{
  "loaded": true,
  "model": "Meta-Llama-3.1-8B-Instruct-Q4_K_M",
  "vram_usage_gb": 4.9,
  "load_time_seconds": 3.2
}
```

The server is unavailable for queries during model loading (typically 2-5 seconds). The endpoint returns after loading is complete.
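Because the server briefly refuses queries while a model loads, client code may want to poll `GET /api/health` until the LLM reports `loaded`. An illustrative sketch (the helpers `llm_ready` and `wait_until_ready` are not part of Forge):

```python
import json
import time
from urllib import request as urlrequest


def llm_ready(health):
    """True when a /api/health payload reports the LLM as loaded."""
    return health.get("services", {}).get("llm") == "loaded"


def wait_until_ready(base_url="http://localhost:8000", timeout=30.0, interval=1.0):
    """Poll /api/health until the LLM is loaded, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urlrequest.urlopen(f"{base_url}/api/health") as resp:
                if llm_ready(json.load(resp)):
                    return True
        except OSError:
            pass  # server briefly unreachable while the model reloads
        time.sleep(interval)
    return False
```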
curl example:
```bash
curl -X POST http://localhost:8000/api/models/load \
  -H "Content-Type: application/json" \
  -d '{"model_path": "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"}'
```

## Health
### GET /api/health
Health check endpoint. Returns the status of all services.
Response (200):
```json
{
  "status": "healthy",
  "version": "5.0.0",
  "uptime_seconds": 3456,
  "services": {
    "qdrant": "connected",
    "redis": "connected",
    "llm": "loaded",
    "bge_m3": "loaded"
  },
  "gpu": {
    "available": true,
    "name": "NVIDIA GeForce RTX 4080",
    "vram_total_gb": 16.0,
    "vram_used_gb": 11.2,
    "cuda_version": "12.2"
  },
  "stats": {
    "documents_indexed": 5,
    "total_points": 12450,
    "queries_served": 142,
    "cache_hit_rate": 0.23
  }
}
```

curl example:

```bash
curl http://localhost:8000/api/health
```

## Error Responses
All endpoints return errors in a consistent format:
```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Query text is required",
    "details": { "field": "query", "constraint": "non-empty string" }
  }
}
```

### Error Codes
| Code | HTTP Status | Description |
|---|---|---|
| VALIDATION_ERROR | 400 | Invalid request parameters |
| DOCUMENT_NOT_FOUND | 404 | Document ID doesn’t exist |
| UNSUPPORTED_FORMAT | 400 | File type not supported |
| FILE_TOO_LARGE | 413 | File exceeds size limit |
| LLM_ERROR | 500 | LLM inference failure |
| VECTOR_DB_ERROR | 500 | Qdrant connection/query failure |
| INGESTION_ERROR | 500 | Pipeline failure during ingestion |
| TIMEOUT | 504 | Query exceeded timeout |
| NO_DOCUMENTS | 400 | No documents indexed yet |
| MODEL_NOT_FOUND | 404 | Specified model file doesn’t exist |
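Since every endpoint uses the same envelope, client code can translate it into a typed exception in one place. An illustrative sketch (`ForgeAPIError` and `raise_for_error` are not part of Forge):

```python
class ForgeAPIError(Exception):
    """Raised when the API returns the documented error envelope."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}


def raise_for_error(payload):
    """Raise ForgeAPIError for an error envelope; pass other payloads through."""
    if isinstance(payload, dict) and "error" in payload:
        err = payload["error"]
        raise ForgeAPIError(
            err.get("code", "UNKNOWN"),
            err.get("message", ""),
            err.get("details"),
        )
    return payload
```

Calling `raise_for_error` on every decoded response lets the rest of the client branch on `e.code` (e.g. retry on `TIMEOUT`, surface `VALIDATION_ERROR` to the user).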
## Rate Limits
There are no rate limits by default. Forge is designed for single-user desktop use. If deploying as a shared service, configure rate limiting via a reverse proxy (nginx, Caddy).
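For illustration, an nginx fragment along these lines would cap each client IP at roughly 5 requests per second in front of Forge (the zone name, rate, and burst values here are placeholder choices, not recommendations):

```nginx
# Define a shared zone keyed by client IP: ~5 req/s, 10 MB of state.
limit_req_zone $binary_remote_addr zone=forge:10m rate=5r/s;

server {
    listen 80;

    location / {
        # Allow short bursts of up to 10 queued requests without delay.
        limit_req zone=forge burst=10 nodelay;
        proxy_pass http://127.0.0.1:8000;
    }
}
```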
## OpenAPI Documentation
FastAPI automatically generates interactive API documentation:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/openapi.json