API Reference

Forge V5 exposes a REST API via FastAPI on port 8000. All endpoints accept and return JSON unless otherwise noted. The API is documented automatically via OpenAPI at http://localhost:8000/docs.

Base URL

http://localhost:8000

Endpoints Overview

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/query | Direct (non-streaming) query |
| POST | /api/query/stream | Streaming query via SSE |
| POST | /api/documents/upload | Upload a document |
| GET | /api/documents | List all documents |
| DELETE | /api/documents/{id} | Delete a document |
| POST | /api/ingest | Trigger manual ingestion |
| GET | /api/ingest/status | Check ingestion status |
| GET | /api/settings | Get current settings |
| PUT | /api/settings | Update settings |
| GET | /api/models | List available models |
| POST | /api/models/load | Load/switch LLM model |
| GET | /api/health | Health check |

Query

POST /api/query

Run a query and return the complete response (non-streaming).

Request:

{
  "query": "What are the key findings of the study?",
  "mode": "agentic",
  "top_k": 5,
  "filters": {
    "document_ids": ["doc_abc123"],
    "levels": ["L2", "L3"]
  }
}
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| query | string | Yes | (none) | The question to answer |
| mode | string | No | "agentic" | "agentic" or "direct" |
| top_k | number | No | 5 | Number of source chunks to use |
| filters.document_ids | string[] | No | all | Restrict to specific documents |
| filters.levels | string[] | No | all | Restrict to hierarchy levels |
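
As a sketch, the request body can be assembled in Python before posting. The helper name below is ours, not part of the API; the defaults mirror the field table above.

```python
import json

def build_query_payload(query, mode="agentic", top_k=5,
                        document_ids=None, levels=None):
    """Assemble a request body for POST /api/query.

    The "filters" object is included only when at least one
    filter is actually given, matching the optional fields above.
    """
    payload = {"query": query, "mode": mode, "top_k": top_k}
    filters = {}
    if document_ids:
        filters["document_ids"] = document_ids
    if levels:
        filters["levels"] = levels
    if filters:
        payload["filters"] = filters
    return payload

body = build_query_payload(
    "What are the key findings of the study?",
    document_ids=["doc_abc123"], levels=["L2", "L3"],
)
print(json.dumps(body, indent=2))
```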

Response (200):

{
  "answer": "The study identifies three key findings: (1) the correlation between...",
  "sources": [
    {
      "chunk_id": "c_1a2b3c",
      "document_id": "doc_abc123",
      "text": "Our analysis reveals a significant correlation...",
      "level": "L2",
      "score": 0.87,
      "page_numbers": [12],
      "heading": "4.1 Results"
    }
  ],
  "confidence": 0.92,
  "metadata": {
    "mode": "agentic",
    "iterations": 4,
    "tools_used": ["semantic_search", "rerank_colbert", "generate_answer"],
    "total_time_ms": 7240,
    "tokens_generated": 312,
    "cached": false
  },
  "verification": {
    "claims_checked": 5,
    "claims_supported": 5,
    "claims_unsupported": 0,
    "confidence": 0.92
  }
}

curl example:

curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key findings?",
    "mode": "agentic"
  }'
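
The "sources" array in the response carries everything needed to show citations. A minimal formatter, assuming the response schema shown above (the helper name is ours):

```python
def format_citations(response):
    """Turn the "sources" array of a /api/query response into
    one human-readable citation line per chunk."""
    lines = []
    for src in response.get("sources", []):
        pages = ", ".join(str(p) for p in src.get("page_numbers", []))
        lines.append(
            f'[{src["chunk_id"]}] {src["heading"]} (p. {pages}, '
            f'score {src["score"]:.2f})'
        )
    return lines

sample = {
    "sources": [{
        "chunk_id": "c_1a2b3c",
        "document_id": "doc_abc123",
        "heading": "4.1 Results",
        "page_numbers": [12],
        "score": 0.87,
    }]
}
print("\n".join(format_citations(sample)))
```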

POST /api/query/stream

Run a query with Server-Sent Events streaming. Same request schema as /api/query.

Request:

{
  "query": "What are the key findings of the study?",
  "mode": "agentic"
}

Response: text/event-stream with SSE events.

See Streaming Protocol for the complete event schema.

curl example:

curl -N http://localhost:8000/api/query/stream \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the key findings?",
    "mode": "agentic"
  }'
Use -N for streaming

The -N flag disables curl’s output buffering so SSE events appear in real time.
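
The stream follows standard SSE framing: records separated by blank lines, with "event:" naming the record and "data:" carrying the payload. A generic parser sketch (the "token" event name here is only an illustration; the real event names are defined in the Streaming Protocol document):

```python
import json

def parse_sse(raw):
    """Split a raw text/event-stream body into (event, data) pairs
    using the generic SSE framing rules."""
    events = []
    event, data_lines = None, []
    for line in raw.splitlines() + [""]:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and (event or data_lines):
            # Blank line terminates the current record.
            events.append((event or "message", "\n".join(data_lines)))
            event, data_lines = None, []
    return events

raw = 'event: token\ndata: {"text": "The"}\n\nevent: token\ndata: {"text": " study"}\n\n'
for name, data in parse_sse(raw):
    print(name, json.loads(data)["text"])
```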


Documents

POST /api/documents/upload

Upload a document for ingestion. Accepts multipart/form-data.

Request:

curl -X POST http://localhost:8000/api/documents/upload \
  -F "file=@research-paper.pdf" \
  -F "metadata={\"tags\": [\"research\", \"2024\"]}"
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| file | file | Yes | PDF, DOCX, DOC, or TXT file |
| metadata | JSON string | No | Optional tags and metadata |

Supported formats: .pdf, .docx, .doc, .txt

Maximum file size: 100MB (configurable)

Response (201):

{
  "document_id": "doc_a1b2c3d4",
  "filename": "research-paper.pdf",
  "file_size_bytes": 2457600,
  "pages": 42,
  "status": "queued",
  "created_at": "2024-12-15T10:30:00Z"
}

Ingestion begins automatically after upload. Check progress with GET /api/ingest/status.


GET /api/documents

List all indexed documents.

Response (200):

{
  "documents": [
    {
      "document_id": "doc_a1b2c3d4",
      "filename": "research-paper.pdf",
      "file_size_bytes": 2457600,
      "pages": 42,
      "status": "indexed",
      "chunks": 197,
      "propositions": 834,
      "entities": 156,
      "created_at": "2024-12-15T10:30:00Z",
      "indexed_at": "2024-12-15T10:35:42Z"
    },
    {
      "document_id": "doc_e5f6g7h8",
      "filename": "policy-manual.docx",
      "file_size_bytes": 1024000,
      "pages": 28,
      "status": "indexing",
      "progress": 0.72,
      "created_at": "2024-12-15T11:00:00Z"
    }
  ],
  "total": 2
}

curl example:

curl http://localhost:8000/api/documents
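
Since the listing mixes "indexed" and "indexing" documents, a small summary helper can be handy when polling. This is our illustration against the response schema above, not a server feature:

```python
def index_summary(listing):
    """Summarize GET /api/documents output: counts by status and
    total chunks across fully indexed documents."""
    docs = listing.get("documents", [])
    indexed = [d for d in docs if d.get("status") == "indexed"]
    indexing = [d for d in docs if d.get("status") == "indexing"]
    return {
        "indexed": len(indexed),
        "indexing": len(indexing),
        "chunks_indexed": sum(d.get("chunks", 0) for d in indexed),
    }

listing = {
    "documents": [
        {"document_id": "doc_a1b2c3d4", "status": "indexed", "chunks": 197},
        {"document_id": "doc_e5f6g7h8", "status": "indexing", "progress": 0.72},
    ],
    "total": 2,
}
summary = index_summary(listing)
print(summary)
```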

DELETE /api/documents/{id}

Delete a document and all its indexed data (vectors, graph, cache).

Response (200):

{
  "document_id": "doc_a1b2c3d4",
  "deleted": true,
  "points_removed": 2421,
  "graph_edges_removed": 312
}

curl example:

curl -X DELETE http://localhost:8000/api/documents/doc_a1b2c3d4

Ingestion

POST /api/ingest

Trigger manual re-ingestion of a document (e.g., after config changes).

Request:

{
  "document_id": "doc_a1b2c3d4",
  "force": true,
  "stages": ["contextual", "propositions", "graph", "embed"]
}
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| document_id | string | Yes | Document to re-ingest |
| force | boolean | No | Re-ingest even if already indexed |
| stages | string[] | No | Specific stages to re-run (default: all) |

Response (202):

{
  "document_id": "doc_a1b2c3d4",
  "status": "queued",
  "stages": ["contextual", "propositions", "graph", "embed"]
}

GET /api/ingest/status

Check the status of all ingestion jobs.

Response (200):

{
  "active": [
    {
      "document_id": "doc_a1b2c3d4",
      "stage": "contextual_enrichment",
      "progress": 0.65,
      "chunks_processed": 128,
      "chunks_total": 197,
      "started_at": "2024-12-15T10:30:00Z",
      "estimated_remaining_seconds": 120
    }
  ],
  "completed": [
    {
      "document_id": "doc_e5f6g7h8",
      "completed_at": "2024-12-15T10:28:00Z",
      "duration_seconds": 342,
      "points_created": 1856
    }
  ],
  "failed": []
}

curl example:

curl http://localhost:8000/api/ingest/status
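
The "active" entries carry enough detail to render a progress line per job. A formatting sketch against the response above (helper name is ours):

```python
def describe_jobs(status):
    """Render the "active" jobs from GET /api/ingest/status as
    one progress line per document."""
    lines = []
    for job in status.get("active", []):
        pct = round(100 * job.get("progress", 0.0))
        lines.append(
            f'{job["document_id"]}: {job["stage"]} {pct}% '
            f'({job["chunks_processed"]}/{job["chunks_total"]} chunks)'
        )
    return lines

status = {"active": [{
    "document_id": "doc_a1b2c3d4",
    "stage": "contextual_enrichment",
    "progress": 0.65,
    "chunks_processed": 128,
    "chunks_total": 197,
}]}
print("\n".join(describe_jobs(status)))
```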

Settings

GET /api/settings

Get the current configuration.

Response (200):

{
  "query": {
    "default_mode": "agentic",
    "max_iterations": 8,
    "timeout_seconds": 30
  },
  "llm": {
    "model_path": "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    "context_size": 8192,
    "temperature": 0.1,
    "gpu_layers": -1
  },
  "crag": {
    "enabled": true,
    "threshold_correct": 0.7,
    "threshold_ambiguous": 0.4
  },
  "colbert": {
    "enabled": true,
    "top_k": 20,
    "final_k": 5
  },
  "propositions": { "enabled": true },
  "graph": { "enabled": true },
  "verification": { "enabled": true },
  "cache": { "enabled": true, "ttl": 3600 }
}

PUT /api/settings

Update configuration at runtime. Only included fields are updated; omitted fields retain their current values.

Request:

{
  "crag": {
    "threshold_correct": 0.8
  },
  "query": {
    "default_mode": "direct"
  }
}

Response (200):

{
  "updated": true,
  "changes": {
    "crag.threshold_correct": { "old": 0.7, "new": 0.8 },
    "query.default_mode": { "old": "agentic", "new": "direct" }
  }
}
LLM settings require restart

Changing llm.model_path, llm.gpu_layers, or llm.context_size requires an LLM reload. Use POST /api/models/load to apply LLM changes without restarting the server.

curl example:

curl -X PUT http://localhost:8000/api/settings \
  -H "Content-Type: application/json" \
  -d '{"crag": {"threshold_correct": 0.8}}'
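
The partial-update semantics amount to a recursive merge: nested objects are merged key by key, scalars are overwritten. A sketch of that behavior (our illustration of the semantics, not the server's code):

```python
def deep_merge(current, patch):
    """Apply a partial-settings patch: nested dicts merge key by
    key, everything else in the patch overwrites the old value."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

current = {"crag": {"enabled": True, "threshold_correct": 0.7},
           "query": {"default_mode": "agentic", "max_iterations": 8}}
patch = {"crag": {"threshold_correct": 0.8},
         "query": {"default_mode": "direct"}}
print(deep_merge(current, patch))
```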

Models

GET /api/models

List available models and the currently loaded model.

Response (200):

{
  "current": {
    "name": "mistral-7b-instruct-v0.2.Q4_K_M",
    "path": "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    "parameters": "7B",
    "quantization": "Q4_K_M",
    "vram_usage_gb": 4.4,
    "context_size": 8192
  },
  "available": [
    {
      "name": "mistral-7b-instruct-v0.2.Q4_K_M",
      "path": "models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
      "size_gb": 4.4,
      "quantization": "Q4_K_M"
    },
    {
      "name": "llama-3.1-8b-instruct.Q4_K_M",
      "path": "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
      "size_gb": 4.9,
      "quantization": "Q4_K_M"
    }
  ]
}

POST /api/models/load

Load or switch to a different LLM model. This unloads the current model and loads the new one.

Request:

{
  "model_path": "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
  "gpu_layers": -1,
  "context_size": 8192
}

Response (200):

{
  "loaded": true,
  "model": "Meta-Llama-3.1-8B-Instruct-Q4_K_M",
  "vram_usage_gb": 4.9,
  "load_time_seconds": 3.2
}
Model loading takes a few seconds

The server is unavailable for queries during model loading (typically 2-5 seconds). The endpoint returns after loading is complete.

curl example:

curl -X POST http://localhost:8000/api/models/load \
  -H "Content-Type: application/json" \
  -d '{"model_path": "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"}'

Health

GET /api/health

Health check endpoint. Returns the status of all services.

Response (200):

{
  "status": "healthy",
  "version": "5.0.0",
  "uptime_seconds": 3456,
  "services": {
    "qdrant": "connected",
    "redis": "connected",
    "llm": "loaded",
    "bge_m3": "loaded"
  },
  "gpu": {
    "available": true,
    "name": "NVIDIA GeForce RTX 4080",
    "vram_total_gb": 16.0,
    "vram_used_gb": 11.2,
    "cuda_version": "12.2"
  },
  "stats": {
    "documents_indexed": 5,
    "total_points": 12450,
    "queries_served": 142,
    "cache_hit_rate": 0.23
  }
}

curl example:

curl http://localhost:8000/api/health
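
A readiness gate can be derived from the "status" and "services" fields shown above, treating "connected" and "loaded" as the healthy states. This helper is our sketch, not part of the API:

```python
def is_ready(health):
    """Decide from a GET /api/health body whether the server can
    serve queries: overall status healthy and every service in a
    good state ("connected" or "loaded")."""
    ok_states = {"connected", "loaded"}
    services = health.get("services", {})
    return (health.get("status") == "healthy"
            and bool(services)
            and all(state in ok_states for state in services.values()))

health = {"status": "healthy",
          "services": {"qdrant": "connected", "redis": "connected",
                       "llm": "loaded", "bge_m3": "loaded"}}
print(is_ready(health))
```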

Error Responses

All endpoints return errors in a consistent format:

{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Query text is required",
    "details": { "field": "query", "constraint": "non-empty string" }
  }
}
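
Because every error uses this envelope, a client can centralize error handling in one place. A sketch that raises a typed exception from the envelope (class and helper names are ours):

```python
class ForgeAPIError(Exception):
    """Client-side exception carrying the API's error envelope."""
    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}

def raise_for_error(body):
    """Raise ForgeAPIError when a response body carries the error
    envelope shown above; return the body unchanged otherwise."""
    if isinstance(body, dict) and "error" in body:
        err = body["error"]
        raise ForgeAPIError(err.get("code"), err.get("message"),
                            err.get("details"))
    return body
```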

Error Codes

| Code | HTTP Status | Description |
|------|-------------|-------------|
| VALIDATION_ERROR | 400 | Invalid request parameters |
| DOCUMENT_NOT_FOUND | 404 | Document ID doesn't exist |
| UNSUPPORTED_FORMAT | 400 | File type not supported |
| FILE_TOO_LARGE | 413 | File exceeds size limit |
| LLM_ERROR | 500 | LLM inference failure |
| VECTOR_DB_ERROR | 500 | Qdrant connection/query failure |
| INGESTION_ERROR | 500 | Pipeline failure during ingestion |
| TIMEOUT | 504 | Query exceeded timeout |
| NO_DOCUMENTS | 400 | No documents indexed yet |
| MODEL_NOT_FOUND | 404 | Specified model file doesn't exist |

Rate Limits

There are no rate limits by default. Forge is designed for single-user desktop use. If deploying as a shared service, configure rate limiting via a reverse proxy (nginx, Caddy).

OpenAPI Documentation

FastAPI automatically generates interactive API documentation:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc
  • OpenAPI JSON: http://localhost:8000/openapi.json