Open-source · Visual RAG inspection

Docling Studio

Inspect, debug and repair Docling-based RAG pipelines visually.

MIT licensed · Dockerised · Built on Docling

Why it exists

The JSON that Docling emits is not enough.

Docling turns PDFs into rich structured data, but structured data alone does not tell you whether the extraction is good enough for production. Debugging a RAG pipeline means chasing silent failures across OCR, chunking, embedding and retrieval — with no single place to see what went wrong.

Docling Studio is the missing inspection layer: a visual studio where each stage of the pipeline can be opened, verified and repaired before it ships.

Use cases

Three ways to open the black box.

Active

OCR Debug

upload → convert → bbox → validate

Upload a document, run a conversion, inspect every bounding box and validate the extraction against the source page.

Active

RAG Pipeline Inspection

document → chunk → embed → store

Walk a document through chunking, embedding and vector-store retrieval. Visualise every transformation and spot regressions before they ship.
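To make the stages concrete, here is a minimal, self-contained sketch of a chunk, embed, store and retrieve loop. It is not Docling Studio's implementation: the fixed-size word splitter, the bag-of-words "embedding" and the in-memory list standing in for a vector store are all assumptions chosen to keep the example runnable with the standard library only.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 8) -> list[str]:
    """Split text into fixed-size word windows (a deliberately naive splitter)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words term-count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, store: list[tuple[str, Counter]], k: int = 1):
    """Score every stored chunk against the query and return the top k."""
    q = embed(query)
    scored = [(cosine(q, vec), text) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


document = (
    "Docling converts PDF pages into structured text. "
    "Tables are extracted with their cell layout intact. "
    "Scanned pages go through OCR before layout analysis."
)

chunks = chunk(document, size=8)            # stage 1: chunk
store = [(c, embed(c)) for c in chunks]     # stages 2 and 3: embed, store
top = retrieve("scanned pages OCR", store)  # stage 4: retrieve

for score, text in top:
    print(f"{score:.3f}  {text}")
```

With this input the naive splitter cuts the last sentence in two, so the word "Scanned" lands in a different chunk from the rest of its sentence. That kind of boundary effect is easy to miss in raw JSON and easy to see once each stage is laid out.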

Experimental

Chunkless RAG Debugger

document → retrieve

Visual inspection for chunkless retrieval pipelines. Early build available — expect rough edges and breaking changes.

Deployment

Two modes. One image.

Run Docling Studio locally for fast debugging, or plug it into an external Docling Serve instance to inspect your production pipeline.

Local · embedded

Docling runs inside the container. Standalone, with no external services to configure; ideal for quick inspection and offline work.

# Local — Docling embedded
docker run -p 8000:8000 \
  ghcr.io/scub-france/docling-studio:latest

Remote · serve-connected

Connect Docling Studio to an external Docling Serve instance. This is the recommended mode for inspecting staging or production pipelines.

# Remote — connected to Docling Serve
docker run -p 8000:8000 \
  -e DOCLING_MODE=remote \
  -e DOCLING_SERVE_URL=https://your-docling-serve \
  ghcr.io/scub-france/docling-studio:latest

Vector stores

Bring your own store.

Docling Studio supports multiple vector stores through a Protocol-based adapter system — swap the backend without touching the pipeline.

Available today

OpenSearch

First-class adapter, used in production deployments.

Planned

Neo4j

Graph-native retrieval adapter on the roadmap.

Extensible

Your own

Add a new backend by implementing the VectorStore Protocol.
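As an illustration of what a Protocol-based adapter can look like in Python, the sketch below defines a structural interface with `typing.Protocol` and a toy in-memory backend that satisfies it. The method names `upsert` and `search`, their signatures and the `InMemoryStore` class are hypothetical; the actual VectorStore Protocol in Docling Studio may define different methods.

```python
from typing import Protocol, Sequence, runtime_checkable


@runtime_checkable
class VectorStore(Protocol):
    """Hypothetical port. The real Protocol's methods may differ."""

    def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None: ...

    def search(self, vector: Sequence[float], k: int) -> list[tuple[str, float]]: ...


class InMemoryStore:
    """Toy adapter: no inheritance needed, it only has to match the method shapes."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]]) -> None:
        for id_, vec in zip(ids, vectors):
            self._rows[id_] = list(vec)

    def search(self, vector: Sequence[float], k: int) -> list[tuple[str, float]]:
        # Rank stored vectors by dot product with the query vector.
        scored = [
            (id_, sum(a * b for a, b in zip(vector, vec)))
            for id_, vec in self._rows.items()
        ]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def index_and_query(store: VectorStore) -> list[tuple[str, float]]:
    """Pipeline code depends only on the Protocol, never on a concrete backend."""
    store.upsert(["a", "b"], [[1.0, 0.0], [0.0, 1.0]])
    return store.search([0.9, 0.1], k=1)


result = index_and_query(InMemoryStore())
print(result)
```

Because the check is structural, swapping the backend means passing a different object to `index_and_query`; the calling code stays unchanged.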

Architecture

Hexagonal, feature-flagged, deployable.

Ports and adapters isolate the pipeline from the outside world. Feature flags switch deployment modes without touching the code.
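A rough sketch of how a flag can select an adapter behind a port is shown below. Only the `DOCLING_MODE=remote` and `DOCLING_SERVE_URL` variables come from the deployment commands above; the `local` default value, the `Converter` port and both adapter classes are assumptions made for illustration, not Docling Studio's actual code.

```python
from dataclasses import dataclass
from typing import Mapping, Protocol


class Converter(Protocol):
    """Hypothetical port the pipeline talks to."""

    def describe(self) -> str: ...


@dataclass
class EmbeddedConverter:
    """Hypothetical adapter for Docling running inside the container."""

    def describe(self) -> str:
        return "embedded Docling"


@dataclass
class RemoteConverter:
    """Hypothetical adapter for an external Docling Serve instance."""

    base_url: str

    def describe(self) -> str:
        return f"Docling Serve at {self.base_url}"


def build_converter(env: Mapping[str, str]) -> Converter:
    """Pick an adapter from configuration; callers only ever see the port."""
    mode = env.get("DOCLING_MODE", "local")  # "local" as the default is an assumption
    if mode == "remote":
        url = env.get("DOCLING_SERVE_URL")
        if not url:
            raise ValueError("DOCLING_SERVE_URL is required when DOCLING_MODE=remote")
        return RemoteConverter(base_url=url)
    if mode == "local":
        return EmbeddedConverter()
    raise ValueError(f"Unknown DOCLING_MODE: {mode!r}")


print(build_converter({}).describe())
print(
    build_converter(
        {"DOCLING_MODE": "remote", "DOCLING_SERVE_URL": "https://your-docling-serve"}
    ).describe()
)
```

In a real service the mapping passed in would typically be `os.environ`; taking it as a parameter keeps the selection logic easy to test.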

What you can do with it

Validate before you ship.

  • Validate OCR extraction quality on complex technical PDFs — tables, multi-column layouts, scanned pages.
  • Debug chunking strategies before committing to production — compare splitters side by side, measure their effect on retrieval.
  • Inspect why a RAG retrieval returned the wrong passage — trace the query through embedding, similarity and ranking.
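The side-by-side comparison of splitters can be approximated in a few lines. The sketch below is a simplified stand-in, not the tool's own logic: it runs two hypothetical splitters over the same text and reports, for each, how many chunks result and whether any single chunk still contains every phrase needed to answer a question.

```python
import re


def split_fixed(text: str, size: int) -> list[str]:
    """Fixed-size word windows, ignoring sentence boundaries."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def split_sentences(text: str) -> list[str]:
    """Split after sentence-ending punctuation followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def covers(chunk: str, terms: list[str]) -> bool:
    """True if one chunk contains every required phrase (case-insensitive)."""
    lowered = chunk.lower()
    return all(term in lowered for term in terms)


def compare(text: str, terms: list[str]) -> dict[str, dict[str, object]]:
    """Run each splitter and record chunk count and whether the answer stays intact."""
    splitters = {
        "fixed-5-words": lambda t: split_fixed(t, 5),
        "sentence": split_sentences,
    }
    report = {}
    for name, split in splitters.items():
        chunks = split(text)
        report[name] = {
            "chunks": len(chunks),
            "intact": any(covers(c, terms) for c in chunks),
        }
    return report


text = (
    "Table 3 lists the torque limits. "
    "The maximum torque is 40 Nm at 20 C. "
    "Exceeding it voids the warranty."
)
report = compare(text, ["maximum torque", "40 nm"])
for name, stats in report.items():
    print(name, stats)
```

Here the fixed-size splitter separates "maximum torque" from "40 Nm", so no single chunk can answer the question, while the sentence splitter keeps them together.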

Ecosystem

Where Docling Studio fits.

Built on Docling

Depends on Docling, a project in the Linux Foundation AI & Data ecosystem.

Compatible with Docling Serve

Connects to any Docling Serve instance as a remote backend, locally or in production.

Maintained by SCUB

Open source under the MIT license, developed and maintained by SCUB.