Unstructured
Unstructured is a document parsing and pre-processing platform that provides a REST API for ingesting PDFs, HTML, DOCX, images, and more than 50 other file formats, transforming them into clean, structured JSON chunks ready for RAG pipelines and LLM applications. The platform offers partitioning, enrichment, chunking, and embedding capabilities through both a SaaS serverless API and self-hosted deployments. Billing is per page: a free tier covers 15,000 pages, pay-as-you-go usage costs $0.03 per page, and enterprise pricing is negotiated individually. Unstructured also ships Python and JavaScript SDKs, an MCP server for AI agent workflows, and 40+ connectors for source and destination data systems, including S3, Databricks, and vector databases.
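As a rough illustration of the per-page arithmetic, the sketch below estimates a pay-as-you-go bill. It assumes every page is billed at the flat $0.03 rate. Whether and how the 15,000-page free tier offsets paid usage is not stated here, so the sketch deliberately leaves it out. `estimate_payg_cost` is a hypothetical helper name, not part of any Unstructured SDK.

```python
from decimal import Decimal

# Pay-as-you-go rate quoted above, in US dollars per page.
PAYG_RATE_USD = Decimal("0.03")


def estimate_payg_cost(pages: int, rate: Decimal = PAYG_RATE_USD) -> Decimal:
    """Estimate the pay-as-you-go cost for a given page count.

    Assumption: every page is billed at the flat per-page rate. The
    15,000-page free tier and enterprise pricing are NOT applied here.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return rate * pages


print(estimate_payg_cost(100_000))  # 3000.00
```

The sketch uses `Decimal` rather than `float` so that cent amounts stay exact. For example, 100,000 pages come to exactly $3,000.00.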
APIs
Unstructured Platform API
The Unstructured Platform API provides programmatic access to workflow operations including source connectors, destination connectors, workflows, and jobs. It enables headless, ...
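A minimal sketch of how a client might list one of the four resource types named above, using only the Python standard library. The base URL `https://platform.unstructuredapp.io/api/v1`, the per-resource paths (`sources`, `destinations`, `workflows`, `jobs`), and the `unstructured-api-key` header are assumptions, not taken from this page. Check them against the current Platform API reference before relying on them. `build_list_request` is a hypothetical helper name. The request is built but not sent, so it can be inspected offline.

```python
import urllib.request

# Assumed base URL for the Platform API; verify against the current docs.
PLATFORM_BASE_URL = "https://platform.unstructuredapp.io/api/v1"

# The four resource types named in the description (path names assumed).
RESOURCES = ("sources", "destinations", "workflows", "jobs")


def build_list_request(resource: str, api_key: str,
                       base_url: str = PLATFORM_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists one resource type."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{resource}",
        method="GET",
        # Header name is an assumption carried over from the serverless API.
        headers={"unstructured-api-key": api_key, "accept": "application/json"},
    )
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body. Keeping construction separate from sending makes the URL and headers easy to test.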
Unstructured Partition Endpoint
The Unstructured Partition Endpoint is the legacy serverless API for processing individual files on demand. It supports PDFs, images, DOCX, HTML, and dozens of other formats, re...
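A minimal sketch of calling the Partition Endpoint for a single file with only the Python standard library. The URL `https://api.unstructuredapp.io/general/v0/general`, the `unstructured-api-key` header, the `files` and `strategy` multipart form fields, and the response shape (a JSON array of elements carrying `type` and `text`) are assumptions based on the public API, not taken from this page. `build_partition_request` and `parse_elements` are hypothetical helper names. The request is built but not sent, so it can be inspected offline.

```python
import json
import urllib.request
import uuid

# Assumed endpoint URL; verify against the current Unstructured docs.
PARTITION_URL = "https://api.unstructuredapp.io/general/v0/general"


def build_partition_request(filename: str, content: bytes, api_key: str,
                            strategy: str = "auto",
                            url: str = PARTITION_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one file."""
    boundary = uuid.uuid4().hex
    # Plain form field: the partitioning strategy (field name assumed).
    strategy_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="strategy"\r\n\r\n'
        f"{strategy}\r\n"
    ).encode()
    # File field: the document bytes (field name "files" assumed).
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = strategy_part + file_header + content + closing
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "unstructured-api-key": api_key,  # header name assumed
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_elements(raw: bytes) -> list[dict]:
    """Reduce the returned JSON array of elements to their type and text."""
    return [{"type": e.get("type"), "text": e.get("text", "")}
            for e in json.loads(raw)]


# Sending (network required):
#   with urllib.request.urlopen(build_partition_request(...)) as resp:
#       elements = parse_elements(resp.read())
```

In practice, the Python and JavaScript SDKs mentioned above wrap these details. The hand-built request mainly shows what travels over the wire: one multipart POST per file, returning a list of typed elements.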