Unstructured: current-pass rerun reconfirms runtime parity on document.parse
Source pending. Fresh production rerun via Rhumb Resolve matched direct Unstructured API control on document.parse with identical element-type output shape.
| Dimension | Score | What it measures |
|---|---|---|
| Execution Score | 7.3 | Reliability, idempotency, error ergonomics, latency distribution, and schema stability. |
| Access Readiness Score | 6.4 | How easily an agent can onboard, authenticate, and start using this service autonomously. |
| Aggregate AN Score | 7.0 | Composite score: 70% execution + 30% access readiness. |
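The composite in the table can be checked with a few lines of arithmetic. This is a minimal sketch: the 70/30 weights come from the table, while the function name and the rounding to one decimal place are assumptions made for illustration.

```python
# Weights as stated in the score table: 70% execution, 30% access readiness.
EXECUTION_WEIGHT = 0.7
ACCESS_WEIGHT = 0.3


def aggregate_score(execution: float, access_readiness: float) -> float:
    """Weighted composite, rounded to one decimal (rounding is an assumption)."""
    raw = EXECUTION_WEIGHT * execution + ACCESS_WEIGHT * access_readiness
    return round(raw, 1)


# 0.7 * 7.3 + 0.3 * 6.4 = 5.11 + 1.92 = 7.03 before rounding
aggregate = aggregate_score(7.3, 6.4)
```

With the published inputs, the result agrees with the 7.0 shown for the Aggregate AN Score.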
No active failure modes reported.
Published review summaries with trust provenance attached to each card.
Docs-backed: Built from public docs and product materials.
Test-backed: Backed by guided testing or evaluator-run checks.
Runtime-verified: Verified from authenticated runtime evidence.
Fresh production rerun via Rhumb Resolve matched direct Unstructured API control on document.parse with identical 2-element output shape.
Unstructured offers both a hosted REST service (the Unstructured API) and self-hosted options via the Python SDK. The REST API accepts file uploads and returns structured JSON elements. Partition strategies (auto, hi_res, fast, ocr_only) let agents trade off accuracy against latency. The Python SDK provides more control, adding chunking, embedding, and connector pipelines. For agents, the REST API is the primary integration surface. The main complexity is choosing the right partition strategy: hi_res is slow but accurate for complex layouts, while fast is quick but misses some structure.
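The strategy trade-off can be encoded as a small decision helper. The sketch below is illustrative only: the four strategy names come from the review, but `DocumentHints`, `choose_strategy`, the precedence order, and the mapping of scanned documents to `ocr_only` are assumptions, not documented Unstructured behavior.

```python
from dataclasses import dataclass

# Strategy names as listed in the review.
STRATEGIES = ("auto", "hi_res", "fast", "ocr_only")


@dataclass
class DocumentHints:
    """Hypothetical facts an agent may know about a file before parsing it."""
    complex_layout: bool = False     # tables, multi-column pages, forms
    scanned: bool = False            # image-only pages with no text layer
    latency_sensitive: bool = False  # caller needs an answer quickly


def choose_strategy(hints: DocumentHints) -> str:
    """Pick a partition strategy; the precedence here is an assumed heuristic."""
    if hints.complex_layout:
        return "hi_res"    # slow but accurate for complex layouts
    if hints.scanned:
        return "ocr_only"  # assumption: text must come from OCR
    if hints.latency_sensitive:
        return "fast"      # quick, but may miss some structure
    return "auto"          # let the service decide


strategy = choose_strategy(DocumentHints(complex_layout=True, latency_sensitive=True))
```

In this sketch accuracy wins over latency when both flags are set; an agent with a strict time budget might reverse that order.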
Authentication uses API keys for the hosted service. Self-hosted deployments can run without authentication. The access model is simple and agent-friendly. Rate limits apply to the hosted service based on plan tier. There are no per-document or per-user scoping options, which is fine for single-tenant agent deployments but limits multi-tenant platforms.
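A sketch of how an agent might handle the two access modes described above. The header name `unstructured-api-key` and the environment variable `UNSTRUCTURED_API_KEY` are assumptions to verify against the current Unstructured documentation; `build_headers` is a hypothetical helper.

```python
import os
from typing import Dict, Optional

# Assumed header name for the hosted service; confirm against the official docs.
API_KEY_HEADER = "unstructured-api-key"


def build_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Return request headers, adding the key only when one is configured.

    A self-hosted deployment running without authentication gets no key header.
    """
    headers = {"accept": "application/json"}
    if api_key:
        headers[API_KEY_HEADER] = api_key
    return headers


# UNSTRUCTURED_API_KEY is an assumed variable name, chosen for this example.
headers = build_headers(os.environ.get("UNSTRUCTURED_API_KEY"))
```

Because the key is not scoped per document or per user, a multi-tenant platform would need to enforce tenant isolation in its own layer before calling the service.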
Documentation covers the API, SDK, connectors, and deployment options. The concepts section explains partitioning strategies, chunking, and element types well. The quickstart is clear. The main gap is that advanced patterns (custom model fine-tuning, complex connector pipelines) require more documentation than is available. Community resources help but the docs could be more comprehensive for edge cases.
Unstructured solves one of the hardest problems in AI agent pipelines: turning messy documents into clean, structured data that LLMs can process. It handles PDFs, Word documents, PowerPoints, images, HTML, emails, and more. For agents building RAG systems, processing uploaded documents, or extracting information from files, Unstructured provides the extraction layer. The output is a list of typed elements (Title, NarrativeText, Table, Image, etc.) with metadata about position, hierarchy, and source. This is the kind of infrastructure agents need but rarely build well on their own.
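To make the typed-element output concrete, the following sketch groups element text by type. The sample list is invented for illustration, and the `type`, `text`, and `metadata` keys are assumptions about the JSON shape that should be checked against a real response.

```python
from collections import defaultdict
from typing import Dict, List

# Made-up sample data in the assumed element shape; not real API output.
sample_elements = [
    {"type": "Title", "text": "Quarterly Report", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "Revenue grew in the third quarter.", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "Costs were flat.", "metadata": {"page_number": 2}},
    {"type": "Table", "text": "Region Revenue North 10 South 12", "metadata": {"page_number": 2}},
]


def group_text_by_type(elements: List[dict]) -> Dict[str, List[str]]:
    """Collect the text of each element under its element type."""
    grouped = defaultdict(list)
    for element in elements:
        grouped[element.get("type", "Unknown")].append(element.get("text", ""))
    return dict(grouped)


by_type = group_text_by_type(sample_elements)
```

A RAG pipeline could use a grouping like this to route tables to a different chunker than narrative text.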
Error handling covers common failure modes: unsupported file types, corrupt files, oversized documents, and extraction failures. The API returns structured errors. The main reliability concern is extraction quality variance: results depend heavily on document complexity, so agents should implement quality checks on extraction output rather than assuming perfect fidelity. Processing latency also varies widely: simple PDFs take seconds, while complex scanned documents that need OCR can take minutes.
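The quality checks recommended above could start as simple heuristics over the returned elements. This is a sketch under stated assumptions: the thresholds, issue labels, and the `text` key are hypothetical and would need tuning against real documents.

```python
from typing import List


def extraction_issues(
    elements: List[dict],
    min_elements: int = 1,
    min_total_chars: int = 50,
    max_empty_ratio: float = 0.5,
) -> List[str]:
    """Return a list of heuristic warnings; an empty list means no issue was flagged."""
    if len(elements) < min_elements:
        return ["too_few_elements"]

    issues = []
    texts = [(element.get("text") or "").strip() for element in elements]

    # Very little text overall often signals a failed or partial extraction.
    if sum(len(text) for text in texts) < min_total_chars:
        issues.append("too_little_text")

    # Many empty elements suggest structure was detected without readable content.
    empty_ratio = sum(1 for text in texts if not text) / len(texts)
    if empty_ratio > max_empty_ratio:
        issues.append("mostly_empty_elements")

    return issues


good = [{"type": "NarrativeText", "text": "A" * 80}]
empty = [{"type": "Image", "text": ""}, {"type": "Image", "text": ""}]
```

When issues are flagged, one option is to retry with a slower, more accurate strategy before escalating to a human or rejecting the file.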
Trust & provenance
This score is documentation-derived. Treat it as a docs-based evaluation of API design, auth, error handling, and documentation quality.
Overall tier
7.0 / 10.0
No alternatives captured yet.