
Apache PDFBox

Apache PDFBox is an open-source Java library for working with PDF documents. It can create new PDF documents, manipulate existing ones, and extract content from them, and it also supports digital signatures.

1 API · 1 Capability · 7 Features
Document Processing · Java · PDF · Text Extraction · Apache · Open Source

APIs

Apache PDFBox

PDFBox provides a Java API for creating, manipulating, rendering, and extracting text and images from PDF documents, with support for digital signatures, form filling, PDF/A val...

Capabilities

Apache PDFBox Document Processing Workflow

Workflow for creating, manipulating, extracting text from, and digitally signing PDF documents using Apache PDFBox.


Features

PDF Text Extraction

Extract plain text and structured content from PDF documents
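A minimal sketch of plain-text extraction, assuming PDFBox 3.x (artifact org.apache.pdfbox:pdfbox) is on the classpath. The class and method names are illustrative, not part of PDFBox. `PDFTextStripper` produces plain text only; recovering structure such as tables takes additional, position-aware processing.

```java
import java.io.IOException;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Assumes PDFBox 3.x; TextExtractionExample is a hypothetical name for illustration.
public class TextExtractionExample {

    /** Returns the plain text of every page in the given PDF bytes. */
    public static String extractText(byte[] pdfBytes) throws IOException {
        // Loader.loadPDF is the 3.x entry point; 2.x used PDDocument.load(...) instead.
        try (PDDocument document = Loader.loadPDF(pdfBytes)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Sort text by its position on the page rather than by content-stream order.
            stripper.setSortByPosition(true);
            return stripper.getText(document);
        }
    }
}
```

To limit extraction to part of a document, `setStartPage` and `setEndPage` (1-based, inclusive) can be called on the stripper before `getText`.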

PDF Creation

Create new PDF documents programmatically with Java API
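A minimal sketch of programmatic creation, assuming PDFBox 3.x, where the Standard 14 fonts are selected through `Standard14Fonts.FontName`. The class and method names are hypothetical. Helvetica as a Standard 14 font only covers WinAnsi characters; text outside that range needs an embedded font (for example via `PDType0Font.load`).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.apache.pdfbox.pdmodel.font.Standard14Fonts;

// Assumes PDFBox 3.x; CreatePdfExample is a hypothetical name for illustration.
public class CreatePdfExample {

    /** Builds a single A4 page containing one line of text and returns the PDF bytes. */
    public static byte[] createPdf(String line) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);

            // Standard 14 font: referenced by name, not embedded in the file.
            PDType1Font font = new PDType1Font(Standard14Fonts.FontName.HELVETICA);

            // try-with-resources closes the content stream before the document is saved.
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                content.setFont(font, 12);
                content.newLineAtOffset(72, 770); // 1 inch from the left edge, near the top of A4
                content.showText(line);
                content.endText();
            }

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }
}
```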

PDF Manipulation

Merge, split, rotate, and resize pages in existing PDFs
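The four page operations can be sketched with `PDFMergerUtility.appendDocument` (merge), `Splitter` (split), `PDPage.setRotation` (rotate), and `PDPage.setMediaBox` (resize). This assumes PDFBox 3.x; the wrapper class and its method names are hypothetical.

```java
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;

// Assumes PDFBox 3.x; ManipulationExample is a hypothetical name for illustration.
public class ManipulationExample {

    /** Appends every page of source to destination; keep source open until destination is saved. */
    public static void merge(PDDocument destination, PDDocument source) throws IOException {
        new PDFMergerUtility().appendDocument(destination, source);
    }

    /** Splits a document into single-page documents; the caller must close each one. */
    public static List<PDDocument> splitIntoPages(PDDocument document) throws IOException {
        Splitter splitter = new Splitter();
        splitter.setSplitAtPage(1); // one page per output document
        return splitter.split(document);
    }

    /** Adds a non-negative multiple of 90 degrees to each page's rotation and resizes it to A4. */
    public static void rotateAndResize(PDDocument document, int degrees) {
        for (PDPage page : document.getPages()) {
            // The /Rotate value is added to any rotation the page already has.
            page.setRotation((page.getRotation() + degrees) % 360);
            // Replacing the MediaBox changes the page size; existing content is not scaled.
            page.setMediaBox(PDRectangle.A4);
        }
    }
}
```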

Digital Signatures

Apply and verify digital signatures for document authenticity
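A minimal sketch that only inspects existing signatures, assuming PDFBox 3.x; the class name is hypothetical. It does not verify them: cryptographic verification means checking the embedded CMS/PKCS#7 data against the signed byte ranges, typically with a library such as Bouncy Castle. Signing goes through `PDDocument.addSignature` with a `SignatureInterface` implementation, followed by `saveIncremental`, which also relies on such a library and is not shown here.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;

// Assumes PDFBox 3.x; SignatureInspectionExample is a hypothetical name. Inspection only, no verification.
public class SignatureInspectionExample {

    /** Returns one line per signature dictionary: signer name, SubFilter, and ByteRange. */
    public static List<String> describeSignatures(PDDocument document) throws IOException {
        List<String> lines = new ArrayList<>();
        for (PDSignature sig : document.getSignatureDictionaries()) {
            // SubFilter names the encoding, e.g. adbe.pkcs7.detached.
            // ByteRange lists the (offset, length) pairs of bytes covered by the signature.
            lines.add(sig.getName() + " | " + sig.getSubFilter() + " | "
                    + Arrays.toString(sig.getByteRange()));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = Loader.loadPDF(new File(args[0]))) {
            describeSignatures(document).forEach(System.out::println);
        }
    }
}
```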

Form Filling

Read and fill interactive PDF forms (AcroForms)
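A sketch of reading and filling AcroForm fields by fully qualified name, assuming PDFBox 3.x; the class and method names are hypothetical, as are any field names a caller passes in. For check boxes and radio buttons, the value passed to `setValue` must be one of the field's defined on-values.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

// Assumes PDFBox 3.x; FormExample is a hypothetical name for illustration.
public class FormExample {

    /** Maps each field's fully qualified name to its current value (empty if there is no form). */
    public static Map<String, String> readFields(PDDocument document) {
        Map<String, String> values = new LinkedHashMap<>();
        PDAcroForm form = document.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return values; // the document has no interactive form
        }
        // getFieldTree() also walks nested fields; getFields() returns only the root fields.
        for (PDField field : form.getFieldTree()) {
            values.put(field.getFullyQualifiedName(), field.getValueAsString());
        }
        return values;
    }

    /** Sets each named field that exists and returns how many were filled. */
    public static int fill(PDDocument document, Map<String, String> input) throws IOException {
        PDAcroForm form = document.getDocumentCatalog().getAcroForm();
        if (form == null) {
            return 0;
        }
        int filled = 0;
        for (Map.Entry<String, String> entry : input.entrySet()) {
            PDField field = form.getField(entry.getKey());
            if (field != null) {
                field.setValue(entry.getValue());
                filled++;
            }
        }
        return filled;
    }
}
```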

PDF/A Validation

Validate and create PDF/A documents for archiving
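A sketch of validation with PDFBox's Preflight module, which ships as a separate artifact (org.apache.pdfbox:preflight) and checks PDF/A-1b conformance. It assumes the 3.x module, where `PreflightParser` offers a static `validate(File)`; 2.x used a parser instance instead. The class name is hypothetical. Creating a conforming file is a separate task that requires, among other things, embedded fonts, XMP metadata, and an ICC output intent.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;

// Assumes the PDFBox 3.x preflight module; PdfAValidationExample is a hypothetical name.
public class PdfAValidationExample {

    /** Returns true if the file passes Preflight's PDF/A-1b checks; prints each error otherwise. */
    public static boolean isPdfA1b(File pdf) throws IOException {
        ValidationResult result = PreflightParser.validate(pdf);
        if (!result.isValid()) {
            for (ValidationError error : result.getErrorsList()) {
                System.err.println(error.getErrorCode() + ": " + error.getDetails());
            }
        }
        return result.isValid();
    }
}
```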

Font Handling

Embed and extract fonts, including Type 1, TrueType, and OpenType fonts
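A sketch of both directions, assuming PDFBox 3.x: `PDType0Font.load` embeds a TrueType-outline font file (subset by default), and a walk over each page's resources lists the fonts a document uses and whether each is embedded. The class and method names are hypothetical. Fonts referenced only from form XObjects or annotations are not covered by this page-level walk.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

// Assumes PDFBox 3.x; FontInventoryExample is a hypothetical name for illustration.
public class FontInventoryExample {

    /** Loads a TrueType-outline font file; it is embedded (as a subset) when the document is saved. */
    public static PDFont embed(PDDocument document, File fontFile) throws IOException {
        return PDType0Font.load(document, fontFile);
    }

    /** Lists "pageNumber: fontName (embedded|not embedded)" for fonts in each page's resources. */
    public static List<String> listFonts(PDDocument document) throws IOException {
        List<String> report = new ArrayList<>();
        int pageNumber = 1;
        for (PDPage page : document.getPages()) {
            PDResources resources = page.getResources();
            if (resources != null) {
                for (COSName name : resources.getFontNames()) {
                    PDFont font = resources.getFont(name);
                    if (font != null) {
                        report.add(pageNumber + ": " + font.getName()
                                + (font.isEmbedded() ? " (embedded)" : " (not embedded)"));
                    }
                }
            }
            pageNumber++;
        }
        return report;
    }
}
```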

Use Cases

Invoice Processing

Extract data from PDF invoices for automated processing
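As an illustration of invoice processing, the sketch below extracts text from the first page and searches it for a total. The invoice layout (a line such as "Total: 1,234.50" on page 1) is a hypothetical assumption, as are the class and method names; real invoices vary widely and usually need per-vendor rules. PDFBox 3.x is assumed.

```java
import java.io.File;
import java.io.IOException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Assumes PDFBox 3.x; the class name and the "Total:" line format are hypothetical.
public class InvoiceTotalExample {

    // Matches a whole line like "Total: 1,234.50" (case-insensitive); "Subtotal" lines do not match.
    private static final Pattern TOTAL =
            Pattern.compile("(?im)^\\s*total\\s*:?\\s*([0-9][0-9,]*\\.[0-9]{2})\\s*$");

    /** Returns the first total found in already-extracted text, with thousands separators removed. */
    public static Optional<String> findTotal(String text) {
        Matcher m = TOTAL.matcher(text);
        return m.find() ? Optional.of(m.group(1).replace(",", "")) : Optional.empty();
    }

    /** Extracts page 1 only, where this hypothetical layout places the total. */
    public static Optional<String> totalFromPdf(File invoice) throws IOException {
        try (PDDocument document = Loader.loadPDF(invoice)) {
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setSortByPosition(true);
            stripper.setStartPage(1);
            stripper.setEndPage(1);
            return findTotal(stripper.getText(document));
        }
    }
}
```

Keeping the pattern matching in a separate method lets the parsing rules be tested on plain strings without any PDF files.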

Document Generation

Generate PDF reports, contracts, and certificates programmatically

Legal Document Management

Digitally sign and verify legal documents

Form Data Collection

Fill PDF forms and extract submitted data

Archive Management

Convert documents to PDF/A for long-term archiving

Integrations

Apache Tika

Content detection and text extraction integration

Spring Boot

Spring Boot starter for PDF processing in web applications

Maven Central

Available on Maven Central under the group ID org.apache.pdfbox (core artifact: pdfbox)

iText/OpenPDF

Complementary PDF libraries for advanced PDF generation

Semantic Vocabularies

Apache PDFBox Context

11 classes · 32 properties

JSON-LD

API Governance Rules

Apache PDFBox API Rules

6 rules · 4 errors · 2 info

SPECTRAL

Resources

GitHub Organization
Documentation
Spectral Rules
Vocabulary
Naftiko Capability
JSON-LD