Apache PDFBox

Apache PDFBox is an open-source Java library for working with PDF documents. It can create new PDF documents, manipulate existing ones, and extract content from them, and it supports digital signatures.

1 API · 7 features

Document Processing · Java · PDF · Text Extraction · Apache · Open Source

APIs

Apache PDFBox

PDFBox provides a Java API for creating, manipulating, rendering, and extracting text and images from PDF documents, with support for digital signatures, form filling, PDF/A val...

Features

PDF Text Extraction

Extract plain text and structured content from PDF documents

PDF Creation

Create new PDF documents programmatically with the Java API

PDF Manipulation

Merge, split, rotate, and resize pages in existing PDFs

Digital Signatures

Apply and verify digital signatures for document authenticity

Form Filling

Read and fill interactive PDF forms (AcroForms)

PDF/A Validation

Validate and create PDF/A documents for archiving

Font Handling

Embed and extract fonts, with support for Type 1, TrueType, and OpenType

Use Cases

Invoice Processing

Extract data from PDF invoices for automated processing

Document Generation

Generate PDF reports, contracts, and certificates programmatically

Legal Document Management

Digitally sign and verify legal documents

Form Data Collection

Fill PDF forms and extract submitted data

Archive Management

Convert documents to PDF/A for long-term archiving

Integrations

Apache Tika

Content detection and text extraction integration

Spring Boot

Spring Boot starter for PDF processing in web applications

Maven Central

Published on Maven Central under the group ID org.apache.pdfbox

iText/OpenPDF

Complementary PDF library for advanced PDF generation

Semantic Vocabularies

Apache PDFBox Context

11 classes · 32 properties

JSON-LD

API Governance Rules

Apache PDFBox API Rules

6 rules · 4 errors · 2 info

SPECTRAL

Resources

GitHubOrganization

Sources

aid: apache-pdfbox
name: Apache PDFBox
description: Apache PDFBox is an open-source Java library for working with PDF documents. It can create new PDF documents,
  manipulate existing ones, and extract content from them, and it supports digital signatures.
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Document Processing
- Java
- PDF
- Text Extraction
- Apache
- Open Source
created: '2026-03-16'
modified: '2026-05-19'
url: https://raw.githubusercontent.com/api-evangelist/apache-pdfbox/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-pdfbox:apache-pdfbox
  name: Apache PDFBox
  description: PDFBox provides a Java API for creating, manipulating, rendering, and extracting text and images from PDF documents,
    with support for digital signatures, form filling, PDF/A validation, and font handling.
  humanURL: https://pdfbox.apache.org/2.0/getting-started.html
  tags:
  - Document Processing
  - Java
  - PDF
  - Apache
  - Open Source
  properties:
  - type: Documentation
    url: https://pdfbox.apache.org/2.0/getting-started.html
  - type: OpenAPI
    url: openapi/apache-pdfbox-api.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-documents.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-forms.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-operations.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-pages.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-signatures.yaml
maintainers:
- FN: Kin Lane
  email: [email protected]
common:
- type: GitHubOrganization
  url: https://github.com/apache/pdfbox
- type: Documentation
  url: https://pdfbox.apache.org/
- type: SpectralRules
  url: rules/apache-pdfbox-spectral-rules.yml
- type: Vocabulary
  url: vocabulary/apache-pdfbox-vocabulary.yaml
- type: JSONLD
  url: json-ld/apache-pdfbox-context.jsonld
- type: Features
  data:
  - name: PDF Text Extraction
    description: Extract plain text and structured content from PDF documents
  - name: PDF Creation
    description: Create new PDF documents programmatically with the Java API
  - name: PDF Manipulation
    description: Merge, split, rotate, and resize pages in existing PDFs
  - name: Digital Signatures
    description: Apply and verify digital signatures for document authenticity
  - name: Form Filling
    description: Read and fill interactive PDF forms (AcroForms)
  - name: PDF/A Validation
    description: Validate and create PDF/A documents for archiving
  - name: Font Handling
    description: Embed and extract fonts, with support for Type 1, TrueType, and OpenType
- type: UseCases
  data:
  - name: Invoice Processing
    description: Extract data from PDF invoices for automated processing
  - name: Document Generation
    description: Generate PDF reports, contracts, and certificates programmatically
  - name: Legal Document Management
    description: Digitally sign and verify legal documents
  - name: Form Data Collection
    description: Fill PDF forms and extract submitted data
  - name: Archive Management
    description: Convert documents to PDF/A for long-term archiving
- type: Integrations
  data:
  - name: Apache Tika
    description: Content detection and text extraction integration
  - name: Spring Boot
    description: Spring Boot starter for PDF processing in web applications
  - name: Maven Central
    description: Published on Maven Central under the group ID org.apache.pdfbox
  - name: iText/OpenPDF
    description: Complementary PDF library for advanced PDF generation
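
The `url` values in the index above mix absolute URLs with relative paths such as `openapi/apache-pdfbox-api.yaml`. One plausible reading, which is an assumption here rather than something the index states, is that relative paths resolve against the index's own top-level `url`. The following is a minimal Java sketch of that resolution using only `java.net.URI`; the class name `PropertyUrlResolver` and its method are hypothetical and not part of PDFBox or any APIs.json tooling.

```java
import java.net.URI;
import java.util.List;

// Hypothetical helper: resolves `url` entries from the index against the
// location of the index file itself. The resolution rule is an assumption.
final class PropertyUrlResolver {

    // Copied from the top-level `url` field of the index.
    static final URI INDEX_URL = URI.create(
        "https://raw.githubusercontent.com/api-evangelist/apache-pdfbox/refs/heads/main/apis.yml");

    // Relative paths are resolved against INDEX_URL; absolute URLs are
    // returned unchanged by URI.resolve.
    static URI resolve(String propertyUrl) {
        return INDEX_URL.resolve(propertyUrl);
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "openapi/apache-pdfbox-api.yaml",
            "capabilities/apache-pdfbox-forms.yaml",
            "https://pdfbox.apache.org/2.0/getting-started.html");
        for (String url : urls) {
            System.out.println(url + " -> " + resolve(url));
        }
    }
}
```

Because absolute entries pass through unchanged, the same call can be applied to every `url` in the file without first checking which kind it is.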