Apache PDFBox

Apache PDFBox is an open-source Java library for working with PDF documents. It supports creating new PDF documents, manipulating existing ones, and extracting their content, and it includes support for digital signatures.

1 API · 5 Capabilities · 7 Features · Rating: 43.0 / 100 (thin)
Tags: Document Processing, Java, PDF, Text Extraction, Apache, Open Source

API Rating

43.0 / 100 (thin)
Scored 2026-05-20 · rubric v0.3
Discoverability: 80.0
Contract Quality: 63.2
Governance: 39.5
Operational Transparency: 36.8
Developer Ergonomics: 8.7
Commercial Clarity: 39.5

APIs

Apache PDFBox

PDFBox provides a Java API for creating, manipulating, rendering, and extracting text and images from PDF documents, with support for digital signatures, form filling, PDF/A val...

Capabilities

Apache PDFBox API — Documents

3 operations. Lead operation: Apache PDFBox Create Document. A self-contained Naftiko capability covering one Apache PDFBox business surface.

Apache PDFBox API — Forms

1 operation. Lead operation: Apache PDFBox Get Form Fields. A self-contained Naftiko capability covering one Apache PDFBox business surface.

Apache PDFBox API — Operations

2 operations. Lead operation: Apache PDFBox Merge Documents. A self-contained Naftiko capability covering one Apache PDFBox business surface.

Apache PDFBox API — Pages

1 operation. Lead operation: Apache PDFBox List Pages. A self-contained Naftiko capability covering one Apache PDFBox business surface.

Apache PDFBox API — Signatures

1 operation. Lead operation: Apache PDFBox Sign Document. A self-contained Naftiko capability covering one Apache PDFBox business surface.

Features

PDF Text Extraction

Extract plain text and structured content from PDF documents
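
As a rough illustration of this feature, the sketch below pulls the plain text out of a PDF file. It assumes PDFBox 2.0.x (the version the linked documentation covers), where documents are opened with `PDDocument.load`; in 3.x that call is replaced by `Loader.loadPDF`. The class name `ExtractTextSketch` and the command-line path are placeholders, not part of the library.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

/** Hypothetical sketch: plain-text extraction, assuming the PDFBox 2.0.x API. */
final class ExtractTextSketch {

    /** Returns the text of all pages in the order PDFTextStripper emits it. */
    static String extractText(File pdf) throws IOException {
        // PDDocument is Closeable; try-with-resources releases the file handle.
        try (PDDocument document = PDDocument.load(pdf)) {
            PDFTextStripper stripper = new PDFTextStripper();
            return stripper.getText(document);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for a path supplied by the caller.
        System.out.println(extractText(new File(args[0])));
    }
}
```

The stripper returns text only; recovering tables or other structure from a layout would need additional logic on top of this.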

PDF Creation

Create new PDF documents programmatically with the Java API
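
A minimal sketch of document creation, again assuming PDFBox 2.0.x: it builds a one-page document, writes a single line of text, and returns the serialized bytes. The class name `CreatePdfSketch`, the text position, and the font size are illustrative choices. The `PDType1Font.HELVETICA` constant is specific to 2.x; 3.x constructs standard fonts through `Standard14Fonts.FontName` instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

/** Hypothetical sketch: creating a one-page PDF, assuming the PDFBox 2.0.x API. */
final class CreatePdfSketch {

    /** Builds a single-page PDF containing one line of text and returns its bytes. */
    static byte[] createPdf(String text) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(); // default page size is US Letter
            document.addPage(page);

            // The content stream must be closed before the document is saved.
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                content.setFont(PDType1Font.HELVETICA, 12);
                content.newLineAtOffset(72, 700); // points, measured from the bottom-left corner
                content.showText(text);
                content.endText();
            }

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            document.save(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(createPdf("Hello, PDFBox").length + " bytes written");
    }
}
```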

PDF Manipulation

Merge, split, rotate, and resize pages in existing PDFs
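
Merging is the simplest of these operations to show. The sketch below assumes PDFBox 2.0.x and its `PDFMergerUtility`; the class name `MergePdfSketch` and the argument convention (inputs first, output last) are made up for the example. In 3.x the `mergeDocuments` call takes a different argument, so treat this as version-specific.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

/** Hypothetical sketch: concatenating PDFs, assuming the PDFBox 2.0.x API. */
final class MergePdfSketch {

    /** Appends the pages of each source, in list order, and writes the result to destination. */
    static void merge(List<File> sources, File destination) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (File source : sources) {
            merger.addSource(source);
        }
        merger.setDestinationFileName(destination.getPath());
        // Keep everything in main memory; fine for small inputs.
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    }

    public static void main(String[] args) throws IOException {
        // Placeholder convention: all arguments but the last are inputs, the last is the output.
        List<File> sources = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            sources.add(new File(args[i]));
        }
        merge(sources, new File(args[args.length - 1]));
    }
}
```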

Digital Signatures

Apply and verify digital signatures for document authenticity

Form Filling

Read and fill interactive PDF forms (AcroForms)
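
Reading form data might look like the sketch below, which assumes PDFBox 2.0.x and collects every AcroForm field name with its current value. The class name `FormFieldsSketch` and the map-based return type are illustrative choices. A document without an interactive form yields an empty map.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;

/** Hypothetical sketch: listing AcroForm fields, assuming the PDFBox 2.0.x API. */
final class FormFieldsSketch {

    /** Maps each fully qualified field name to its value as a string. */
    static Map<String, String> readFields(File pdf) throws IOException {
        Map<String, String> values = new LinkedHashMap<>();
        try (PDDocument document = PDDocument.load(pdf)) {
            PDAcroForm form = document.getDocumentCatalog().getAcroForm();
            if (form == null) {
                return values; // the document has no interactive form
            }
            // The field tree walks nested fields as well as top-level ones.
            for (PDField field : form.getFieldTree()) {
                values.put(field.getFullyQualifiedName(), field.getValueAsString());
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for a path supplied by the caller.
        readFields(new File(args[0]))
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Filling follows the same pattern: look a field up with `getField`, call `setValue` on it, and save the document.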

PDF/A Validation

Validate and create PDF/A documents for archiving

Font Handling

Embed and extract fonts, handle Type 1, TrueType, and OpenType

Use Cases

Invoice Processing

Extract data from PDF invoices for automated processing

Document Generation

Generate PDF reports, contracts, and certificates programmatically

Legal Document Management

Digitally sign and verify legal documents

Form Data Collection

Fill PDF forms and extract submitted data

Archive Management

Convert documents to PDF/A for long-term archiving

Integrations

Apache Tika

Content detection and text extraction integration

Spring Boot

Spring Boot starter for PDF processing in web applications

Maven Central

Available as org.apache.pdfbox on Maven Central

iText/OpenPDF

Complementary PDF library for advanced PDF generation

Semantic Vocabularies

Apache PDFBox Context

11 classes · 32 properties

JSON-LD

API Governance Rules

Apache PDFBox API Rules

6 rules · 4 errors · 2 info

SPECTRAL

Resources

GitHubOrganization
Documentation
SpectralRules
Vocabulary
JSONLD

Sources

aid: apache-pdfbox
name: Apache PDFBox
description: Apache PDFBox is an open-source Java library for working with PDF documents. It allows creation of new PDF documents,
  manipulation of existing documents, and the ability to extract content from documents with support for digital signatures.
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Document Processing
- Java
- PDF
- Text Extraction
- Apache
- Open Source
created: '2026-03-16'
modified: '2026-05-19'
url: https://raw.githubusercontent.com/api-evangelist/apache-pdfbox/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-pdfbox:apache-pdfbox
  name: Apache PDFBox
  description: PDFBox provides a Java API for creating, manipulating, rendering, and extracting text and images from PDF documents,
    with support for digital signatures, form filling, PDF/A validation, and font handling.
  humanURL: https://pdfbox.apache.org/2.0/getting-started.html
  tags:
  - Document Processing
  - Java
  - PDF
  - Apache
  - Open Source
  properties:
  - type: Documentation
    url: https://pdfbox.apache.org/2.0/getting-started.html
  - type: OpenAPI
    url: openapi/apache-pdfbox-api.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-documents.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-forms.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-operations.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-pages.yaml
  - type: NaftikoCapability
    url: capabilities/apache-pdfbox-signatures.yaml
maintainers:
- FN: Kin Lane
  email: [email protected]
common:
- type: GitHubOrganization
  url: https://github.com/apache/pdfbox
- type: Documentation
  url: https://pdfbox.apache.org/
- type: SpectralRules
  url: rules/apache-pdfbox-spectral-rules.yml
- type: Vocabulary
  url: vocabulary/apache-pdfbox-vocabulary.yaml
- type: JSONLD
  url: json-ld/apache-pdfbox-context.jsonld
- type: Features
  data:
  - name: PDF Text Extraction
    description: Extract plain text and structured content from PDF documents
  - name: PDF Creation
    description: Create new PDF documents programmatically with Java API
  - name: PDF Manipulation
    description: Merge, split, rotate, and resize pages in existing PDFs
  - name: Digital Signatures
    description: Apply and verify digital signatures for document authenticity
  - name: Form Filling
    description: Read and fill interactive PDF forms (AcroForms)
  - name: PDF/A Validation
    description: Validate and create PDF/A documents for archiving
  - name: Font Handling
    description: Embed and extract fonts, handle Type 1, TrueType, and OpenType
- type: UseCases
  data:
  - name: Invoice Processing
    description: Extract data from PDF invoices for automated processing
  - name: Document Generation
    description: Generate PDF reports, contracts, and certificates programmatically
  - name: Legal Document Management
    description: Digitally sign and verify legal documents
  - name: Form Data Collection
    description: Fill PDF forms and extract submitted data
  - name: Archive Management
    description: Convert documents to PDF/A for long-term archiving
- type: Integrations
  data:
  - name: Apache Tika
    description: Content detection and text extraction integration
  - name: Spring Boot
    description: Spring Boot starter for PDF processing in web applications
  - name: Maven Central
    description: Available as org.apache.pdfbox on Maven Central
  - name: iText/OpenPDF
    description: Complementary PDF library for advanced PDF generation