Apache OpenNLP

Apache OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports common NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

1 API · 10 Capabilities · 10 Features · 46.1 / 100 (developing)
Machine Learning · Natural Language Processing · NLP · Text Processing · Apache · Open Source · Java

API Rating

46.1 / 100 · developing
Scored 2026-05-20 · rubric v0.3
Discoverability: 80.0
Contract Quality: 63.2
Governance: 47.4
Operational Transparency: 36.8
Developer Ergonomics: 19.6
Commercial Clarity: 39.5

APIs

Apache OpenNLP

OpenNLP provides a Java API for NLP tasks including tokenization, sentence detection, POS tagging, named entity recognition, chunking, parsing, and language detection, with support for training custom models.

Capabilities

Apache OpenNLP Tools API — Chunking

1 operation. Lead operation: Apache OpenNLP Chunk Text. Self-contained Naftiko capability covering one Apache OpenNLP business surface.

Apache OpenNLP Tools API — Document Categorization

1 operation. Lead operation: Apache OpenNLP Categorize Document. Self-contained Naftiko capability covering one Apache Openn...

Apache OpenNLP Tools API — Language Detection

1 operation. Lead operation: Apache OpenNLP Detect Language. Self-contained Naftiko capability covering one Apache OpenNLP busine...

Apache OpenNLP Tools API — Lemmatization

1 operation. Lead operation: Apache OpenNLP Lemmatize Text. Self-contained Naftiko capability covering one Apache OpenNLP business sur...

Apache OpenNLP Tools API — Models

2 operations. Lead operation: Apache OpenNLP List Available Models. Self-contained Naftiko capability covering one Apache OpenNLP business sur...

Apache OpenNLP Tools API — Named Entity Recognition

1 operation. Lead operation: Apache OpenNLP Find Named Entities. Self-contained Naftiko capability covering one Apache Open...

Apache OpenNLP Tools API — Parsing

1 operation. Lead operation: Apache OpenNLP Parse Text. Self-contained Naftiko capability covering one Apache OpenNLP business surface.

Apache OpenNLP Tools API — POS Tagging

1 operation. Lead operation: Apache OpenNLP Tag Parts of Speech. Self-contained Naftiko capability covering one Apache OpenNLP business ...

Apache OpenNLP Tools API — Sentence Detection

1 operation. Lead operation: Apache OpenNLP Detect Sentences. Self-contained Naftiko capability covering one Apache OpenNLP busin...

Apache OpenNLP Tools API — Tokenization

1 operation. Lead operation: Apache OpenNLP Tokenize Text. Self-contained Naftiko capability covering one Apache OpenNLP business surface.

Features

Language Detection

Detects document language using ISO-639-3 classification

Sentence Detection

Splits text into individual sentences with character offsets

Tokenization

Segments text into words and punctuation with position tracking

Named Entity Recognition

Detects persons, locations, organizations, and other named entities

POS Tagging

Assigns Penn Treebank POS tags to each token

Lemmatization

Reduces tokens to their dictionary base forms

Chunking

Identifies noun phrases, verb phrases, and other syntactic chunks

Parsing

Builds full syntactic parse trees using constituency parsing

Document Categorization

Classifies documents into predefined categories

Custom Model Training

Train custom models with Maxent, Perceptron, or Naive Bayes algorithms
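
The "position tracking" and "character offsets" mentioned for tokenization and sentence detection mean that each unit is reported together with the start and end index it covers in the original text. The following is a minimal plain-Java sketch of that idea, using only the standard library. It is not OpenNLP's implementation: the class `OffsetTokenizerSketch`, its nested `Token` type, and the simple letter/digit splitting rule are all hypothetical names and choices made for illustration. In OpenNLP itself, the comparable information is exposed by the `Tokenizer.tokenizePos` method, which returns `Span` objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Apache OpenNLP.
class OffsetTokenizerSketch {

    /** A token plus the half-open [start, end) character range it covers in the input. */
    static final class Token {
        final int start;
        final int end;
        final String text;

        Token(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /**
     * Splits the input into tokens and records each token's offsets.
     * Assumed rule for this sketch: a run of letters or digits is one token,
     * any other non-whitespace character is a single-character token,
     * and whitespace only separates tokens.
     */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace is skipped, never emitted
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                // consume the whole run of letters and digits
                while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                    i++;
                }
            } else {
                i++; // punctuation and symbols become one-character tokens
            }
            tokens.add(new Token(start, i, input.substring(start, i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Hello, world!")) {
            System.out.println(t.start + "-" + t.end + " " + t.text);
        }
    }
}
```

For the input `Hello, world!` the sketch yields four tokens covering the ranges 0-5, 5-6, 7-12, and 12-13. Because the offsets index into the original string, downstream steps such as entity highlighting can map results back to the source text without re-searching it.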

Use Cases

Information Extraction

Extract structured data from unstructured text documents

Text Classification

Automatically categorize documents by topic or sentiment

Search Enhancement

Improve search relevance with NLP-based query processing

Content Analysis

Analyze large text corpora for entities, topics, and patterns

Chatbot Development

Build conversational AI with NLP intent and entity extraction

Integrations

Apache Solr

Integrate OpenNLP with Apache Solr for NLP-enhanced search

Apache Lucene

Use OpenNLP analyzers in Lucene text processing pipelines

Apache Flink

Real-time NLP processing with Apache Flink data streams

UIMA

Apache UIMA framework integration for unstructured information analysis

Maven/Gradle

Available on Maven Central for Java build system integration

Semantic Vocabularies

Apache OpenNLP Context

18 classes · 24 properties

JSON-LD

API Governance Rules

Apache OpenNLP API Rules

10 rules · 4 errors · 5 warnings · 1 info

SPECTRAL

Resources

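GitHubOrganization: https://github.com/apache/opennlp
Documentation: https://opennlp.apache.org/
GettingStarted: https://opennlp.apache.org/docs/
SpectralRules: rules/apache-opennlp-spectral-rules.yml
Vocabulary: vocabulary/apache-opennlp-vocabulary.yaml
JSONLD: json-ld/apache-opennlp-context.jsonld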

Sources

aid: apache-opennlp
name: Apache OpenNLP
description: Apache OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports common
  NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing,
  and coreference resolution.
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Machine Learning
- Natural Language Processing
- NLP
- Text Processing
- Apache
- Open Source
- Java
created: '2026-03-16'
modified: '2026-05-19'
url: https://raw.githubusercontent.com/api-evangelist/apache-opennlp/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-opennlp:apache-opennlp
  name: Apache OpenNLP
  description: OpenNLP provides a Java API for NLP tasks including tokenization, sentence detection, POS tagging, named entity
    recognition, chunking, parsing, and language detection, with support for training custom models.
  humanURL: https://opennlp.apache.org/docs/
  tags:
  - Java
  - NLP
  - Text Processing
  - Apache
  - Open Source
  - Machine Learning
  properties:
  - type: Documentation
    url: https://opennlp.apache.org/docs/
  - type: OpenAPI
    url: openapi/apache-opennlp-tools.yaml
  - type: NaftikoCapability
    url: capabilities/tools-chunking.yaml
  - type: NaftikoCapability
    url: capabilities/tools-document-categorization.yaml
  - type: NaftikoCapability
    url: capabilities/tools-language-detection.yaml
  - type: NaftikoCapability
    url: capabilities/tools-lemmatization.yaml
  - type: NaftikoCapability
    url: capabilities/tools-models.yaml
  - type: NaftikoCapability
    url: capabilities/tools-named-entity-recognition.yaml
  - type: NaftikoCapability
    url: capabilities/tools-pos-tagging.yaml
  - type: NaftikoCapability
    url: capabilities/tools-parsing.yaml
  - type: NaftikoCapability
    url: capabilities/tools-sentence-detection.yaml
  - type: NaftikoCapability
    url: capabilities/tools-tokenization.yaml
maintainers:
- FN: Kin Lane
  email: [email protected]
common:
- type: GitHubOrganization
  url: https://github.com/apache/opennlp
- type: Documentation
  url: https://opennlp.apache.org/
- type: GettingStarted
  url: https://opennlp.apache.org/docs/
- type: SpectralRules
  url: rules/apache-opennlp-spectral-rules.yml
- type: Vocabulary
  url: vocabulary/apache-opennlp-vocabulary.yaml
- type: JSONLD
  url: json-ld/apache-opennlp-context.jsonld
- type: Features
  data:
  - name: Language Detection
    description: Detects document language using ISO-639-3 classification
  - name: Sentence Detection
    description: Splits text into individual sentences with character offsets
  - name: Tokenization
    description: Segments text into words and punctuation with position tracking
  - name: Named Entity Recognition
    description: Detects persons, locations, organizations, and other named entities
  - name: POS Tagging
    description: Assigns Penn Treebank POS tags to each token
  - name: Lemmatization
    description: Reduces tokens to their dictionary base forms
  - name: Chunking
    description: Identifies noun phrases, verb phrases, and other syntactic chunks
  - name: Parsing
    description: Builds full syntactic parse trees using constituency parsing
  - name: Document Categorization
    description: Classifies documents into predefined categories
  - name: Custom Model Training
    description: Train custom models with Maxent, Perceptron, or Naive Bayes algorithms
- type: UseCases
  data:
  - name: Information Extraction
    description: Extract structured data from unstructured text documents
  - name: Text Classification
    description: Automatically categorize documents by topic or sentiment
  - name: Search Enhancement
    description: Improve search relevance with NLP-based query processing
  - name: Content Analysis
    description: Analyze large text corpora for entities, topics, and patterns
  - name: Chatbot Development
    description: Build conversational AI with NLP intent and entity extraction
- type: Integrations
  data:
  - name: Apache Solr
    description: Integrate OpenNLP with Apache Solr for NLP-enhanced search
  - name: Apache Lucene
    description: Use OpenNLP analyzers in Lucene text processing pipelines
  - name: Apache Flink
    description: Real-time NLP processing with Apache Flink data streams
  - name: UIMA
    description: Apache UIMA framework integration for unstructured information analysis
  - name: Maven/Gradle
    description: Available on Maven Central for Java build system integration