Apache OpenNLP

Apache OpenNLP is a machine-learning-based toolkit for processing natural language text. It supports common NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

1 API · 10 Features
Machine Learning · Natural Language Processing · NLP · Text Processing · Apache · Open Source · Java

APIs

Apache OpenNLP

OpenNLP provides a Java API for NLP tasks including tokenization, sentence detection, POS tagging, named entity recognition, chunking, parsing, and language detection, with supp...

Features

Language Detection

Detects the language of a document and labels it with an ISO 639-3 code

Sentence Detection

Splits text into individual sentences with character offsets
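
To make "sentences with character offsets" concrete, here is a minimal sketch that returns each sentence as a [start, end) offset pair into the original text. It is an illustration only: it uses the JDK's rule-based `java.text.BreakIterator` as a stand-in, not OpenNLP's trained sentence detector, and the class name `SentenceOffsets` is hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SentenceOffsets {
    /** Returns [start, end) character offsets for each sentence, with surrounding whitespace trimmed. */
    static List<int[]> split(String text) {
        // Assumption: JDK rule-based boundaries stand in for a trained model.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<int[]> spans = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            int s = start;
            int e = end;
            // Trim whitespace so each span covers only the sentence itself.
            while (s < e && Character.isWhitespace(text.charAt(s))) s++;
            while (e > s && Character.isWhitespace(text.charAt(e - 1))) e--;
            if (s < e) spans.add(new int[] {s, e});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Hello world. This is a test.";
        for (int[] span : split(text)) {
            System.out.println(span[0] + ".." + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

Keeping offsets rather than copied strings lets later stages (tokenization, entity detection) report positions relative to the original document.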

Tokenization

Segments text into words and punctuation with position tracking
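
As a rough illustration of tokenization with position tracking, the sketch below splits text into runs of letters or digits and single punctuation characters, and reports each token as a [start, end) offset pair. The class `CharClassTokenizer` and its splitting rule are assumptions made for this example; OpenNLP's own tokenizers (rule-based or model-based) may split differently.

```java
import java.util.ArrayList;
import java.util.List;

final class CharClassTokenizer {
    /** Returns [start, end) offsets: one span per word/number run and one per punctuation character. */
    static List<int[]> tokenizePos(String text) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) { // whitespace separates tokens and is never emitted
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                // Consume the whole run of letters and digits as one token.
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) i++;
            } else {
                i++; // any other character (punctuation, symbol) is a token on its own
            }
            spans.add(new int[] {start, i});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Hello, world!";
        for (int[] span : tokenizePos(text)) {
            System.out.println(span[0] + ".." + span[1] + " '" + text.substring(span[0], span[1]) + "'");
        }
    }
}
```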

Named Entity Recognition

Detects persons, locations, organizations, and other named entities

POS Tagging

Assigns Penn Treebank POS tags to each token

Lemmatization

Reduces tokens to their dictionary base forms
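
One common way to reduce tokens to base forms is a dictionary keyed by word and POS tag, since the same surface form can have different lemmas ("saw" as a noun versus as a past-tense verb). The sketch below shows that idea under stated assumptions: `DictLemmatizer` is a hypothetical class, the entries are made up for the example, and falling back to the lowercased token for unknown words is a choice of this sketch rather than documented OpenNLP behavior.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class DictLemmatizer {
    private final Map<String, String> lemmas = new HashMap<>();

    /** Registers a lemma for a (word, POS tag) pair; the word is matched case-insensitively. */
    void add(String word, String posTag, String lemma) {
        lemmas.put(key(word, posTag), lemma);
    }

    /** Returns one lemma per token; unknown pairs fall back to the lowercased token. */
    String[] lemmatize(String[] tokens, String[] posTags) {
        if (tokens.length != posTags.length) {
            throw new IllegalArgumentException("tokens and posTags must have the same length");
        }
        String[] out = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String fallback = tokens[i].toLowerCase(Locale.ROOT);
            out[i] = lemmas.getOrDefault(key(tokens[i], posTags[i]), fallback);
        }
        return out;
    }

    private static String key(String word, String posTag) {
        return word.toLowerCase(Locale.ROOT) + "\t" + posTag;
    }

    public static void main(String[] args) {
        DictLemmatizer lemmatizer = new DictLemmatizer();
        lemmatizer.add("saw", "VBD", "see");
        lemmatizer.add("saw", "NN", "saw");
        String[] result = lemmatizer.lemmatize(new String[] {"saw", "saw"}, new String[] {"VBD", "NN"});
        System.out.println(String.join(" ", result));
    }
}
```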

Chunking

Identifies noun phrases, verb phrases, and other syntactic chunks
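
Chunkers typically label each token with a BIO-style tag such as `B-NP` (begins a noun phrase), `I-NP` (continues it), or `O` (outside any chunk). The sketch below shows how such per-token tags can be grouped into phrases. The class `BioChunks`, the bracketed output format, and the sample tags are assumptions for illustration; this is post-processing only and does not predict the tags itself.

```java
import java.util.ArrayList;
import java.util.List;

final class BioChunks {
    /** Groups tokens into bracketed phrases such as "[NP the cat]"; tokens tagged "O" are dropped. */
    static List<String> group(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null; // phrase being built, or null when outside a chunk
        String currentType = null;    // chunk type of the open phrase, e.g. "NP"
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            boolean inside = tag.startsWith("I-") && current != null
                    && tag.substring(2).equals(currentType);
            if (inside) {
                current.append(' ').append(tokens[i]);
                continue;
            }
            if (current != null) { // any other tag closes the open phrase
                chunks.add(current.append(']').toString());
                current = null;
                currentType = null;
            }
            if (tag.startsWith("B-") || tag.startsWith("I-")) { // a stray I- tag opens a new phrase
                currentType = tag.substring(2);
                current = new StringBuilder("[").append(currentType).append(' ').append(tokens[i]);
            }
        }
        if (current != null) {
            chunks.add(current.append(']').toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "reckons", "the", "deficit", "."};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "O"};
        group(tokens, tags).forEach(System.out::println);
    }
}
```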

Parsing

Builds full syntactic parse trees using constituency parsing

Document Categorization

Classifies documents into predefined categories
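
To show what classifying documents into predefined categories involves, here is a minimal multinomial Naive Bayes sketch with add-one smoothing. It is a toy written for this page, not OpenNLP's categorizer: the class `TinyNaiveBayes`, the whitespace tokenization, and the training sentences are all assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class TinyNaiveBayes {
    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordsPerCategory = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labeled training document. */
    void train(String category, String text) {
        totalDocs++;
        docsPerCategory.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            counts.merge(word, 1, Integer::sum);
            wordsPerCategory.merge(category, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Returns the highest-scoring category, or null if nothing has been trained. */
    String categorize(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        String[] words = tokenize(text);
        for (String category : docsPerCategory.keySet()) {
            // Log prior: share of training documents in this category.
            double score = Math.log(docsPerCategory.get(category) / (double) totalDocs);
            int total = wordsPerCategory.getOrDefault(category, 0);
            Map<String, Integer> counts = wordCounts.get(category);
            for (String word : words) {
                // Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
                int count = counts.getOrDefault(word, 0);
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    private static String[] tokenize(String text) {
        String trimmed = text.toLowerCase(Locale.ROOT).trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        TinyNaiveBayes nb = new TinyNaiveBayes();
        nb.train("sports", "the team won the match");
        nb.train("tech", "the laptop chip is fast");
        System.out.println(nb.categorize("fast chip"));
    }
}
```

Scores are summed in log space so that long documents do not underflow to zero when many small probabilities are multiplied.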

Custom Model Training

Trains custom models with the Maxent, Perceptron, or Naive Bayes algorithms

Use Cases

Information Extraction

Extract structured data from unstructured text documents

Text Classification

Automatically categorize documents by topic or sentiment

Search Enhancement

Improve search relevance with NLP-based query processing

Content Analysis

Analyze large text corpora for entities, topics, and patterns

Chatbot Development

Build conversational AI with NLP intent and entity extraction

Integrations

Apache Solr

Integrate OpenNLP with Apache Solr for NLP-enhanced search

Apache Lucene

Use OpenNLP analyzers in Lucene text processing pipelines

Apache Flink

Real-time NLP on Apache Flink data streams

UIMA

Apache UIMA framework integration for unstructured information analysis

Maven/Gradle

Available on Maven Central for Java build system integration

Semantic Vocabularies

Apache OpenNLP Context

18 classes · 24 properties

JSON-LD

API Governance Rules

Apache OpenNLP API Rules

10 rules · 4 errors · 5 warnings · 1 info

SPECTRAL

Resources

GitHubOrganization
Documentation
GettingStarted
SpectralRules
Vocabulary
JSONLD

Sources

aid: apache-opennlp
name: Apache OpenNLP
description: Apache OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports common
  NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing,
  and coreference resolution.
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Machine Learning
- Natural Language Processing
- NLP
- Text Processing
- Apache
- Open Source
- Java
created: '2026-03-16'
modified: '2026-05-19'
url: https://raw.githubusercontent.com/api-evangelist/apache-opennlp/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-opennlp:apache-opennlp
  name: Apache OpenNLP
  description: OpenNLP provides a Java API for NLP tasks including tokenization, sentence detection, POS tagging, named entity
    recognition, chunking, parsing, and language detection, with support for training custom models.
  humanURL: https://opennlp.apache.org/docs/
  tags:
  - Java
  - NLP
  - Text Processing
  - Apache
  - Open Source
  - Machine Learning
  properties:
  - type: Documentation
    url: https://opennlp.apache.org/docs/
  - type: OpenAPI
    url: openapi/apache-opennlp-tools.yaml
  - type: NaftikoCapability
    url: capabilities/tools-chunking.yaml
  - type: NaftikoCapability
    url: capabilities/tools-document-categorization.yaml
  - type: NaftikoCapability
    url: capabilities/tools-language-detection.yaml
  - type: NaftikoCapability
    url: capabilities/tools-lemmatization.yaml
  - type: NaftikoCapability
    url: capabilities/tools-models.yaml
  - type: NaftikoCapability
    url: capabilities/tools-named-entity-recognition.yaml
  - type: NaftikoCapability
    url: capabilities/tools-pos-tagging.yaml
  - type: NaftikoCapability
    url: capabilities/tools-parsing.yaml
  - type: NaftikoCapability
    url: capabilities/tools-sentence-detection.yaml
  - type: NaftikoCapability
    url: capabilities/tools-tokenization.yaml
maintainers:
- FN: Kin Lane
  email: [email protected]
common:
- type: GitHubOrganization
  url: https://github.com/apache/opennlp
- type: Documentation
  url: https://opennlp.apache.org/
- type: GettingStarted
  url: https://opennlp.apache.org/docs/
- type: SpectralRules
  url: rules/apache-opennlp-spectral-rules.yml
- type: Vocabulary
  url: vocabulary/apache-opennlp-vocabulary.yaml
- type: JSONLD
  url: json-ld/apache-opennlp-context.jsonld
- type: Features
  data:
  - name: Language Detection
    description: Detects document language using ISO-639-3 classification
  - name: Sentence Detection
    description: Splits text into individual sentences with character offsets
  - name: Tokenization
    description: Segments text into words and punctuation with position tracking
  - name: Named Entity Recognition
    description: Detects persons, locations, organizations, and other named entities
  - name: POS Tagging
    description: Assigns Penn Treebank POS tags to each token
  - name: Lemmatization
    description: Reduces tokens to their dictionary base forms
  - name: Chunking
    description: Identifies noun phrases, verb phrases, and other syntactic chunks
  - name: Parsing
    description: Builds full syntactic parse trees using constituency parsing
  - name: Document Categorization
    description: Classifies documents into predefined categories
  - name: Custom Model Training
    description: Train custom models with Maxent, Perceptron, or Naive Bayes algorithms
- type: UseCases
  data:
  - name: Information Extraction
    description: Extract structured data from unstructured text documents
  - name: Text Classification
    description: Automatically categorize documents by topic or sentiment
  - name: Search Enhancement
    description: Improve search relevance with NLP-based query processing
  - name: Content Analysis
    description: Analyze large text corpora for entities, topics, and patterns
  - name: Chatbot Development
    description: Build conversational AI with NLP intent and entity extraction
- type: Integrations
  data:
  - name: Apache Solr
    description: Integrate OpenNLP with Apache Solr for NLP-enhanced search
  - name: Apache Lucene
    description: Use OpenNLP analyzers in Lucene text processing pipelines
  - name: Apache Flink
    description: Real-time NLP processing with Apache Flink data streams
  - name: UIMA
    description: Apache UIMA framework integration for unstructured information analysis
  - name: Maven/Gradle
    description: Available on Maven Central for Java build system integration