Apache OpenNLP

Apache OpenNLP is a machine-learning-based toolkit for processing natural language text. It supports common NLP tasks such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

1 API · 1 Capability · 10 Features
Machine Learning · Natural Language Processing · NLP · Text Processing · Apache · Open Source · Java

APIs

Apache OpenNLP

OpenNLP provides a Java API for NLP tasks including tokenization, sentence detection, POS tagging, named entity recognition, chunking, parsing, and language detection, with supp...

Capabilities

Apache OpenNLP NLP Pipeline Workflow

An end-to-end NLP workflow that combines language detection, sentence detection, tokenization, POS tagging, NER, chunking, and parsing for comprehensive text analysis.

Features

Language Detection

Detects a document's language and reports it as an ISO 639-3 language code
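OpenNLP's language detector is a trained model, and its internals are not described here. As a rough illustration of mapping text to an ISO 639-3 code, the sketch below compares character-trigram counts against small per-language sample profiles using cosine similarity. `TrigramLanguageGuesser` and its methods are hypothetical names, and two one-line samples are far too little data for real use.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical toy language guesser (not OpenNLP's detector). It compares
 * character-trigram counts of the input against per-language sample profiles
 * using cosine similarity. Profile keys are ISO 639-3 codes such as "eng".
 */
final class TrigramLanguageGuesser {
    private final Map<String, Map<String, Integer>> profiles = new HashMap<>();

    /** Counts overlapping character trigrams; space padding lets word boundaries count. */
    static Map<String, Integer> trigrams(String text) {
        String s = " " + text.toLowerCase(Locale.ROOT) + " ";
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            counts.merge(s.substring(i, i + 3), 1, Integer::sum);
        }
        return counts;
    }

    /** Adds a training sample to the profile stored under the given ISO 639-3 code. */
    void addSample(String iso6393Code, String text) {
        Map<String, Integer> profile = profiles.computeIfAbsent(iso6393Code, k -> new HashMap<>());
        trigrams(text).forEach((gram, count) -> profile.merge(gram, count, Integer::sum));
    }

    /** Returns the code whose profile is most similar to the text, or null if untrained. */
    String predict(String text) {
        Map<String, Integer> query = trigrams(text);
        String best = null;
        double bestSimilarity = -1.0;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            double similarity = cosine(query, e.getValue());
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                best = e.getKey();
            }
        }
        return best;
    }

    private static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue().doubleValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue().doubleValue() * e.getValue();
        }
        for (int v : b.values()) normB += (double) v * v;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }
}
```

A production detector would use far more data and a trained classifier; cosine similarity is used here only because it fits in a few lines.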

Sentence Detection

Splits text into individual sentences with character offsets
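Character offsets let later steps point back into the original text. The sketch below is a hypothetical rule-based splitter (`SimpleSentenceSplitter` is not an OpenNLP class) that returns each sentence as a `[start, end)` character range, assuming a sentence ends at `.`, `!` or `?` followed by whitespace or the end of the text.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical rule-based sentence splitter (not OpenNLP's trained detector).
 * Returns each sentence as a [start, end) character offset pair into the input.
 */
final class SimpleSentenceSplitter {
    static List<int[]> sentencePositions(String text) {
        List<int[]> spans = new ArrayList<>();
        int n = text.length();
        int start = skipWhitespace(text, 0);
        for (int i = start; i < n; i++) {
            char c = text.charAt(i);
            boolean terminal = c == '.' || c == '!' || c == '?';
            // Terminal punctuation is a boundary only before whitespace or at the end of text.
            if (terminal && (i + 1 == n || Character.isWhitespace(text.charAt(i + 1)))) {
                spans.add(new int[] {start, i + 1});
                start = skipWhitespace(text, i + 1);
                i = start - 1; // the loop increment moves i to the next sentence start
            }
        }
        // Trailing text without terminal punctuation still counts as a sentence.
        int end = n;
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (start < end) spans.add(new int[] {start, end});
        return spans;
    }

    private static int skipWhitespace(String text, int i) {
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
        return i;
    }
}
```

A rule like this misreads abbreviations such as `Dr. Smith` as a sentence boundary, which is one reason a trained model is preferable for this step.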

Tokenization

Segments text into words and punctuation with position tracking
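One minimal way to segment text while tracking positions is to group characters by class. In the hypothetical sketch below (`OffsetTokenizer` is an illustrative name, not OpenNLP's tokenizer), runs of letters or digits form one token, every other non-whitespace character becomes its own token, and each token records its `[start, end)` offsets.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical character-class tokenizer: runs of letters or digits form one
 * token, every other non-whitespace character is its own token, and each token
 * keeps its [start, end) offsets into the original string.
 */
final class OffsetTokenizer {
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetterOrDigit(c)) {
                while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) i++;
            } else {
                i++; // punctuation and symbols become single-character tokens
            }
            tokens.add(new Token(s.substring(start, i), start, i));
        }
        return tokens;
    }
}
```

Such a rule splits `don't` into `don`, `'` and `t`; a trained tokenizer can learn to handle contractions like this from annotated data.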

Named Entity Recognition

Detects persons, locations, organizations, and other named entities
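OpenNLP's name finders report entities as spans over token positions, each tagged with an entity type. The sketch below reproduces only that output shape, using a tiny hypothetical gazetteer (a phrase-to-type map) with longest-match lookup instead of a trained model; `GazetteerNameFinder` and `EntitySpan` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical dictionary (gazetteer) name finder. Returns typed spans over
 * token indices [start, end), preferring the longest dictionary match.
 */
final class GazetteerNameFinder {
    static final class EntitySpan {
        final int start;
        final int end;
        final String type;

        EntitySpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    static List<EntitySpan> find(String[] tokens, Map<String, String> gazetteer, int maxLen) {
        List<EntitySpan> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            EntitySpan match = null;
            // Try the longest candidate first so "New York City" wins over "New York".
            for (int len = Math.min(maxLen, tokens.length - i); len > 0 && match == null; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                String type = gazetteer.get(candidate);
                if (type != null) match = new EntitySpan(i, i + len, type);
            }
            if (match != null) {
                spans.add(match);
                i = match.end; // resume after the entity, so matches never overlap
            } else {
                i++;
            }
        }
        return spans;
    }
}
```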

POS Tagging

Assigns Penn Treebank POS tags to each token

Lemmatization

Reduces tokens to their dictionary base forms
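The right lemma often depends on the part of speech: "saw" is its own lemma as a noun but maps to "see" as a past-tense verb. The hypothetical sketch below therefore keys its dictionary on both the lower-cased word and its Penn Treebank tag, and falls back to the lower-cased word when no entry exists. `TagAwareLemmatizer` is an illustrative class, not OpenNLP's lemmatizer API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical dictionary lemmatizer keyed by (lower-cased word, Penn Treebank tag). */
final class TagAwareLemmatizer {
    private final Map<String, String> lemmas = new HashMap<>();

    void add(String word, String posTag, String lemma) {
        lemmas.put(key(word.toLowerCase(Locale.ROOT), posTag), lemma);
    }

    /** Returns one lemma per token; unknown words fall back to their lower-cased form. */
    String[] lemmatize(String[] tokens, String[] posTags) {
        if (tokens.length != posTags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        String[] result = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            String lower = tokens[i].toLowerCase(Locale.ROOT);
            result[i] = lemmas.getOrDefault(key(lower, posTags[i]), lower);
        }
        return result;
    }

    private static String key(String word, String posTag) {
        return word + "\t" + posTag;
    }
}
```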

Chunking

Identifies noun phrases, verb phrases, and other syntactic chunks
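Chunkers typically label each token with a CoNLL-2000-style tag: `B-NP` begins a noun phrase, `I-NP` continues it, and `O` marks a token outside any chunk. The hypothetical decoder below groups such tags into typed phrase spans over token indices. `ChunkDecoder` is an illustrative name, and the example tags are hand-written, not output from an OpenNLP model.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder from B-/I-/O chunk tags to phrase spans over token indices [start, end). */
final class ChunkDecoder {
    static final class Chunk {
        final String type;
        final int start;
        final int end;

        Chunk(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    static List<Chunk> decode(String[] tags) {
        List<Chunk> chunks = new ArrayList<>();
        String openType = null; // type of the chunk currently being built, or null
        int openStart = -1;
        for (int i = 0; i < tags.length; i++) {
            String tag = tags[i];
            // An I- tag of the same type extends the open chunk.
            if (openType != null && tag.equals("I-" + openType)) continue;
            // Any other tag closes the open chunk.
            if (openType != null) chunks.add(new Chunk(openType, openStart, i));
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                // A stray I- tag is leniently treated as the start of a new chunk.
                openType = tag.substring(2);
                openStart = i;
            } else {
                openType = null; // "O": outside any chunk
            }
        }
        if (openType != null) chunks.add(new Chunk(openType, openStart, tags.length));
        return chunks;
    }
}
```

For the tokens "He reckons the current account deficit will narrow" tagged `B-NP B-VP B-NP I-NP I-NP I-NP B-VP I-VP`, the decoder yields the chunks [He], [reckons], [the current account deficit] and [will narrow].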

Parsing

Builds full syntactic parse trees using constituency parsing
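Constituency parses are commonly written in Penn Treebank-style bracketed notation, where each parenthesized group is a labeled constituent. As a sketch of what such a tree holds, the hypothetical `BracketTree` class below reads a bracketed string into nodes and collects the words at its leaves; it does not produce parses and is not OpenNLP's parser API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for bracketed trees such as "(S (NP (NN cats)) (VP (VBP purr)))". */
final class BracketTree {
    final String label;
    final String word; // non-null only for pre-terminals like (NN cats)
    final List<BracketTree> children = new ArrayList<>();

    private BracketTree(String label, String word) {
        this.label = label;
        this.word = word;
    }

    static BracketTree parse(String s) {
        // Lex the input into "(", ")" and atoms (labels or words).
        List<String> toks = new ArrayList<>();
        StringBuilder atom = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '(' || c == ')' || Character.isWhitespace(c)) {
                if (atom.length() > 0) {
                    toks.add(atom.toString());
                    atom.setLength(0);
                }
                if (c == '(' || c == ')') toks.add(String.valueOf(c));
            } else {
                atom.append(c);
            }
        }
        if (atom.length() > 0) toks.add(atom.toString());
        int[] pos = {0};
        return read(toks, pos);
    }

    private static BracketTree read(List<String> toks, int[] pos) {
        expect(toks, pos, "(");
        String label = next(toks, pos);
        BracketTree node;
        if (!peek(toks, pos).equals("(")) {
            node = new BracketTree(label, next(toks, pos)); // (TAG word)
        } else {
            node = new BracketTree(label, null);
            while (peek(toks, pos).equals("(")) node.children.add(read(toks, pos));
        }
        expect(toks, pos, ")");
        return node;
    }

    private static String peek(List<String> toks, int[] pos) {
        if (pos[0] >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        return toks.get(pos[0]);
    }

    private static String next(List<String> toks, int[] pos) {
        String t = peek(toks, pos);
        pos[0]++;
        return t;
    }

    private static void expect(List<String> toks, int[] pos, String want) {
        if (!next(toks, pos).equals(want)) throw new IllegalArgumentException("expected " + want);
    }

    /** Appends the words at the leaves, left to right. */
    void leaves(List<String> out) {
        if (word != null) out.add(word);
        for (BracketTree child : children) child.leaves(out);
    }
}
```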

Document Categorization

Classifies documents into predefined categories
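Multinomial naive Bayes is one of the simplest ways to assign a document to one of several predefined categories, and naive Bayes is among the training algorithms this page lists for custom models. The sketch below is a hypothetical implementation over pre-tokenized documents with add-one smoothing; `NaiveBayesCategorizer` and the category names are illustrative, and this is not OpenNLP's document categorizer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical multinomial naive Bayes document categorizer with add-one smoothing. */
final class NaiveBayesCategorizer {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> wordTotals = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Records one pre-tokenized training document for the given category. */
    void train(String category, String[] tokens) {
        totalDocs++;
        docCounts.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            wordTotals.merge(category, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the category with the highest log posterior, or null if untrained. */
    String bestCategory(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            double score = Math.log(docCounts.get(category) / (double) totalDocs); // log prior
            Map<String, Integer> counts = wordCounts.get(category);
            double denominator = wordTotals.getOrDefault(category, 0) + vocabulary.size();
            for (String t : tokens) {
                // Add-one smoothing keeps unseen words from zeroing out the probability.
                score += Math.log((counts.getOrDefault(t, 0) + 1) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

Working in log space avoids floating-point underflow when long documents multiply many small probabilities.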

Custom Model Training

Train custom models with Maxent, Perceptron, or Naive Bayes algorithms
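Of the three algorithms named above, the perceptron is the shortest to sketch. The hypothetical `TinyPerceptron` below learns one weight per (label, feature) pair and, on each mistake, moves weight toward the correct label and away from the best-scoring wrong one. It illustrates the update rule only; it is not OpenNLP's trainer, and real OpenNLP models are trained through the toolkit's own training data formats and parameters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical multiclass perceptron over sparse string features. */
final class TinyPerceptron {
    private final Map<String, Map<String, Double>> weights = new HashMap<>();
    private final List<String> labels = new ArrayList<>();

    /** Runs up to maxEpochs passes and stops early after a pass with no mistakes. */
    void train(List<String> goldLabels, List<String[]> examples, int maxEpochs) {
        labels.clear();
        labels.addAll(new LinkedHashSet<>(goldLabels)); // distinct labels, first-seen order
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            int mistakes = 0;
            for (int i = 0; i < examples.size(); i++) {
                String gold = goldLabels.get(i);
                String[] features = examples.get(i);
                String rival = argmax(features, gold);
                // Update unless gold strictly beats the best rival; ties count as mistakes.
                if (rival != null && score(rival, features) >= score(gold, features)) {
                    for (String f : features) {
                        bump(gold, f, 1.0);
                        bump(rival, f, -1.0);
                    }
                    mistakes++;
                }
            }
            if (mistakes == 0) return;
        }
    }

    String predict(String[] features) {
        return argmax(features, null);
    }

    /** Highest-scoring label other than {@code exclude}; the first label wins ties. */
    private String argmax(String[] features, String exclude) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : labels) {
            if (label.equals(exclude)) continue;
            double s = score(label, features);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    private double score(String label, String[] features) {
        Map<String, Double> w = weights.getOrDefault(label, Map.of());
        double s = 0.0;
        for (String f : features) s += w.getOrDefault(f, 0.0);
        return s;
    }

    private void bump(String label, String feature, double delta) {
        weights.computeIfAbsent(label, k -> new HashMap<>()).merge(feature, delta, Double::sum);
    }
}
```

Counting ties as mistakes forces an update on the first pass, when all weights are still zero; otherwise the first label in the list would win every tie without ever being trained.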

Use Cases

Information Extraction

Extract structured data from unstructured text documents

Text Classification

Automatically categorize documents by topic or sentiment

Search Enhancement

Improve search relevance with NLP-based query processing

Content Analysis

Analyze large text corpora for entities, topics, and patterns

Chatbot Development

Build conversational AI with NLP-based intent and entity extraction

Integrations

Apache Solr

Integrate OpenNLP with Apache Solr for NLP-enhanced search

Apache Lucene

Use OpenNLP analyzers in Lucene text processing pipelines

Apache Flink

Apply NLP to Apache Flink data streams in real time

UIMA

Apache UIMA framework integration for unstructured information analysis

Maven/Gradle

Available on Maven Central for Java build system integration

Semantic Vocabularies

Apache OpenNLP Context

18 classes · 24 properties

JSON-LD

API Governance Rules

Apache OpenNLP API Rules

10 rules · 4 errors · 5 warnings · 1 info

Spectral

Resources

GitHub Organization
Documentation
Getting Started
Spectral Rules
Vocabulary
Naftiko Capability
JSON-LD