arXiv

arXiv is the open-access e-print repository operated by Cornell Tech, hosting more than two million preprints across physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. It exposes two principal programmatic interfaces: a REST Query API that returns Atom 1.0 XML and an OAI-PMH v2.0 endpoint for bulk metadata harvesting. Daily RSS feeds, an Amazon S3 Requester-Pays bucket for full text, and a periodically refreshed Kaggle metadata dataset complement them.

4 APIs · 1 Capability · 9 Features

Science And Math · Scholarly Publishing · Preprints · Open Access · Research · Open Source · Public APIs

APIs

arXiv Query API

REST endpoint for searching arXiv and retrieving article metadata. Supports field-prefix queries (ti, au, abs, co, jr, cat, rn, id, all), AND/OR/ANDNOT operators, phrase groupin...

arXiv OAI-PMH API

Open Archives Initiative Protocol for Metadata Harvesting v2.0 endpoint for bulk-syncing arXiv metadata. Supports Identify, ListSets, ListMetadataFormats, ListRecords, ListIdent...

arXiv RSS Feeds

Daily RSS feeds of new arXiv submissions, organised by archive and subject category. Primarily intended for human consumption; the OAI-PMH and query APIs are recommended for mac...

arXiv Bulk Data

Full-text and source bulk distribution channels: an Amazon S3 Requester-Pays bucket containing every arXiv PDF and source archive, plus a periodically refreshed Kaggle dataset o...

Capabilities

Run Capabilities with Naftiko — Deploy and orchestrate these API capabilities using Naftiko Fleet.

Research Discovery

Features

Field-Prefix Search

Targeted search across title, author, abstract, comment, journal reference, category, report number, and ID.

Boolean Query Composition

AND, OR, and ANDNOT operators with phrase grouping and parentheses.
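
A minimal Python sketch of how field prefixes and Boolean operators might be combined into a `search_query` value for the Query API base URL listed on this page. The helper names `field` and `build_url` are illustrative and not part of any arXiv library, and the author and title terms are arbitrary examples.

```python
from urllib.parse import urlencode

# Base URL as listed for the arXiv Query API on this page.
BASE_URL = "https://export.arxiv.org/api/query"


def field(prefix: str, term: str) -> str:
    """Return one field-prefixed clause; multi-word terms become quoted phrases."""
    if " " in term:
        term = f'"{term}"'
    return f"{prefix}:{term}"


def build_url(search_query: str, max_results: int = 10) -> str:
    """URL-encode the query string; urlencode turns spaces into '+'."""
    params = {"search_query": search_query, "max_results": max_results}
    return f"{BASE_URL}?{urlencode(params)}"


# Author clause, minus papers whose title matches either of two terms.
query = (
    f"{field('au', 'del_maestro')} ANDNOT "
    f"({field('ti', 'checkerboard')} OR {field('ti', 'Pyrochlore')})"
)
url = build_url(query)
print(url)
```

Parentheses group the OR clause so that ANDNOT applies to both title terms; percent-encoding them is left to `urlencode` rather than done by hand.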

Date-Range Filtering

submittedDate and lastUpdatedDate ranges in UTC.
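
A hedged sketch of building a date-range clause. It assumes the bracketed `[YYYYMMDDHHMM TO YYYYMMDDHHMM]` form with minute precision, which is how the user manual's `submittedDate` filter is commonly written; the function name `date_range` and its validation rules are illustrative choices, not arXiv's.

```python
from datetime import datetime, timedelta, timezone

# The two date fields this page lists for range filtering.
DATE_FIELDS = ("submittedDate", "lastUpdatedDate")


def date_range(field_name: str, start: datetime, end: datetime) -> str:
    """Format a '[start TO end]' clause, converting both bounds to UTC."""
    if field_name not in DATE_FIELDS:
        raise ValueError(f"unsupported date field: {field_name}")
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes so UTC conversion is unambiguous")
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%Y%m%d%H%M"  # minute precision, no separators
    lo = start.astimezone(timezone.utc).strftime(fmt)
    hi = end.astimezone(timezone.utc).strftime(fmt)
    return f"{field_name}:[{lo} TO {hi}]"


# A fixed UTC-5 offset, used only to show the conversion to UTC.
utc_minus_5 = timezone(timedelta(hours=-5))
clause = date_range(
    "submittedDate",
    datetime(2023, 1, 1, 1, 0, tzinfo=utc_minus_5),
    datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
)
print(clause)
```

The clause can be joined to other clauses with AND before URL-encoding. Rejecting naive datetimes avoids silently treating local time as UTC.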

Sort Control

Sort by relevance, lastUpdatedDate, or submittedDate, ascending or descending.
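
The sort options above map onto the Query API's `sortBy` and `sortOrder` parameters, usually combined with `start` and `max_results` for paging. The sketch below validates the values before building the parameter set; `query_params` is a hypothetical helper name and the category query is only an example.

```python
from urllib.parse import urlencode

SORT_BY = ("relevance", "lastUpdatedDate", "submittedDate")
SORT_ORDER = ("ascending", "descending")


def query_params(search_query: str, sort_by: str = "relevance",
                 sort_order: str = "descending", start: int = 0,
                 max_results: int = 10) -> dict:
    """Return a validated parameter dict for one page of sorted results."""
    if sort_by not in SORT_BY:
        raise ValueError(f"sortBy must be one of {SORT_BY}")
    if sort_order not in SORT_ORDER:
        raise ValueError(f"sortOrder must be one of {SORT_ORDER}")
    if start < 0 or max_results < 1:
        raise ValueError("start must be >= 0 and max_results >= 1")
    return {
        "search_query": search_query,
        "sortBy": sort_by,
        "sortOrder": sort_order,
        "start": start,
        "max_results": max_results,
    }


# Third page of ten results, newest submissions first.
params = query_params("cat:cs.AI", sort_by="submittedDate", start=20)
print(urlencode(params))
```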

ID-Lookup Mode

Fetch metadata for an explicit comma-separated list of arXiv IDs.
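
A sketch of ID-lookup mode using the `id_list` parameter, followed by parsing the Atom response with the standard library. The IDs are arbitrary illustrations, and `SAMPLE_FEED` is a made-up, heavily trimmed stand-in for a real response so the example runs offline; `id_list_url` and `parse_entries` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"
# Atom 1.0 namespace plus the arXiv extension namespace.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def id_list_url(ids: list[str]) -> str:
    """Build a lookup URL; safe=',' keeps the comma separators readable."""
    return f"{BASE_URL}?{urlencode({'id_list': ','.join(ids)}, safe=',')}"


def parse_entries(atom_xml: str) -> list[dict]:
    """Extract id, title, and primary category from each Atom entry."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", NS):
        category = entry.find("arxiv:primary_category", NS)
        entries.append({
            "id": entry.findtext("atom:id", namespaces=NS),
            "title": entry.findtext("atom:title", namespaces=NS),
            "primary_category": category.get("term") if category is not None else None,
        })
    return entries


# Made-up sample feed; real responses carry many more elements per entry.
SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:arxiv="http://arxiv.org/schemas/atom">'
    "<entry>"
    "<id>http://arxiv.org/abs/2101.00001v1</id>"
    "<title>Example title</title>"
    '<arxiv:primary_category term="cs.AI"/>'
    "</entry>"
    "</feed>"
)

lookup_url = id_list_url(["2101.00001", "2101.00002v2"])
entries = parse_entries(SAMPLE_FEED)
print(lookup_url)
print(entries)
```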

OAI-PMH Bulk Harvest

Industry-standard metadata harvesting with resumption tokens and incremental from-date queries.
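
The resumption-token loop can be sketched as follows, assuming the OAI-PMH base URL listed on this page. Per the OAI-PMH 2.0 protocol, a follow-up request carries only the verb and the `resumptionToken`, and an empty or absent token marks the last page. The `fetch` callable, the canned pages, and the placeholder identifiers are illustrative; in real use `fetch` would wrap an HTTP client and respect arXiv's rate limits.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI_BASE = "https://oaipmh.arxiv.org/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch: Callable[[str], str], from_date: str,
                        metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield record identifiers changed since from_date, following tokens."""
    params = {"verb": "ListIdentifiers",
              "metadataPrefix": metadata_prefix,
              "from": from_date}
    while True:
        root = ET.fromstring(fetch(f"{OAI_BASE}?{urlencode(params)}"))
        for header in root.iterfind(".//oai:header", OAI_NS):
            yield header.findtext("oai:identifier", namespaces=OAI_NS)
        token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
        if not token:  # empty or missing token: harvest is complete
            break
        # The token is an exclusive argument: send it with the verb only.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


# Offline demo: two canned pages stand in for real HTTP responses.
PAGE_1 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    "<header><identifier>oai:arXiv.org:0000.00001</identifier></header>"
    "<resumptionToken>token-1</resumptionToken>"
    "</ListIdentifiers></OAI-PMH>"
)
PAGE_2 = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
    "<header><identifier>oai:arXiv.org:0000.00002</identifier></header>"
    "<resumptionToken/>"
    "</ListIdentifiers></OAI-PMH>"
)
requested: list[str] = []


def fake_fetch(url: str) -> str:
    """Record the URL and return the canned page it corresponds to."""
    requested.append(url)
    return PAGE_2 if "resumptionToken=" in url else PAGE_1


ids = list(harvest_identifiers(fake_fetch, from_date="2024-01-01"))
print(ids)
```

Passing `fetch` in as a parameter keeps the paging logic testable without network access. Storing the date of the last successful run and passing it as `from_date` gives the incremental behaviour described above.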

Three Metadata Formats

oai_dc, arXiv, and arXivRaw exposed via OAI-PMH.

Bulk Full-Text

Amazon S3 Requester-Pays bucket for PDFs and source archives, plus a periodically refreshed Kaggle metadata dataset.

Open Source Stack

arXiv develops its services in the open under a public GitHub organization (arXiv) with 50+ active repositories.

Use Cases

Research Discovery Tools

Build search and recommendation interfaces over the arXiv corpus.

Citation And Bibliographic Apps

Pull metadata, DOIs, and journal references for reference managers.

AI Training And RAG

Build domain corpora for retrieval-augmented generation across scientific literature.

Topic Watching And Alerts

Schedule incremental harvests and notify users of new submissions in a category.

Bibliometrics And Trend Analysis

Aggregate metadata to study research trends, author networks, and category growth.

Academic Workflow Integration

Embed arXiv search into LaTeX editors, IDEs, note-taking tools, and chat assistants via MCP.

Integrations

Semantic Scholar

Citation graph and paper-similarity overlay used by community tooling.

NASA ADS

Cross-references and bibliography overlay used in arXiv-bib-overlay.

DOI / CrossRef

Articles surface DOIs once a publisher version of record exists.

Amazon S3

Bulk PDF and source distribution through Requester-Pays buckets.

Kaggle

Periodically refreshed full metadata dataset.

Model Context Protocol

Multiple community MCP servers expose arXiv search to AI assistants.

Solutions

Query API

Programmatic search and metadata retrieval.

OAI-PMH Harvest

Bulk metadata sync for downstream indexes.

RSS Feeds

Daily new-submission feeds per archive or subject.

Bulk Full-Text

S3 and Kaggle distributions for corpus-scale work.

Semantic Vocabularies

arXiv Context

15 classes · 3 properties

JSON-LD

API Governance Rules

arXiv API Rules

7 rules · 4 errors 3 warnings

SPECTRAL

Resources

DeveloperPortal

Documentation

TermsOfService

PrivacyPolicy

GitHubOrganization

SpectralRules

arXiv JSON-LD Context

Query Capability

NaftikoCapability

OAI-PMH Capability

NaftikoCapability

Research Discovery Workflow

NaftikoCapability

PublicAPIsListing

arXiv MCP Server (blazickjp)

arXiv MCP (shoumikdc)

arXiv MCP (Tejas242)

arXiv MCP Server in Java (glaforge)

arXiv MCP (kelvingao)

arxiv Python wrapper (lukasschwab/arxiv.py)

arxivpy Python client (titipata/arxivpy)

arxiv-search (Search UI and APIs)

GitHubRepository

oaipmh (OAI-PMH service)

GitHubRepository

arxiv-feed (Atom and RSS service)

GitHubRepository

arxiv-canonical (JSON schema for arXiv metadata)

GitHubRepository

Sources

aid: arxiv
name: arXiv
description: >-
  arXiv is the open-access e-print repository operated by Cornell Tech, hosting
  more than two million preprints across physics, mathematics, computer science,
  quantitative biology, quantitative finance, statistics, electrical engineering
  and systems science, and economics. It exposes two principal programmatic
  interfaces: a REST Query API that returns Atom 1.0 XML and an OAI-PMH v2.0
  endpoint for bulk metadata harvesting. Daily RSS feeds, an Amazon S3
  Requester-Pays bucket for full text, and a periodically refreshed Kaggle
  metadata dataset complement them.
url: https://info.arxiv.org/help/api/index.html
specificationVersion: '0.20'
created: '2026-05-28'
modified: '2026-05-29'
x-source: public-apis/public-apis
x-type: opensource
x-category: Science & Math
x-tier: 1
x-tier-reason: Cornell-operated open scholarship infrastructure with multiple long-lived public APIs.
tags:
  - Science And Math
  - Scholarly Publishing
  - Preprints
  - Open Access
  - Research
  - Open Source
  - Public APIs
apis:
  - name: arXiv Query API
    description: >-
      REST endpoint for searching arXiv and retrieving article metadata.
      Supports field-prefix queries (ti, au, abs, co, jr, cat, rn, id, all),
      AND/OR/ANDNOT operators, phrase grouping, and date-range filters on
      submittedDate and lastUpdatedDate. Responses are Atom 1.0 XML with
      arXiv and OpenSearch extensions.
    humanURL: https://info.arxiv.org/help/api/user-manual.html
    baseURL: https://export.arxiv.org/api/query
    tags:
      - Science And Math
      - Scholarly Publishing
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/api/user-manual.html
      - type: APIReference
        url: https://info.arxiv.org/help/api/user-manual.html#_calling_the_api
      - type: GettingStarted
        url: https://info.arxiv.org/help/api/basics.html
      - type: OpenAPI
        url: openapi/arxiv-query-openapi.yml
      - type: JSONSchema
        url: json-schema/arxiv-article-schema.json
        title: Article
      - type: JSONStructure
        url: json-structure/arxiv-article-structure.json
        title: Article
      - type: Example
        url: examples/arxiv-query-articles-example.json
        title: Query Articles Example
      - type: SDK
        url: https://pypi.org/project/arxiv/
        title: Python SDK (lukasschwab/arxiv.py)
  - name: arXiv OAI-PMH API
    description: >-
      Open Archives Initiative Protocol for Metadata Harvesting v2.0 endpoint
      for bulk-syncing arXiv metadata. Supports Identify, ListSets,
      ListMetadataFormats, ListRecords, ListIdentifiers, and GetRecord with
      oai_dc, arXiv, and arXivRaw metadata formats. Metadata refreshes ~10:30pm
      ET Sunday-Thursday.
    humanURL: https://info.arxiv.org/help/oa/index.html
    baseURL: https://oaipmh.arxiv.org/oai
    tags:
      - Scholarly Publishing
      - Bulk Data
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/oa/index.html
      - type: OpenAPI
        url: openapi/arxiv-oaipmh-openapi.yml
      - type: Example
        url: examples/arxiv-oaipmh-listrecords-example.json
        title: List Records Example
  - name: arXiv RSS Feeds
    description: >-
      Daily RSS feeds of new arXiv submissions, organised by archive and
      subject category. Primarily intended for human consumption; the OAI-PMH
      and query APIs are recommended for machine integration.
    humanURL: https://info.arxiv.org/help/rss.html
    baseURL: https://rss.arxiv.org
    tags:
      - Scholarly Publishing
      - Feeds
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/rss.html
  - name: arXiv Bulk Data
    description: >-
      Full-text and source bulk distribution channels: an Amazon S3
      Requester-Pays bucket containing every arXiv PDF and source archive, plus
      a periodically refreshed Kaggle dataset of the complete metadata corpus.
    humanURL: https://info.arxiv.org/help/bulk_data.html
    baseURL: https://info.arxiv.org/help/bulk_data_s3.html
    tags:
      - Bulk Data
      - Open Data
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/bulk_data.html
      - type: Resources
        url: https://info.arxiv.org/help/bulk_data_s3.html
        title: Amazon S3 Bulk Buckets
      - type: Resources
        url: https://www.kaggle.com/datasets/Cornell-University/arxiv
        title: Kaggle arXiv Dataset
common:
  - type: Website
    url: https://arxiv.org
  - type: DeveloperPortal
    url: https://info.arxiv.org/help/api/index.html
  - type: Documentation
    url: https://info.arxiv.org/help/api/user-manual.html
  - type: TermsOfService
    url: https://info.arxiv.org/help/api/tou.html
  - type: PrivacyPolicy
    url: https://info.arxiv.org/help/policies/privacy_policy.html
  - type: StatusPage
    url: https://status.arxiv.org/
  - type: Blog
    url: https://blog.arxiv.org/
  - type: Support
    url: https://info.arxiv.org/help/contact.html
  - type: GitHubOrganization
    url: https://github.com/arXiv
  - type: ChangeLog
    url: https://github.com/arXiv/arxiv-docs/commits/develop
  - type: Plans
    url: plans/arxiv-plans-pricing.yml
  - type: RateLimits
    url: rate-limits/arxiv-rate-limits.yml
  - type: SpectralRules
    url: rules/arxiv-rules.yml
  - type: Vocabulary
    url: vocabulary/arxiv-vocabulary.yml
  - type: JSONLD
    url: json-ld/arxiv-context.jsonld
    title: arXiv JSON-LD Context
  - type: NaftikoCapability
    url: capabilities/shared/arxiv-query.yaml
    title: Query Capability
  - type: NaftikoCapability
    url: capabilities/shared/arxiv-oaipmh.yaml
    title: OAI-PMH Capability
  - type: NaftikoCapability
    url: capabilities/research-discovery.yaml
    title: Research Discovery Workflow
  - type: PublicAPIsListing
    url: https://github.com/public-apis/public-apis
  - type: Tools
    url: https://github.com/blazickjp/arxiv-mcp-server
    title: arXiv MCP Server (blazickjp)
  - type: Tools
    url: https://github.com/shoumikdc/arXiv-mcp
    title: arXiv MCP (shoumikdc)
  - type: Tools
    url: https://github.com/Tejas242/arxiv-mcp
    title: arXiv MCP (Tejas242)
  - type: Tools
    url: https://github.com/glaforge/arxiv-mcp-server
    title: arXiv MCP Server in Java (glaforge)
  - type: Tools
    url: https://github.com/kelvingao/arxiv-mcp
    title: arXiv MCP (kelvingao)
  - type: SDK
    url: https://pypi.org/project/arxiv/
    title: arxiv Python wrapper (lukasschwab/arxiv.py)
  - type: SDK
    url: https://github.com/titipata/arxivpy
    title: arxivpy Python client (titipata/arxivpy)
  - type: GitHubRepository
    url: https://github.com/arXiv/arxiv-search
    title: arxiv-search (Search UI and APIs)
  - type: GitHubRepository
    url: https://github.com/arXiv/oaipmh
    title: oaipmh (OAI-PMH service)
  - type: GitHubRepository
    url: https://github.com/arXiv/arxiv-feed
    title: arxiv-feed (Atom and RSS service)
  - type: GitHubRepository
    url: https://github.com/arXiv/arxiv-canonical
    title: arxiv-canonical (JSON schema for arXiv metadata)
  - type: Features
    data:
      - name: Field-Prefix Search
        description: Targeted search across title, author, abstract, comment, journal reference, category, report number, and ID.
      - name: Boolean Query Composition
        description: AND, OR, and ANDNOT operators with phrase grouping and parentheses.
      - name: Date-Range Filtering
        description: submittedDate and lastUpdatedDate ranges in UTC.
      - name: Sort Control
        description: Sort by relevance, lastUpdatedDate, or submittedDate, ascending or descending.
      - name: ID-Lookup Mode
        description: Fetch metadata for an explicit comma-separated list of arXiv IDs.
      - name: OAI-PMH Bulk Harvest
        description: Industry-standard metadata harvesting with resumption tokens and incremental from-date queries.
      - name: Three Metadata Formats
        description: oai_dc, arXiv, and arXivRaw exposed via OAI-PMH.
      - name: Bulk Full-Text
        description: Amazon S3 Requester-Pays bucket for PDFs and source archives, plus a periodically refreshed Kaggle metadata dataset.
      - name: Open Source Stack
        description: arXiv develops its services in the open under a public GitHub organization (arXiv) with 50+ active repositories.
  - type: UseCases
    data:
      - name: Research Discovery Tools
        description: Build search and recommendation interfaces over the arXiv corpus.
      - name: Citation And Bibliographic Apps
        description: Pull metadata, DOIs, and journal references for reference managers.
      - name: AI Training And RAG
        description: Build domain corpora for retrieval-augmented generation across scientific literature.
      - name: Topic Watching And Alerts
        description: Schedule incremental harvests and notify users of new submissions in a category.
      - name: Bibliometrics And Trend Analysis
        description: Aggregate metadata to study research trends, author networks, and category growth.
      - name: Academic Workflow Integration
        description: Embed arXiv search into LaTeX editors, IDEs, note-taking tools, and chat assistants via MCP.
  - type: Integrations
    data:
      - name: Semantic Scholar
        description: Citation graph and paper-similarity overlay used by community tooling.
      - name: NASA ADS
        description: Cross-references and bibliography overlay used in arXiv-bib-overlay.
      - name: DOI / CrossRef
        description: Articles surface DOIs once a publisher version of record exists.
      - name: Amazon S3
        description: Bulk PDF and source distribution through Requester-Pays buckets.
      - name: Kaggle
        description: Periodically refreshed full metadata dataset.
      - name: Model Context Protocol
        description: Multiple community MCP servers expose arXiv search to AI assistants.
  - type: Solutions
    data:
      - name: Query API
        description: Programmatic search and metadata retrieval.
      - name: OAI-PMH Harvest
        description: Bulk metadata sync for downstream indexes.
      - name: RSS Feeds
        description: Daily new-submission feeds per archive or subject.
      - name: Bulk Full-Text
        description: S3 and Kaggle distributions for corpus-scale work.
maintainers:
  - FN: Kin Lane
    email: [email protected]