arXiv
arXiv is the open-access e-print repository operated by Cornell Tech, hosting more than two million preprints across physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering, and economics. arXiv exposes two principal programmatic interfaces: a REST Query API that returns Atom 1.0 XML and an OAI-PMH v2.0 endpoint for bulk metadata harvesting, plus daily RSS feeds and Amazon S3 / Kaggle distributions for full-text corpora.
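As a rough illustration of consuming the Atom 1.0 responses mentioned above, the sketch below parses a feed with Python's standard library. The embedded feed is hand-written and hypothetical (the ID, title, and author are placeholders, not real arXiv data), and the arXiv and OpenSearch namespace URIs are assumptions that should be checked against a live response.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 uses a fixed namespace; the arXiv and OpenSearch URIs below are
# assumptions about the extension namespaces and should be verified.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}

# Hypothetical feed used only to exercise the parser (placeholder values).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <opensearch:totalResults>1</opensearch:totalResults>
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An  Illustrative
      Title</title>
    <published>2020-01-01T00:00:00Z</published>
    <author><name>A. Author</name></author>
    <arxiv:primary_category term="cs.DL"/>
  </entry>
</feed>"""


def parse_feed(xml_text):
    """Return the total hit count and a list of simplified entries."""
    root = ET.fromstring(xml_text)
    total = root.findtext("opensearch:totalResults", default="0", namespaces=NS)
    entries = []
    for entry in root.findall("atom:entry", NS):
        category = entry.find("arxiv:primary_category", NS)
        raw_title = entry.findtext("atom:title", default="", namespaces=NS)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=NS),
            # Collapse line breaks and runs of spaces that wrap long titles.
            "title": " ".join(raw_title.split()),
            "published": entry.findtext("atom:published", default="", namespaces=NS),
            "authors": [
                author.findtext("atom:name", default="", namespaces=NS)
                for author in entry.findall("atom:author", NS)
            ],
            "primary_category": category.get("term") if category is not None else None,
        })
    return {"total": int(total), "entries": entries}
```

Calling `parse_feed(SAMPLE_FEED)` yields one entry with a whitespace-normalised title. In real use the XML would come from the Query API rather than an inline string.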
4 APIs · 1 Capability · 9 Features
Tags: Science And Math, Scholarly Publishing, Preprints, Open Access, Research, Open Source, Public APIs
REST endpoint for searching arXiv and retrieving article metadata. Supports field-prefix queries (ti, au, abs, co, jr, cat, rn, id, all), AND/OR/ANDNOT operators, phrase groupin...
Open Archives Initiative Protocol for Metadata Harvesting v2.0 endpoint for bulk-syncing arXiv metadata. Supports Identify, ListSets, ListMetadataFormats, ListRecords, ListIdent...
Daily RSS feeds of new arXiv submissions, organised by archive and subject category. Primarily intended for human consumption; the OAI-PMH and query APIs are recommended for mac...
Full-text and source bulk distribution channels: an Amazon S3 Requester-Pays bucket containing every arXiv PDF and source archive, plus a periodically refreshed Kaggle dataset o...
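For the OAI-PMH endpoint in the list above, a harvester typically pages through `ListRecords` responses by following resumption tokens. The following is a minimal sketch of that loop, not arXiv's own client: `list_records_url`, `harvest`, and the injected `fetch` callable are hypothetical names, and the base URL is the one given in this listing's source. Per the OAI-PMH v2.0 protocol, a request that carries `resumptionToken` sends no other arguments besides `verb`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "https://oaipmh.arxiv.org/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(metadata_prefix="oai_dc", resumption_token=None, extra=None):
    """Build a ListRecords URL; a resumption token excludes all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        # Optional selective-harvest arguments such as {"from": "2024-01-01"}.
        params.update(extra or {})
    return f"{OAI_BASE}?{urlencode(params)}"


def harvest(fetch, metadata_prefix="oai_dc", extra=None):
    """Yield record identifiers page by page until no token remains.

    `fetch` is any callable that takes a URL and returns the response XML,
    which keeps the loop testable without network access.
    """
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(metadata_prefix, token, extra)))
        for header in root.iterfind(".//oai:record/oai:header", OAI_NS):
            yield header.findtext("oai:identifier", default="", namespaces=OAI_NS)
        # An absent or empty resumptionToken element marks the last page.
        token = (root.findtext(".//oai:resumptionToken", default="",
                               namespaces=OAI_NS) or "").strip()
        if not token:
            break
```

A real `fetch` could wrap `urllib.request.urlopen` and pause between requests. The delay to use is a policy question for the arXiv terms of use, not something this sketch settles.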
Run Capabilities with Naftiko — Deploy and orchestrate these API capabilities using Naftiko Fleet.
Field-Prefix Search
Targeted search across title, author, abstract, comment, journal reference, category, report number, and ID.
Boolean Query Composition
AND, OR, and ANDNOT operators with phrase grouping and parentheses.
Date-Range Filtering
submittedDate and lastUpdatedDate ranges in UTC.
Sort Control
Sort by relevance, lastUpdatedDate, or submittedDate, ascending or descending.
ID-Lookup Mode
Fetch metadata for an explicit comma-separated list of arXiv IDs.
OAI-PMH Bulk Harvest
Industry-standard metadata harvesting with resumption tokens and incremental from-date queries.
Three Metadata Formats
oai_dc, arXiv, and arXivRaw exposed via OAI-PMH.
Bulk Full-Text
Amazon S3 Requester-Pays buckets and periodic Kaggle dataset.
Open Source Stack
arXiv operates its services from a public GitHub organization (arXiv) with 50+ active repositories.
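The query features listed above (field prefixes, Boolean operators, date ranges, sort control, and ID lookup) all end up as parameters on a single request URL. The sketch below shows one way to assemble them, assuming the `search_query`, `id_list`, `start`, `max_results`, `sortBy`, and `sortOrder` parameter names from the arXiv API user manual. The helper names (`term`, `date_range`, `build_url`), the author `doe`, and the IDs are hypothetical placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://export.arxiv.org/api/query"
SORT_FIELDS = {"relevance", "lastUpdatedDate", "submittedDate"}
SORT_ORDERS = {"ascending", "descending"}


def term(prefix, value):
    """Build a field-prefixed term such as ti:"some phrase" or cat:cs.LG."""
    # Quote multi-word values so they are matched as a phrase.
    return f'{prefix}:"{value}"' if " " in value else f"{prefix}:{value}"


def date_range(start, end, field="submittedDate"):
    """Build a date filter; start and end are assumed to be YYYYMMDDHHMM in UTC."""
    return f"{field}:[{start} TO {end}]"


def build_url(search_query=None, id_list=None, sort_by="relevance",
              sort_order="descending", start=0, max_results=10):
    """Return a Query API URL for a search, an ID lookup, or both."""
    if not search_query and not id_list:
        raise ValueError("provide search_query, id_list, or both")
    if sort_by not in SORT_FIELDS or sort_order not in SORT_ORDERS:
        raise ValueError("unsupported sort option")
    params = {}
    if search_query:
        params["search_query"] = search_query
    if id_list:
        # ID-lookup mode: IDs are sent as one comma-separated value.
        params["id_list"] = ",".join(id_list)
    params.update(start=start, max_results=max_results,
                  sortBy=sort_by, sortOrder=sort_order)
    # urlencode percent-encodes quotes, brackets, and colons, and turns spaces into "+".
    return f"{BASE_URL}?{urlencode(params)}"


# Parentheses group the OR clause; ANDNOT excludes a (placeholder) author.
query = " ".join([
    "(" + term("ti", "graph neural network"),
    "OR", term("abs", "graph neural network") + ")",
    "AND", term("cat", "cs.LG"),
    "ANDNOT", term("au", "doe"),
    "AND", date_range("202401010000", "202401312359"),
])
search_url = build_url(search_query=query, sort_by="submittedDate")
lookup_url = build_url(id_list=["0000.00000", "0000.00001"])
```

Keeping query composition as plain string helpers makes the Boolean structure visible before encoding. A larger client might prefer a small expression tree, but that is beyond what this listing describes.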
Research Discovery Tools
Build search and recommendation interfaces over the arXiv corpus.
Citation And Bibliographic Apps
Pull metadata, DOIs, and journal references for reference managers.
AI Training And RAG
Build domain corpora for retrieval-augmented generation across scientific literature.
Topic Watching And Alerts
Schedule incremental harvests and notify users of new submissions in a category.
Bibliometrics And Trend Analysis
Aggregate metadata to study research trends, author networks, and category growth.
Academic Workflow Integration
Embed arXiv search into LaTeX editors, IDEs, note-taking tools, and chat assistants via MCP.
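The topic-watching use case above depends on incremental harvests. A minimal sketch of the bookkeeping follows, assuming the OAI-PMH `from` and `set` arguments with day-level dates (`YYYY-MM-DD`). The function names, the one-day overlap, and the example set `cs` are illustrative assumptions and should be checked against the endpoint's `Identify` and `ListSets` responses.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

OAI_BASE = "https://oaipmh.arxiv.org/oai"


def incremental_url(last_harvest, set_spec=None, overlap_days=1,
                    metadata_prefix="arXiv"):
    """Build a ListRecords URL covering everything since the last harvest.

    The window starts `overlap_days` before the last run so that records
    updated around the nightly refresh are not missed.
    """
    from_day = last_harvest - timedelta(days=overlap_days)
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": from_day.isoformat(),
    }
    if set_spec:
        params["set"] = set_spec
    return f"{OAI_BASE}?{urlencode(params)}"


def new_ids(harvested_ids, seen_ids):
    """Return identifiers not seen before, preserving harvest order."""
    seen = set(seen_ids)
    fresh = []
    for identifier in harvested_ids:
        if identifier not in seen:
            seen.add(identifier)
            fresh.append(identifier)
    return fresh
```

Because the overlapping window re-fetches some records, `new_ids` filters out identifiers already notified. A real alerting job would persist `seen_ids` and the last harvest date between runs.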
Semantic Scholar
Citation graph and paper-similarity overlay used by community tooling.
NASA ADS
Cross-references and bibliography overlay used in arXiv-bib-overlay.
DOI / CrossRef
Articles surface DOIs once a publisher version of record exists.
Amazon S3
Bulk PDF and source distribution through Requester-Pays buckets.
Kaggle
Periodically refreshed full metadata dataset.
Model Context Protocol
Multiple community MCP servers expose arXiv search to AI assistants.
Query API
Programmatic search and metadata retrieval.
OAI-PMH Harvest
Bulk metadata sync for downstream indexes.
RSS Feeds
Daily new-submission feeds per archive or subject.
Bulk Full-Text
S3 and Kaggle distributions for corpus-scale work.
JSON-LD: 15 classes · 3 properties
Spectral: 7 rules · 4 errors · 3 warnings
Sources
aid: arxiv
name: arXiv
description: >-
  arXiv is the open-access e-print repository operated by Cornell Tech, hosting
  more than two million preprints across physics, mathematics, computer science,
  quantitative biology, quantitative finance, statistics, electrical
  engineering, and economics. arXiv exposes two principal programmatic
  interfaces: a REST Query API that returns Atom 1.0 XML and an OAI-PMH v2.0
  endpoint for bulk metadata harvesting, plus daily RSS feeds and Amazon S3 /
  Kaggle distributions for full-text corpora.
url: https://info.arxiv.org/help/api/index.html
specificationVersion: '0.20'
created: '2026-05-28'
modified: '2026-05-29'
x-source: public-apis/public-apis
x-type: opensource
x-category: Science & Math
x-tier: 1
x-tier-reason: Cornell-operated open scholarship infrastructure with multiple long-lived public APIs.
tags:
  - Science And Math
  - Scholarly Publishing
  - Preprints
  - Open Access
  - Research
  - Open Source
  - Public APIs
apis:
  - name: arXiv Query API
    description: >-
      REST endpoint for searching arXiv and retrieving article metadata.
      Supports field-prefix queries (ti, au, abs, co, jr, cat, rn, id, all),
      AND/OR/ANDNOT operators, phrase grouping, and date-range filters on
      submittedDate and lastUpdatedDate. Responses are Atom 1.0 XML with
      arXiv and OpenSearch extensions.
    humanURL: https://info.arxiv.org/help/api/user-manual.html
    baseURL: https://export.arxiv.org/api/query
    tags:
      - Science And Math
      - Scholarly Publishing
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/api/user-manual.html
      - type: APIReference
        url: https://info.arxiv.org/help/api/user-manual.html#_calling_the_api
      - type: GettingStarted
        url: https://info.arxiv.org/help/api/basics.html
      - type: OpenAPI
        url: openapi/arxiv-query-openapi.yml
      - type: JSONSchema
        url: json-schema/arxiv-article-schema.json
        title: Article
      - type: JSONStructure
        url: json-structure/arxiv-article-structure.json
        title: Article
      - type: Example
        url: examples/arxiv-query-articles-example.json
        title: Query Articles Example
      - type: SDK
        url: https://pypi.org/project/arxiv/
        title: Python SDK (lukasschwab/arxiv.py)
  - name: arXiv OAI-PMH API
    description: >-
      Open Archives Initiative Protocol for Metadata Harvesting v2.0 endpoint
      for bulk-syncing arXiv metadata. Supports Identify, ListSets,
      ListMetadataFormats, ListRecords, ListIdentifiers, and GetRecord with
      oai_dc, arXiv, and arXivRaw metadata formats. Metadata refreshes ~10:30pm
      ET Sunday-Thursday.
    humanURL: https://info.arxiv.org/help/oa/index.html
    baseURL: https://oaipmh.arxiv.org/oai
    tags:
      - Scholarly Publishing
      - Bulk Data
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/oa/index.html
      - type: OpenAPI
        url: openapi/arxiv-oaipmh-openapi.yml
      - type: Example
        url: examples/arxiv-oaipmh-listrecords-example.json
        title: List Records Example
  - name: arXiv RSS Feeds
    description: >-
      Daily RSS feeds of new arXiv submissions, organised by archive and
      subject category. Primarily intended for human consumption; the OAI-PMH
      and query APIs are recommended for machine integration.
    humanURL: https://info.arxiv.org/help/rss.html
    baseURL: https://rss.arxiv.org
    tags:
      - Scholarly Publishing
      - Feeds
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/rss.html
  - name: arXiv Bulk Data
    description: >-
      Full-text and source bulk distribution channels: an Amazon S3
      Requester-Pays bucket containing every arXiv PDF and source archive, plus
      a periodically refreshed Kaggle dataset of the complete metadata corpus.
    humanURL: https://info.arxiv.org/help/bulk_data.html
    baseURL: https://info.arxiv.org/help/bulk_data_s3.html
    tags:
      - Bulk Data
      - Open Data
    properties:
      - type: Documentation
        url: https://info.arxiv.org/help/bulk_data.html
      - type: Resources
        url: https://info.arxiv.org/help/bulk_data_s3.html
        title: Amazon S3 Bulk Buckets
      - type: Resources
        url: https://www.kaggle.com/datasets/Cornell-University/arxiv
        title: Kaggle arXiv Dataset
common:
  - type: Website
    url: https://arxiv.org
  - type: DeveloperPortal
    url: https://info.arxiv.org/help/api/index.html
  - type: Documentation
    url: https://info.arxiv.org/help/api/user-manual.html
  - type: TermsOfService
    url: https://info.arxiv.org/help/api/tou.html
  - type: PrivacyPolicy
    url: https://info.arxiv.org/help/policies/privacy_policy.html
  - type: StatusPage
    url: https://status.arxiv.org/
  - type: Blog
    url: https://blog.arxiv.org/
  - type: Support
    url: https://info.arxiv.org/help/contact.html
  - type: GitHubOrganization
    url: https://github.com/arXiv
  - type: ChangeLog
    url: https://github.com/arXiv/arxiv-docs/commits/develop
  - type: Plans
    url: plans/arxiv-plans-pricing.yml
  - type: RateLimits
    url: rate-limits/arxiv-rate-limits.yml
  - type: SpectralRules
    url: rules/arxiv-rules.yml
  - type: Vocabulary
    url: vocabulary/arxiv-vocabulary.yml
  - type: JSONLD
    url: json-ld/arxiv-context.jsonld
    title: arXiv JSON-LD Context
  - type: NaftikoCapability
    url: capabilities/shared/arxiv-query.yaml
    title: Query Capability
  - type: NaftikoCapability
    url: capabilities/shared/arxiv-oaipmh.yaml
    title: OAI-PMH Capability
  - type: NaftikoCapability
    url: capabilities/research-discovery.yaml
    title: Research Discovery Workflow
  - type: PublicAPIsListing
    url: https://github.com/public-apis/public-apis
  - type: Tools
    url: https://github.com/blazickjp/arxiv-mcp-server
    title: arXiv MCP Server (blazickjp)
  - type: Tools
    url: https://github.com/shoumikdc/arXiv-mcp
    title: arXiv MCP (shoumikdc)
  - type: Tools
    url: https://github.com/Tejas242/arxiv-mcp
    title: arXiv MCP (Tejas242)
  - type: Tools
    url: https://github.com/glaforge/arxiv-mcp-server
    title: arXiv MCP Server in Java (glaforge)
  - type: Tools
    url: https://github.com/kelvingao/arxiv-mcp
    title: arXiv MCP (kelvingao)
  - type: SDK
    url: https://pypi.org/project/arxiv/
    title: arxiv Python wrapper (lukasschwab/arxiv.py)
  - type: SDK
    url: https://github.com/titipata/arxivpy
    title: arxivpy Python client (titipata/arxivpy)
  - type: GitHubRepository
    url: https://github.com/arXiv/arxiv-search
    title: arxiv-search (Search UI and APIs)
  - type: GitHubRepository
    url: https://github.com/arXiv/oaipmh
    title: oaipmh (OAI-PMH service)
  - type: GitHubRepository
    url: https://github.com/arXiv/arxiv-feed
    title: arxiv-feed (Atom and RSS service)
  - type: GitHubRepository
    url: https://github.com/arXiv/arxiv-canonical
    title: arxiv-canonical (JSON schema for arXiv metadata)
  - type: Features
    data:
      - name: Field-Prefix Search
        description: Targeted search across title, author, abstract, comment, journal reference, category, report number, and ID.
      - name: Boolean Query Composition
        description: AND, OR, and ANDNOT operators with phrase grouping and parentheses.
      - name: Date-Range Filtering
        description: submittedDate and lastUpdatedDate ranges in UTC.
      - name: Sort Control
        description: Sort by relevance, lastUpdatedDate, or submittedDate, ascending or descending.
      - name: ID-Lookup Mode
        description: Fetch metadata for an explicit comma-separated list of arXiv IDs.
      - name: OAI-PMH Bulk Harvest
        description: Industry-standard metadata harvesting with resumption tokens and incremental from-date queries.
      - name: Three Metadata Formats
        description: oai_dc, arXiv, and arXivRaw exposed via OAI-PMH.
      - name: Bulk Full-Text
        description: Amazon S3 Requester-Pays buckets and periodic Kaggle dataset.
      - name: Open Source Stack
        description: arXiv operates its services from a public GitHub organization (arXiv) with 50+ active repositories.
  - type: UseCases
    data:
      - name: Research Discovery Tools
        description: Build search and recommendation interfaces over the arXiv corpus.
      - name: Citation And Bibliographic Apps
        description: Pull metadata, DOIs, and journal references for reference managers.
      - name: AI Training And RAG
        description: Build domain corpora for retrieval-augmented generation across scientific literature.
      - name: Topic Watching And Alerts
        description: Schedule incremental harvests and notify users of new submissions in a category.
      - name: Bibliometrics And Trend Analysis
        description: Aggregate metadata to study research trends, author networks, and category growth.
      - name: Academic Workflow Integration
        description: Embed arXiv search into LaTeX editors, IDEs, note-taking tools, and chat assistants via MCP.
  - type: Integrations
    data:
      - name: Semantic Scholar
        description: Citation graph and paper-similarity overlay used by community tooling.
      - name: NASA ADS
        description: Cross-references and bibliography overlay used in arXiv-bib-overlay.
      - name: DOI / CrossRef
        description: Articles surface DOIs once a publisher version of record exists.
      - name: Amazon S3
        description: Bulk PDF and source distribution through Requester-Pays buckets.
      - name: Kaggle
        description: Periodically refreshed full metadata dataset.
      - name: Model Context Protocol
        description: Multiple community MCP servers expose arXiv search to AI assistants.
  - type: Solutions
    data:
      - name: Query API
        description: Programmatic search and metadata retrieval.
      - name: OAI-PMH Harvest
        description: Bulk metadata sync for downstream indexes.
      - name: RSS Feeds
        description: Daily new-submission feeds per archive or subject.
      - name: Bulk Full-Text
        description: S3 and Kaggle distributions for corpus-scale work.
maintainers:
  - FN: Kin Lane
    email: [email protected]