PreprintNode
arXiv / bioRxiv / medRxiv Preprint Feed API

Real-time preprint feed with NLP-ready abstracts, author affiliations, and DOIs.

Built for: scientific research agents, RAG pipelines, scouting AIs.
GET /v1/preprints
[
  {
    "id": "preprint-0000",
    "source": "example",
    "source_id": "preprint-0000",
    "type": "preprint",
    "discovered_at": "1970-01-01T00:00:00.000Z",
    "payload": {
      "archive": "bioRxiv",
      "affiliation": "Meta FAIR"
    }
  },
  {
    "id": "preprint-0001",
    "source": "example",
    "source_id": "preprint-0001",
    "type": "preprint",
    "discovered_at": "1969-12-31T22:00:00.000Z",
    "payload": {
      "archive": "bioRxiv",
      "affiliation": "Meta FAIR"
    }
  },
  {
    "id": "preprint-0002",
    "source": "example",
    "source_id": "preprint-0002",
    "type": "preprint",
    "discovered_at": "1969-12-31T18:00:00.000Z",
    "payload": {
      "archive": "arXiv cond-mat",
      "affiliation": "Meta FAIR"
    }
  }
]
Schema fields
  • title: Paper Title
  • payload.archive: Archive
  • payload.affiliation: Lead Affiliation
Realtime
Webhooks
HMAC signed
Starts at $49/mo · Curator tier · cancel anytime
Drop-in for any stack

Wire PreprintNode into your agent — one snippet, seven frameworks.

One-line install
curl "https://www.aiagentnode.io/api/v1/nodes/preprint/records?limit=10" \
  -H "Authorization: Bearer $AIAGENTNODE_KEY"
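
The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_request` and `fetch_records` are illustrative, not part of any official SDK, and the response is returned as decoded JSON without assuming its exact shape.

```python
import json
import urllib.parse
import urllib.request

# URL taken from the curl example above.
BASE_URL = "https://www.aiagentnode.io/api/v1/nodes/preprint/records"


def build_request(api_key: str, limit: int = 10) -> urllib.request.Request:
    """Build the authenticated GET request shown in the curl example."""
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_records(api_key: str, limit: int = 10):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_request(api_key, limit), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_records(os.environ["AIAGENTNODE_KEY"])` would reuse the environment variable from the curl example.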
What teams build

PreprintNode powers…

Autonomous agents

Pipe PreprintNode into your LangChain / Vercel AI / Lovable agent for real-time decisions.

RAG enrichment

Schema-stable JSON drops straight into your vector store with predictable embeddings and no parsing.
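
As a rough illustration of the RAG path, a record can be flattened into a text field plus metadata before it is embedded. This is a minimal sketch: `record_to_document` is a hypothetical helper, the output shape is not tied to any particular vector store, and the top-level `title` field is assumed from the schema list (the sample records above do not include it, so it is treated as optional).

```python
def record_to_document(record: dict) -> dict:
    """Flatten one envelope into {id, text, metadata} for embedding."""
    lines = []
    title = record.get("title")  # assumption: optional top-level field
    if title:
        lines.append(f"title: {title}")
    payload = record.get("payload", {})
    # Sort keys so the same record always produces the same text.
    lines.extend(f"{key}: {payload[key]}" for key in sorted(payload))
    return {
        "id": record["id"],
        "text": "\n".join(lines),
        "metadata": {
            "source": record["source"],
            "source_id": record["source_id"],
            "discovered_at": record["discovered_at"],
        },
    }
```

Keeping the key order deterministic means a re-ingested record yields identical text, which helps when embeddings are cached by content.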

Internal dashboards

Webhook into Slack, Notion, Linear, or your own ops console. Same auth across every node.
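
For the Slack case, one possible approach is to turn each record into the JSON body that Slack incoming webhooks accept (an object with a `text` field). The helper name `slack_message` and the message wording are illustrative only.

```python
import json


def slack_message(record: dict) -> bytes:
    """Build a Slack incoming-webhook body announcing one record."""
    payload = record.get("payload", {})
    archive = payload.get("archive", "unknown archive")
    affiliation = payload.get("affiliation", "unknown affiliation")
    text = f"New {record['type']} {record['id']} from {archive} ({affiliation})"
    return json.dumps({"text": text}).encode("utf-8")
```

The returned bytes can be POSTed to your own Slack webhook URL with a `Content-Type: application/json` header.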

How PreprintNode works

Sourced from primary registries. Normalized for agents.

Primary data sources
  • arXiv OAI-PMH feed
  • bioRxiv API
  • medRxiv API
  • ChemRxiv
  • SSRN cross-checks

We ingest from the upstream of record, never from secondary scrapers, so every PreprintNode record in your agent traces back to an authoritative publisher.

Methodology
  • Continuous polling of upstream sources with adaptive backoff
  • Deduplication via stable source_id + content fingerprint
  • LLM-friendly normalization into a single JSON envelope
  • Schema versioning so existing agents never break
  • HMAC-signed webhooks for guaranteed-delivery push
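
The deduplication step in the list above can be sketched on the client side as follows. The service's actual fingerprint algorithm is not documented here, so the SHA-256 hash of a canonical JSON dump is an assumption chosen for illustration, and `content_fingerprint` and `dedupe` are hypothetical names.

```python
import hashlib
import json


def content_fingerprint(payload: dict) -> str:
    """Hash a canonical JSON form of the payload (assumed scheme)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(records: list) -> list:
    """Keep the first record for each (source_id, fingerprint) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (record["source_id"], content_fingerprint(record["payload"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Pairing `source_id` with a content hash means a revised version of the same preprint is kept, while an exact repeat is dropped.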
Update cadence: Every 15 minutes
End-to-end latency: Under 60 seconds from upstream publication to API
Coverage: All major STEM preprint archives
History: Rolling 36-month archive on Pro+
PreprintNode vs DIY scraping

Stop maintaining brittle scrapers. Ship the agent.

Most teams spend a quarter rebuilding what PreprintNode ships in a single API key. Here's the honest tradeoff.

Dimension | PreprintNode | DIY scraper
Setup time | One API key, one endpoint | Weeks of scraper engineering
Schema stability | Versioned JSON contract | Breaks every upstream redesign
Freshness | <60s from publication | Cron job, often hours stale
Delivery | Polling + HMAC webhooks | Build your own queue
LLM readiness | Token-optimized payload | Manual cleaning per record
Compliance | Source TOS handled upstream | Your legal exposure
Integrate in 3 steps

From signup to first PreprintNode record in under 5 minutes.

  1. Generate an API key

    Pick a tier, complete checkout, and a Bearer token is minted instantly — no email handoff.

  2. Call /v1/preprints

    Send an authenticated GET to receive a paginated JSON envelope. Cursor in, more records out.

  3. Subscribe to webhooks

    Register an HTTPS endpoint to receive HMAC-signed pushes within seconds of upstream publication.
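
The cursor loop described in step 2 might look like the sketch below. The page does not say how the cursor is passed or returned, so the HTTP details are left to a caller-supplied `fetch_page` function (hypothetical) that takes the previous cursor and returns a list of records plus the next cursor, or `None` when there are no more pages.

```python
from typing import Callable, Iterator, Optional, Tuple

Page = Tuple[list, Optional[str]]


def iter_records(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield records page by page until the cursor runs out."""
    cursor = None
    for _ in range(max_pages):  # hard cap guards against a cursor that never ends
        records, cursor = fetch_page(cursor)
        yield from records
        if not cursor:
            break
```

Separating the loop from the transport keeps it easy to test with canned pages and to swap in whatever cursor parameter the API actually uses.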

Frequently asked

PreprintNode questions, answered.

How fresh is the PreprintNode API data?

New records appear in /v1/preprints within ~60 seconds of being published upstream. The full dataset is polled every 5 minutes and pushed to webhook subscribers in the same window.

What format does PreprintNode return?

Every endpoint returns a stable JSON envelope: { id, source, source_id, type, discovered_at, payload }. The payload mirrors the source's natural shape, normalized and token-optimized for direct ingestion into LLMs, vector stores, and RAG pipelines.
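
A typed view of that envelope can make downstream code easier to follow. The sketch below is one way to model it. The field names come from the envelope above, while the `Record` class and `parse_record` helper are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Record:
    id: str
    source: str
    source_id: str
    type: str
    discovered_at: datetime
    payload: dict


def parse_record(raw: dict) -> Record:
    """Convert one JSON envelope into a Record with a timezone-aware timestamp."""
    # Replace the trailing "Z" so fromisoformat also works on older Python 3 versions.
    discovered = datetime.fromisoformat(raw["discovered_at"].replace("Z", "+00:00"))
    return Record(
        id=raw["id"],
        source=raw["source"],
        source_id=raw["source_id"],
        type=raw["type"],
        discovered_at=discovered,
        payload=raw.get("payload", {}),
    )
```

Parsing `discovered_at` up front lets you sort or filter records by time without repeating string handling.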

Is PreprintNode suitable for AI agents and RAG pipelines?

Yes — that is the primary design goal. Field keys are stable, units are normalized, free text is cleaned, and total token weight per record is minimized so you can drop responses directly into LangChain, Vercel AI SDK, OpenAI tools, MCP, n8n, or Lovable agents without preprocessing.

How is authentication handled?

A single API key (Bearer token) works across every node. Webhook payloads are HMAC-SHA256 signed with your tenant secret so you can verify provenance before acting on a record.
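
Verifying an HMAC-SHA256 signature generally looks like the sketch below. Several details are assumptions because this page does not specify them: that the signature covers the raw request body, that it is hex-encoded, and which header carries it. Check the webhook documentation before relying on any of these.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex matches HMAC-SHA256(secret, body)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking
    # how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the body can change whitespace and invalidate the signature.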

Can I get historical PreprintNode data for backtesting?

Pro tiers include a rolling 24-month backfill via the same endpoint with ?since=<ISO date>. Scale and Agency tiers can request full historical exports as compressed JSONL.
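
Building the `?since=` query is a one-liner with the standard library. In this sketch, `backfill_url` is a hypothetical helper, and the value is sent as a plain ISO 8601 date. Whether full timestamps are also accepted is not stated here.

```python
from datetime import date
from urllib.parse import urlencode


def backfill_url(base_url: str, since: date) -> str:
    """Append ?since=<ISO date> to the given endpoint URL."""
    return f"{base_url}?{urlencode({'since': since.isoformat()})}"
```

For example, passing the records URL from the curl snippet and `date(2024, 1, 1)` produces a URL ending in `?since=2024-01-01`.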

What is the rate limit?

Curator: 60 req/min. Pro: 600 req/min. Scale: 6,000 req/min. Agency: 60,000 req/min. Enterprise: negotiated. All tiers support paginated cursors so you never need to spike for a backfill.
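
To stay under these limits during a backfill, a client can space its requests evenly. The sketch below derives the minimum gap between calls from the per-minute figures above. The `Throttle` class is illustrative, and it deliberately ignores any burst allowance, so it is more conservative than the limits require.

```python
import time

# Per-minute limits as listed in the FAQ answer above.
RATE_LIMITS_PER_MINUTE = {"Curator": 60, "Pro": 600, "Scale": 6000, "Agency": 60000}


def min_interval_seconds(tier: str) -> float:
    """Seconds between requests if they are spaced evenly."""
    return 60.0 / RATE_LIMITS_PER_MINUTE[tier]


class Throttle:
    """Blocks so that consecutive calls are at least one interval apart."""

    def __init__(self, tier: str, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(tier)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self) -> None:
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Call `throttle.wait()` before each request. On the Curator tier this works out to at most one request per second.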

Why not just scrape PreprintNode's sources directly?

Upstream sources change formats, throttle aggressively, and break silently. We absorb that fragility, version the schema, sign deliveries, and republish a single contract so your agent stays online when the source moves.

HMAC-signed webhooks
SHA-256 signatures on every push.
99.95% uptime SLA
Status + history at /status.
Versioned schema
No silent breaking changes, ever.
One auth across nodes
One key, every endpoint.
Tier ladder

Five tiers. One API surface.

Curator
$49/mo
Side projects, niche feeds, indie builders.
  • 10,000 API requests / month
  • 5 req/sec burst
  • No webhooks · poll only
Pro
$129/mo
Production apps & indie hackers shipping real revenue.
  • 100,000 API requests / month
  • 20 req/sec burst
  • Webhooks (10 endpoints)
Scale
$349/mo
High-volume products, agents, internal tooling.
  • 1,000,000 API requests / month
  • 100 req/sec burst
  • Webhooks + GraphQL endpoint
Agency
$799/mo
Agencies and platforms re-selling intelligence to clients.
  • 10,000,000 API requests / month
  • 500 req/sec burst
  • Webhooks + GraphQL + MCP server
Enterprise
Custom
Procurement, SSO, dedicated infrastructure.
  • Unmetered API access
  • Private endpoints + dedicated edge
  • Multi-region data residency
Contact sales
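
When choosing a tier for a poll-only integration, it can help to check the monthly request budget against a polling interval. The sketch below does that arithmetic with the quotas listed above. It assumes one request per poll and a 30-day month by default, and the helper names are illustrative.

```python
import math

# Monthly request quotas as listed in the tier ladder above.
MONTHLY_QUOTA = {
    "Curator": 10_000,
    "Pro": 100_000,
    "Scale": 1_000_000,
    "Agency": 10_000_000,
}


def requests_per_month(poll_interval_minutes: float, days: int = 30) -> int:
    """Number of polls in a month at a fixed interval (one request per poll)."""
    return math.ceil(days * 24 * 60 / poll_interval_minutes)


def fits_quota(tier: str, poll_interval_minutes: float, days: int = 30) -> bool:
    """True if steady polling stays within the tier's monthly quota."""
    return requests_per_month(poll_interval_minutes, days) <= MONTHLY_QUOTA[tier]
```

Polling every 5 minutes comes to 12 × 24 × 30 = 8,640 requests in a 30-day month, which fits within the Curator quota of 10,000. Polling every minute (43,200 requests) does not.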

Wire PreprintNode into your agent in under a minute.
