PDF to Word API

PDF-to-Word is the single most-requested conversion in developer PDF APIs — resume parsing, contract editing, legal-document workflows all need it. It's also the one where cheap tools produce unusable output: text reflows wrongly, tables fall apart, fonts get substituted into unreadable messes.

PennyPDF's /v1/pdf-to-word endpoint routes every job through LibreOffice — the same open-source engine Smallpdf, iLovePDF, and most of the market use under the hood. The difference is pricing. Smallpdf API: $9-$12/month floor. iLovePDF API: subscription-tiered. PennyPDF: 2 coins (~$0.08) per conversion, no floor.

Tables are preserved with cell structure. Two- and three-column layouts stay as columns (not reflowed into one). Images embed at original resolution. Fonts substitute to the nearest Word-available equivalent when the PDF uses non-embedded fonts (there's no way around this without rasterization).

Copy, paste, ship

Same bearer-token auth across every endpoint. Set PENNYPDF_API_KEY in your environment first.

curl: POST /v1/pdf-to-word
curl -X POST https://api.pennypdf.com/v1/pdf-to-word \
  -H "Authorization: Bearer $PENNYPDF_API_KEY" \
  -F "file=@contract.pdf" \
  -o contract.docx
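Python: async with polling
import os
import time

import requests

auth = {"Authorization": f"Bearer {os.environ['PENNYPDF_API_KEY']}"}

# Kick off an async job for large PDFs (>50 pages)
with open("bigdoc.pdf", "rb") as pdf:
    r = requests.post(
        "https://api.pennypdf.com/v1/jobs/pdf-to-word",
        headers=auth,
        files={"file": pdf},
    )
r.raise_for_status()
job_id = r.json()["job_id"]

# Poll once per second until the job reaches a terminal state
while True:
    resp = requests.get(f"https://api.pennypdf.com/v1/jobs/{job_id}", headers=auth)
    resp.raise_for_status()
    j = resp.json()
    if j["status"] in ("completed", "failed"):
        break
    time.sleep(1)

# Only a completed job has an output to download
if j["status"] == "failed":
    raise RuntimeError(f"Conversion job {job_id} failed")

out = requests.get(j["output"]["url"])
out.raise_for_status()
with open("bigdoc.docx", "wb") as docx:
    docx.write(out.content)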
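
The loop above polls at a fixed interval with no upper bound on total wait. A minimal sketch of a bounded variant follows, assuming you wrap the GET /v1/jobs/{job_id} call in a callable of your own. The helper name poll_until_done and its parameters are illustrative and not part of the PennyPDF API; only the "completed" and "failed" statuses come from the example above.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status() until the job is completed or failed.

    fetch_status is any callable returning the job JSON as a dict
    (hypothetical wrapper around GET /v1/jobs/{job_id}).
    The wait doubles after each poll, capped at max_delay.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job
        # Give up rather than sleep past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

With the snippet above, fetch_status could be `lambda: requests.get(f"https://api.pennypdf.com/v1/jobs/{job_id}", headers=auth).json()`. The sleep and clock parameters exist only so the helper can be exercised without real waiting.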

How it works

  1. POST the PDF as multipart form-data to /v1/pdf-to-word.
  2. For PDFs under 50 pages: receive the .docx synchronously, typically in 5–15 s. Over 50 pages: use /v1/jobs/pdf-to-word with polling or a webhook.
  3. Coins debit on successful conversion only; failed jobs are never charged.
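
The routing rule in step 2 can be sketched as a small helper that picks an endpoint path before uploading. This is an illustration under stated assumptions: the function name choose_endpoint is hypothetical, the caller supplies the page count (for example from a PDF library of its choice), a PDF of exactly 50 pages is treated as synchronous, and the 100 MB / 2 GB limits quoted in the FAQ are interpreted as decimal units.

```python
SYNC_MAX_PAGES = 50                # step 2: over 50 pages goes async
SYNC_MAX_BYTES = 100 * 1000**2     # 100 MB, assuming decimal megabytes
ASYNC_MAX_BYTES = 2 * 1000**3      # 2 GB, assuming decimal gigabytes

SYNC_PATH = "/v1/pdf-to-word"
ASYNC_PATH = "/v1/jobs/pdf-to-word"


def choose_endpoint(page_count: int, size_bytes: int) -> str:
    """Return the endpoint path suited to a PDF of this size."""
    if size_bytes > ASYNC_MAX_BYTES:
        raise ValueError("file exceeds the 2 GB asynchronous limit")
    # Both conditions must hold for the synchronous endpoint.
    if page_count <= SYNC_MAX_PAGES and size_bytes <= SYNC_MAX_BYTES:
        return SYNC_PATH
    return ASYNC_PATH
```

For example, `choose_endpoint(20, 5_000_000)` selects the synchronous path, while a 500-page contract of the same size is routed to the jobs endpoint.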

Frequently asked

How does the quality compare to Adobe Acrobat's PDF-to-Word?

For 90% of documents (text, simple tables, images, 1-2 column layouts), the results are functionally identical. Adobe's proprietary engine has an edge on heavily-styled PDFs (brochures, magazines with complex typography) — the LibreOffice pipeline handles those as best-effort.

Does OCR happen automatically for scanned PDFs?

No — PDF-to-Word assumes an existing text layer. For scanned PDFs, run /v1/ocr first (3 coins) to add a text layer, then /v1/pdf-to-word (2 coins). Total: 5 coins (~$0.20) per scanned doc converted to editable Word.
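
The coin arithmetic above extends to batches. A rough estimator might look like the following, assuming a rate of about $0.04 per coin, which is only implied by the "~$0.08 for 2 coins" figure and is not an official price. The function name conversion_cost is illustrative.

```python
from decimal import Decimal

PDF_TO_WORD_COINS = 2            # per conversion, from the pricing above
OCR_COINS = 3                    # per scanned document, run before conversion
USD_PER_COIN = Decimal("0.04")   # assumption: derived from "2 coins (~$0.08)"


def conversion_cost(documents: int, scanned: int = 0) -> tuple[int, Decimal]:
    """Return (coins, approximate USD) for a batch of PDFs.

    documents is the total number of PDFs; scanned is how many of
    them need the OCR step first.
    """
    if not 0 <= scanned <= documents:
        raise ValueError("scanned must be between 0 and documents")
    coins = documents * PDF_TO_WORD_COINS + scanned * OCR_COINS
    return coins, coins * USD_PER_COIN
```

A batch of 100 PDFs of which 25 are scans comes to 275 coins under these assumptions. Because failed jobs are not charged, the result is an upper estimate.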

What's the max file size?

100 MB synchronous, 2 GB asynchronous. In practice, 500-page contracts convert in ~45 s and 2000-page dossiers in ~3 min via the async endpoint.

Latency?

Synchronous endpoint: p50 = 6 s, p90 = 14 s, p99 = 28 s for a 20-page PDF. Async endpoint for the same input: 2 s to receive the job_id; conversion completes in the same time range in the background.

Rate limits?

50 synchronous conversions per minute per API key (the limit reflects CPU cost: conversion is heavier than merge). Async: 200 job creations per minute, processed in order. Email api@pennypdf.com for bulk workloads (10k+/day).
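
To stay under the synchronous limit from the client side, one option is a sliding-window throttle that blocks before each request. The sketch below is a generic technique and not something PennyPDF provides: the class name SlidingWindowLimiter is hypothetical, it is not thread-safe, and how the server actually measures the minute is unknown, so treat the 50-per-60-seconds default as an approximation.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of period seconds."""

    def __init__(
        self,
        max_calls: int = 50,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until another call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self._period:
                self._calls.popleft()
            if len(self._calls) < self._max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call expires.
            self._sleep(self._period - (now - self._calls[0]))
```

Call `limiter.acquire()` immediately before each POST to /v1/pdf-to-word. For async job creation, a second instance with max_calls=200 would match the other limit quoted above.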

What Word version is the output compatible with?

.docx (Office 2007+), compatible with Word, Google Docs, Apple Pages, and LibreOffice Writer. No .doc (legacy) output. If you need it, convert from .docx with an editor that can save to the legacy format, such as Word or LibreOffice Writer.

Why PennyPDF

  • No subscription. Ever.
  • Coins never expire — use them in 5 years.
  • Client-side processing for 14 of 22 tools.
  • No watermarks at any tier.
  • Per-operation pricing, shown before you click.
  • Same coins for web + public API.