PDF to CSV Converter

Convert the text in your PDF into a CSV file you can open in any spreadsheet app, import into a database, or process with a script. Each line on the page becomes a CSV row, and columns are split where there's a clear gap between values. Scanned, image-only PDFs are recognized with built-in OCR first, so even photographed documents can be turned into structured rows — entirely in your browser, with no uploads.

What does this tool do?

The PDF to CSV converter reads the text content of a PDF and writes a comma-separated values file. For text-based PDFs it extracts the words and their positions directly; for image-only scans it runs optical character recognition in your browser first. Each line of text is emitted as one CSV record, and wide horizontal gaps between words are interpreted as column separators so values land in their own fields. Fields containing commas, quotes, or line breaks are quoted and escaped as described in RFC 4180. A UTF-8 byte-order mark (BOM) is also added so spreadsheet apps read accented and non-Latin characters correctly.
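To make the quoting rules concrete, here is a minimal Python sketch of RFC 4180-style output. It is an illustration only, not the converter's own code (which runs in your browser); the function names `quote_field` and `to_csv` are made up for this example.

```python
BOM = "\ufeff"  # becomes the bytes EF BB BF when encoded as UTF-8

def quote_field(value: str) -> str:
    """Quote a field only if it contains a comma, double quote, CR, or LF."""
    if any(ch in value for ch in ',"\r\n'):
        # Inner double quotes are escaped by doubling them.
        return '"' + value.replace('"', '""') + '"'
    return value

def to_csv(rows: list[list[str]]) -> bytes:
    """Join records with CRLF and prefix a UTF-8 BOM."""
    body = "\r\n".join(",".join(quote_field(f) for f in row) for row in rows)
    return (BOM + body).encode("utf-8")

rows = [
    ["Date", "Description", "Amount"],
    ["2024-01-05", 'Café "Luna", Paris', "12,50"],
]
data = to_csv(rows)
print(data.decode("utf-8-sig"))
# Date,Description,Amount
# 2024-01-05,"Café ""Luna"", Paris","12,50"
```

Only the fields that need it are quoted, so simple values such as the date stay bare.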

How it works

Scanned pages are rasterized and recognized with a Tesseract LSTM engine; text-based PDFs have their embedded text read directly. Either way, each word carries a position on the page. Words are grouped into visual lines by vertical alignment and sorted left to right. Within each line the spacing between words is measured, and a gap noticeably wider than ordinary word spacing becomes a column boundary, splitting the line into multiple fields. The fields are then serialized following RFC 4180: values are quoted and escaped where required, and rows are joined with CRLF. Finally, a UTF-8 BOM is prefixed and the result is offered as a downloadable .csv file.
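The line grouping and gap-based splitting described above can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: the `Word` type, the function names, and the fixed thresholds `y_tol` and `gap_threshold` are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical centre of the word

def group_lines(words: list[Word], y_tol: float = 3.0) -> list[list[Word]]:
    """Group words whose vertical centres are within y_tol of the line's first word."""
    lines: list[list[Word]] = []
    for w in sorted(words, key=lambda w: w.y):
        if lines and abs(w.y - lines[-1][0].y) <= y_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Within each line, order words left to right.
    return [sorted(line, key=lambda w: w.x0) for line in lines]

def split_fields(line: list[Word], gap_threshold: float = 20.0) -> list[str]:
    """Start a new field wherever the gap to the previous word exceeds gap_threshold."""
    fields = [[line[0].text]]
    for prev, cur in zip(line, line[1:]):
        if cur.x0 - prev.x1 > gap_threshold:
            fields.append([cur.text])
        else:
            fields[-1].append(cur.text)
    return [" ".join(f) for f in fields]

words = [
    Word("Coffee", 10, 50, 100), Word("beans", 55, 90, 101), Word("4.50", 200, 230, 100),
    Word("Total", 10, 45, 120), Word("4.50", 200, 230, 121),
]
rows = [split_fields(line) for line in group_lines(words)]
print(rows)  # [['Coffee beans', '4.50'], ['Total', '4.50']]
```

A fixed threshold keeps the sketch short. A real heuristic would more likely derive it from the typical word spacing on each line or page.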

Features

Standard RFC 4180 CSV output with proper quoting and escaping
Automatic OCR for scanned, image-only PDFs
Each page line becomes a CSV row
Best-effort column splitting on wide gaps between values
UTF-8 BOM so Excel reads Unicode correctly
Optional blank row between pages
100% in-browser — your file never leaves your device

How to use

1. Upload your PDF

Drag any PDF onto the drop zone. Text-based PDFs are read directly; scanned or photographed pages are detected and OCR'd automatically.

2. Choose column handling

Keep column splitting on to break each line into fields at wide gaps, or turn it off to keep each full line as a single field. Optionally add a blank row between pages.

3. Convert to CSV

Click Convert to CSV. Text is extracted (with OCR when needed) and written as comma-separated rows with proper escaping.

4. Open or import the file

Download the .csv and open it in Excel, Google Sheets, or LibreOffice, or import it into a database or feed it to a script.
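If you feed the file to a script, a minimal Python sketch for reading it might look like the one below. It assumes the output is shaped as described on this page (UTF-8 with a BOM, CRLF between rows); the helper name `read_csv_bytes` and the sample bytes are invented for illustration.

```python
import csv
import io

def read_csv_bytes(data: bytes) -> list[list[str]]:
    """Parse CSV bytes, dropping a leading UTF-8 BOM if one is present."""
    # "utf-8-sig" strips the BOM; newline="" lets the csv module handle
    # line endings itself, including line breaks inside quoted fields.
    text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8-sig", newline="")
    return list(csv.reader(text))

sample = b'\xef\xbb\xbfDate,Amount\r\n2024-01-05,"1,250.00"\r\n'
rows = read_csv_bytes(sample)
print(rows)  # [['Date', 'Amount'], ['2024-01-05', '1,250.00']]
```

For a downloaded file, `open(path, encoding="utf-8-sig", newline="")` passed to `csv.reader` does the same job, where `path` is wherever you saved the download.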

Common use cases

Data pipelines

Extract tabular text from PDFs into CSV so it can be imported into databases, BI tools, or data-processing scripts.

Bank statements and ledgers

Turn transaction lines from a PDF statement into CSV rows ready for accounting software or a spreadsheet.

Scanned documents

Recognize text from scanned or photographed pages and export it as CSV, with columns split where the original had clear spacing.

Lightweight, portable export

Use CSV when you want a universal, plain-text format that opens everywhere and is easy to diff, version, and automate.

Tips & best practices

Column splitting is a best-effort heuristic based on spacing, not true table detection — review the result before importing
The UTF-8 BOM helps Excel display accented and non-Latin text; some strict parsers may need the BOM stripped
For scanned PDFs, higher-quality scans produce more accurate OCR and cleaner columns
Turn off column splitting when you want each line preserved as a single field
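Two of these tips lend themselves to a quick script. The Python sketch below is one possible way to check a converted file before import, not a feature of the tool: `flag_ragged_rows` lists rows whose field count differs from the most common one, and `strip_bom` removes the BOM for parsers that reject it. Both function names are hypothetical.

```python
import codecs
from collections import Counter

def flag_ragged_rows(rows: list[list[str]]) -> list[int]:
    """Return indexes of non-empty rows whose field count differs from the most common count."""
    counts = Counter(len(r) for r in rows if r)
    if not counts:
        return []
    expected = counts.most_common(1)[0][0]
    return [i for i, r in enumerate(rows) if r and len(r) != expected]

def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM, if any, and leave the rest untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

rows = [
    ["Date", "Description", "Amount"],
    ["01-05", "Coffee", "4.50"],
    ["01-06", "Refund pending"],   # a column was not split here
    [],                            # blank row, e.g. a page separator
    ["01-07", "Tea", "3.00"],
]
print(flag_ragged_rows(rows))           # [2]
print(strip_bom(b"\xef\xbb\xbfa,b"))    # b'a,b'
```

Flagged rows are only candidates for review; a row can have the right number of fields and still be split in the wrong place.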

Frequently asked questions

Does it work on scanned PDFs?

Yes. Image-only pages are detected and recognized with built-in OCR before the CSV is built. Recognition quality depends on how clear the scan is.

How are columns determined?

Each line is split into fields where there is a clearly wider-than-normal gap between words. This is a best-effort heuristic, not true table detection, so some columns may need adjustment after import.

Will commas inside text break the file?

No. Fields containing commas, quotes, or line breaks are quoted and escaped following RFC 4180, so any parser that follows the same rules reads the file correctly.

Is this really free and private?

Yes. Everything runs in your browser using client-side processing and OCR. There are no uploads, no subscriptions, and no usage limits.