PDF to CSV Converter
Convert the text in your PDF into a CSV file you can open in any spreadsheet app, import into a database, or process with a script. Each line on the page becomes a CSV row, and columns are split where there's a clear gap between values. Scanned, image-only PDFs are recognized with built-in OCR first, so even photographed documents can be turned into structured rows — entirely in your browser, with no uploads.
What does this tool do?
The PDF to CSV converter reads the text content of a PDF and writes a comma-separated values file. For text-based PDFs it extracts the words and their positions directly; for image-only scans it first runs optical character recognition in your browser. Each recognized line is emitted as one CSV record, and wide horizontal gaps between words are treated as column separators so that values land in their own fields. Fields containing commas, double quotes, or line breaks are enclosed in double quotes, with embedded quotes doubled, as described in RFC 4180. A UTF-8 byte-order mark is added so spreadsheet apps read accented and non-Latin characters correctly.
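To make the quoting rules concrete, here is a minimal Python sketch of RFC 4180-style output. It is an illustration only, not the converter's own code (which runs in the browser); the function name `rows_to_csv_bytes` is hypothetical.

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize rows as CSV bytes: CRLF row endings, minimal quoting, UTF-8 BOM."""
    buffer = io.StringIO(newline="")
    # QUOTE_MINIMAL quotes only fields containing a comma, a double quote,
    # or a line break; embedded double quotes are doubled.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    # The "utf-8-sig" codec prepends the byte-order mark (EF BB BF).
    return buffer.getvalue().encode("utf-8-sig")


if __name__ == "__main__":
    sample = [["Date", "Memo"], ["2024-01-05", 'Café "Luna", downtown']]
    print(rows_to_csv_bytes(sample))
```

A field such as `Café "Luna", downtown` comes out as `"Café ""Luna"", downtown"`, while plain values like `2024-01-05` are left unquoted.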
How it works
Scanned pages are rasterized and recognized with a Tesseract LSTM engine; text-based PDFs have their embedded text read directly. The recognized words carry pixel positions, and the words are grouped into visual lines by vertical alignment and sorted left to right. Within each line the spacing between words is measured, and a gap noticeably wider than ordinary word spacing becomes a column boundary, splitting the line into multiple fields. The fields are then serialized as RFC 4180 CSV, with quoting and escaping where required and rows joined with CRLF. A UTF-8 BOM is prefixed to the output, which is offered as a downloadable .csv file.
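The line grouping and gap-based splitting can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the tool's actual implementation: the `Word` type, the function names, and the fixed thresholds `y_tolerance` and `min_column_gap` are all hypothetical, and the real heuristic for "noticeably wider than ordinary word spacing" may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: float  # left edge of the word box
    x1: float  # right edge of the word box
    y: float   # vertical centre of the word box


def group_into_lines(words, y_tolerance=5.0):
    """Group words whose vertical centres lie close together, then sort each line left to right."""
    lines = []
    for word in sorted(words, key=lambda w: w.y):
        # Compare against the first word of the current line (assumed rule).
        if lines and abs(word.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [sorted(line, key=lambda w: w.x0) for line in lines]


def split_into_fields(line, min_column_gap=20.0):
    """Start a new field whenever the gap to the previous word exceeds the threshold."""
    if not line:
        return []
    fields = [line[0].text]
    for prev, cur in zip(line, line[1:]):
        if cur.x0 - prev.x1 > min_column_gap:
            fields.append(cur.text)          # wide gap: column boundary
        else:
            fields[-1] += " " + cur.text     # ordinary word spacing
    return fields


def words_to_rows(words, split_columns=True):
    """Turn positioned words into rows; with splitting off, each line is one field."""
    rows = []
    for line in group_into_lines(words):
        if split_columns:
            rows.append(split_into_fields(line))
        else:
            rows.append([" ".join(w.text for w in line)])
    return rows
```

With these assumed thresholds, the words "Coffee" and "beans" sitting 5 units apart stay in one field, while a price 195 units further right becomes a second field.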
Features
- Standard RFC 4180 CSV output with proper quoting and escaping
- Automatic OCR for scanned, image-only PDFs
- Each page line becomes a CSV row
- Best-effort column splitting on wide gaps between values
- UTF-8 BOM so Excel reads Unicode correctly
- Optional blank row between pages
- 100% in-browser — your file never leaves your device
How to use
1. Add your PDF
Drag any PDF onto the drop zone. Text-based PDFs are read directly; scanned or photographed pages are detected and run through OCR automatically.
2. Choose column handling
Keep column splitting on to break each line into fields at wide gaps, or turn it off to keep each full line as a single field. Optionally add a blank row between pages.
3. Convert to CSV
Click Convert to CSV. Text is extracted (with OCR when needed) and written as comma-separated rows with proper escaping.
4. Open or import the file
Download the .csv and open it in Excel, Google Sheets, or LibreOffice, or import it into a database or feed it to a script.
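If you feed the file to a script, the BOM and any blank separator rows need handling. The Python sketch below shows one way to do that; the function name `parse_converted_csv` is hypothetical, and it assumes a page separator appears as an empty line or a row of empty fields.

```python
import csv
import io


def parse_converted_csv(data, skip_blank_rows=True):
    """Parse CSV bytes into a list of rows, tolerating a leading UTF-8 BOM."""
    # "utf-8-sig" drops the byte-order mark if present and is harmless otherwise.
    text = data.decode("utf-8-sig")
    rows = []
    for row in csv.reader(io.StringIO(text, newline="")):
        # Assumed: blank rows between pages carry no data and can be skipped.
        if skip_blank_rows and not any(field.strip() for field in row):
            continue
        rows.append(row)
    return rows
```

For a downloaded file, something like `parse_converted_csv(Path("statement.csv").read_bytes())` would return the rows as lists of strings, where `statement.csv` is a placeholder file name and `Path` comes from `pathlib`.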
Common use cases
Data pipelines
Extract tabular text from PDFs into CSV so it can be imported into databases, BI tools, or data-processing scripts.
Bank statements and ledgers
Turn transaction lines from a PDF statement into CSV rows ready for accounting software or a spreadsheet.
Scanned documents
Recognize text from scanned or photographed pages and export it as CSV, with columns split where the original had clear spacing.
Lightweight, portable export
Use CSV when you want a universal, plain-text format that opens everywhere and is easy to diff, version, and automate.
Tips & best practices
- Column splitting is a best-effort heuristic based on spacing, not true table detection — review the result before importing
- The UTF-8 BOM helps Excel display accented and non-Latin text; some strict parsers may need the BOM stripped
- For scanned PDFs, higher-quality scans produce more accurate OCR and cleaner columns
- Turn off column splitting when you want each line preserved as a single field
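One quick way to review the result before importing is to look for rows whose field count differs from the rest, which often signals a line where the spacing heuristic split too much or too little. The Python sketch below is only a suggestion; `find_ragged_rows` is a hypothetical helper, and a mismatch is a hint to inspect the row, not proof of an error.

```python
from collections import Counter


def find_ragged_rows(rows):
    """Return (expected_width, [(row_number, width), ...]) for rows that deviate.

    The expected width is the most common field count among non-empty rows.
    Row numbers are 1-based; empty rows (such as page separators) are ignored.
    """
    widths = Counter(len(row) for row in rows if row)
    if not widths:
        return 0, []
    expected = widths.most_common(1)[0][0]
    ragged = [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if row and len(row) != expected
    ]
    return expected, ragged
```

Rows flagged this way can be fixed by hand in a spreadsheet, or the file can be converted again with column splitting turned off.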