UltraConvert

Invoice Reader — PDF to JSON, Markdown, or Text

Turn a PDF invoice into structured data you can import into accounting software, spreadsheets, or your own apps. The Invoice Reader extracts vendor and customer details, invoice number and dates, line items with quantities and amounts, and tax totals — then exports everything as JSON, Markdown, or plain text. Indian GST invoices are supported: GSTIN, HSN/SAC codes, and CGST, SGST, or IGST breakdowns are detected when present. Scanned or image-only invoices are recognized with built-in OCR, and your file never leaves your device.

What does this tool do?

The Invoice Reader analyzes the text and layout of a PDF invoice and reconstructs it as structured data. It scans header regions for labeled fields like invoice number, date, and due date; detects the line-item table by matching column headers such as Description, Qty, Rate, Amount, and HSN; and reads the totals block for subtotal, tax lines, and grand total. For Indian tax invoices it recognizes GSTIN numbers, HSN/SAC columns, and separate CGST, SGST, or IGST amounts. Fillable PDF invoices are handled too — AcroForm field values are merged with extracted text. When the layout cannot be parsed confidently, the tool falls back to raw page text so you still get usable output.
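As an illustration of the GSTIN recognition mentioned above, here is a minimal sketch based on the commonly documented 15-character GSTIN layout (state code, PAN, entity code, the letter Z, check character). The names `GSTIN_RE` and `find_gstins` are hypothetical, and the tool's actual detection logic is not documented here. The pattern checks only the shape of the identifier, not its check character.

```python
import re

# Commonly documented GSTIN layout (15 characters):
#   2-digit state code + 10-character PAN + entity code + "Z" + check character.
# This is an illustrative pattern, not the tool's own implementation.
PAN_PATTERN = r"[A-Z]{5}[0-9]{4}[A-Z]"
GSTIN_RE = re.compile(rf"\b([0-9]{{2}})({PAN_PATTERN})[1-9A-Z]Z[0-9A-Z]\b")


def find_gstins(text: str) -> list[dict]:
    """Return every GSTIN-shaped token in the text with its state code and PAN."""
    results = []
    for match in GSTIN_RE.finditer(text.upper()):
        results.append(
            {
                "gstin": match.group(0),
                "state_code": match.group(1),
                "pan": match.group(2),
            }
        )
    return results


sample = "Vendor GSTIN: 27AAPFU0939F1ZV   Invoice No: INV-1042"
print(find_gstins(sample))
```

Because the PAN is embedded in the GSTIN, a single match yields both identifiers; a stricter version would also validate the final check character.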

How it works

The PDF is opened in MuPDF and text is extracted with position data for every word. Pages with no embedded text are rasterized and run through on-device OCR automatically. The parser scans the top portion of the first page for label:value pairs and GSTIN/PAN identifiers, identifies the vendor from the largest text in the header area, and locates the customer block near Bill To labels. Line items are reconstructed using bounding-box table detection: column headers are matched against invoice keywords, and each row's words are assigned to columns by horizontal position. Totals are read from the bottom region of the last page. The structured result is serialized to JSON (full schema), Markdown (human-readable tables), or plain text (flattened summary). All processing runs locally in your browser.
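The column-assignment step described above can be sketched roughly as follows. This is a simplified illustration, not the tool's code: the `Word` type, the `assign_columns` function, and the `row_tolerance` parameter are all hypothetical, and the sketch assumes the header words have already been matched against invoice keywords.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x0: float  # left edge of the word's bounding box
    x1: float  # right edge of the word's bounding box
    y: float   # vertical position, used to group words into rows


def assign_columns(
    header: list[Word], words: list[Word], row_tolerance: float = 3.0
) -> list[dict]:
    """Group words into rows by y, then place each word in the nearest header column."""
    centers = {h.text: (h.x0 + h.x1) / 2 for h in header}

    # 1. Group words into rows: a new row starts when y jumps past the tolerance.
    rows: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        if rows and abs(word.y - rows[-1][0].y) <= row_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])

    # 2. Read each row left to right and drop every word into the closest column.
    table = []
    for row in rows:
        cells: dict[str, list[str]] = {name: [] for name in centers}
        for word in sorted(row, key=lambda w: w.x0):
            mid = (word.x0 + word.x1) / 2
            nearest = min(centers, key=lambda name: abs(centers[name] - mid))
            cells[nearest].append(word.text)
        table.append({name: " ".join(parts) for name, parts in cells.items()})
    return table


header = [
    Word("Description", 40, 110, 100),
    Word("Qty", 300, 320, 100),
    Word("Rate", 380, 410, 100),
    Word("Amount", 470, 520, 100),
]
words = [
    Word("Steel", 40, 70, 120), Word("bolts", 74, 100, 120),
    Word("10", 304, 316, 120), Word("12.50", 378, 410, 120),
    Word("125.00", 476, 520, 120),
    Word("Washers", 40, 90, 134), Word("4", 308, 314, 134),
    Word("2.00", 384, 410, 134), Word("8.00", 488, 520, 134),
]
table = assign_columns(header, words)
for line_item in table:
    print(line_item)
```

Matching on the nearest column center is the crudest workable rule. Long descriptions that extend toward the next column would likely need explicit column boundaries instead.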

How to use

  1. Upload your invoice

    Drag a PDF invoice onto the drop zone. Text-based invoices are read directly; scanned pages are OCR'd automatically.

  2. Choose output format

    Select JSON for programmatic use, Markdown for readable tables, or plain text for a simple summary.

  3. Review the preview

    The preview shows a sample of the extracted data. Check that invoice number, line items, and totals look correct.

  4. Download the file

    Click Read Invoice to generate the output. Download the .json, .md, or .txt file for import into your workflow.
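Once you have the .json file, the manual check from step 3 can be repeated in code before import. The sketch below is only an illustration: the field names (`line_items`, `totals`, `subtotal`, `taxes`, `grand_total`) and the `check_totals` helper are assumptions, not the tool's documented schema, so adjust them to match the file you actually download.

```python
import json
from decimal import Decimal

# Hypothetical export: these field names are assumptions for illustration,
# not the tool's documented schema.
exported = """
{
  "invoice_number": "INV-1042",
  "line_items": [
    {"description": "Steel bolts", "quantity": "10", "rate": "12.50", "amount": "125.00"},
    {"description": "Washers", "quantity": "4", "rate": "2.00", "amount": "8.00"}
  ],
  "totals": {
    "subtotal": "133.00",
    "taxes": [
      {"label": "CGST 9%", "amount": "11.97"},
      {"label": "SGST 9%", "amount": "11.97"}
    ],
    "grand_total": "156.94"
  }
}
"""


def check_totals(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return a list of arithmetic inconsistencies; an empty list means none found."""
    problems = []
    totals = invoice["totals"]
    subtotal = Decimal(totals["subtotal"])

    # Line-item amounts should add up to the subtotal.
    item_sum = sum((Decimal(i["amount"]) for i in invoice["line_items"]), Decimal("0"))
    if abs(item_sum - subtotal) > tolerance:
        problems.append(f"line items sum to {item_sum}, subtotal is {subtotal}")

    # Subtotal plus all tax lines should equal the grand total.
    tax_sum = sum((Decimal(t["amount"]) for t in totals["taxes"]), Decimal("0"))
    grand_total = Decimal(totals["grand_total"])
    if abs(subtotal + tax_sum - grand_total) > tolerance:
        problems.append(
            f"subtotal + tax is {subtotal + tax_sum}, grand total is {grand_total}"
        )
    return problems


invoice = json.loads(exported)
print(check_totals(invoice) or "totals are consistent")
```

`Decimal` is used instead of `float` so that currency amounts are compared without binary rounding error.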

Common use cases

Accounting and bookkeeping import

Extract invoice fields as JSON for import into accounting software, ERP systems, or custom expense-tracking apps.

GST compliance and record keeping

Pull GSTIN, HSN codes, and tax breakdowns from Indian tax invoices for reconciliation and audit trails.

Batch processing preparation

Convert invoices to a consistent JSON schema before feeding them into automated workflows or data pipelines.

Scanned invoice digitization

OCR scanned or photographed invoices and get structured line items and totals instead of raw text dumps.
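For the spreadsheet and pipeline use cases above, exported JSON can be flattened into one CSV row per line item. This is a minimal sketch under assumptions: the `invoices_to_csv` helper and the field names (`invoice_number`, `line_items`, `description`, `quantity`, `amount`) are hypothetical and may not match the tool's actual JSON schema.

```python
import csv
import io


def invoices_to_csv(invoices: list[dict]) -> str:
    """Flatten parsed invoices into CSV text, one row per line item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["invoice_number", "description", "quantity", "amount"])
    for invoice in invoices:
        for item in invoice.get("line_items", []):
            writer.writerow(
                [
                    invoice.get("invoice_number", ""),
                    item.get("description", ""),
                    item.get("quantity", ""),
                    item.get("amount", ""),
                ]
            )
    return buffer.getvalue()


# Hypothetical parsed exports; field names are assumptions for illustration.
invoices = [
    {
        "invoice_number": "INV-1042",
        "line_items": [{"description": "Steel bolts", "quantity": "10", "amount": "125.00"}],
    },
    {
        "invoice_number": "INV-1043",
        "line_items": [{"description": "Consulting", "quantity": "1", "amount": "500.00"}],
    },
]
csv_text = invoices_to_csv(invoices)
print(csv_text)
```

Missing fields become empty cells rather than raising errors, which suits invoices where the parser fell back to partial output.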

Frequently asked questions

How is this different from Extract Text?
Extract Text dumps raw text, blocks, and form widgets without invoice semantics. Invoice Reader parses vendor, line items, totals, and GST fields into a structured invoice schema.
Does it work on scanned invoices?
Yes. Image-only pages are recognized with built-in OCR before parsing. Recognition quality depends on scan clarity.
Which invoice formats are supported?
The parser combines general-purpose heuristics for common layouts with India GST-specific field detection. It does not depend on a single vendor template: invoices from QuickBooks, Zoho, and Tally, as well as generic invoices, are handled when they follow typical structures.
Is it really free and private?
Yes. Extraction, OCR, and parsing all run in your browser. There are no uploads, subscriptions, or usage limits.
