Ideal for writing scripts to process thousands of files.
pdftops: Converts PDF files to PostScript (PS) or Encapsulated PostScript (EPS).
pdftohtml: Converts PDF files to HTML.
pdftohtml: Converts PDF layouts into functional HTML documents with clickable links.
Troubleshooting font embedding issues before printing.

How to Install Xpdf-tools-win-4.04
Are you attempting to integrate these tools into a project in a specific programming language (like Python or C#)?
Level 1, Level 2, and Level 3 PostScript support.
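Problem: pdfimages extracts images that look like static or noise.
Solution: The original images were probably Flate-encoded pieces of a vector-style illustration rather than ordinary photographs. If your build of pdfimages offers a -png option (check pdfimages -h), use it to force conversion to a viewable format; otherwise render the page with pdftopng. True vector data cannot be extracted as bitmaps.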
| Issue | Workaround |
|-------|-------------|
| No Unicode output in text | Try -enc UTF-8 |
| Non-Western text garbled | Use -enc with appropriate encoding |
| No PDF creation / editing | Not a goal of Xpdf |
| Scanned PDFs (image only) | Need OCR first (Xpdf can't OCR) |
| Some complex layouts | -layout may still fail; use pdftohtml instead |
You can use Command Prompt's for loop to process all PDFs in a folder at once. This command runs pdftotext on every .pdf file in the current directory:
for %f in (*.pdf) do pdftotext "%f"
Inside a .bat file, write %%f instead of %f.
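For larger jobs, the same loop can be sketched in Python, which makes it easier to record which files failed. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes pdftotext is on PATH, and the helper names `build_command` and `convert_folder` are hypothetical. The only options it passes are -enc and -layout, both mentioned in this article.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, encoding="UTF-8", layout=False):
    """Return the pdftotext argument list for one PDF.

    The text file is written next to the PDF, with a .txt suffix.
    """
    cmd = ["pdftotext", "-enc", encoding]
    if layout:
        cmd.append("-layout")
    cmd += [str(pdf_path), str(Path(pdf_path).with_suffix(".txt"))]
    return cmd


def convert_folder(folder=".", run=subprocess.run):
    """Run pdftotext on every .pdf in `folder`; return names of files that failed.

    `run` is injectable so the loop can be exercised without Xpdf installed.
    """
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        result = run(build_command(pdf))
        if result.returncode != 0:  # non-zero exit code means the tool reported an error
            failed.append(pdf.name)
    return failed
```

Calling `convert_folder(".")` from the folder that holds the PDFs returns a list of the files pdftotext could not process, which can then be retried or inspected by hand.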
Create a test file with echo test > test.txt and print it to PDF if you have a virtual printer, or use a sample PDF. Then run:
pdftotext sample.pdf sample.txt
type sample.txt
pdftops converts PDF files to PostScript (.ps) or Encapsulated PostScript (.eps) format.
Open Command Prompt or PowerShell and type pdftotext -v. You should see version 4.04 information.

Common Usage Examples

1. Extracting Text (pdftotext)
For decades, the name Xpdf has been synonymous with fast, reliable, and no-nonsense PDF processing. While the PDF world has grown crowded with bloated readers and subscription-based editors, the core Xpdf suite has remained a loyal companion for system administrators, developers, and power users.
While older versions of XpdfTools are still floating around the web, upgrading to or starting with version 4.04 is highly recommended due to the following structural improvements:
The xpdf-tools-win-4.04 package is a suite of command-line utilities designed for manipulating and extracting data from PDF files on Windows. While it has been succeeded by version 4.06 (released in November 2025), version 4.04 remains a popular choice for specific data automation tasks.

What Makes It Useful?
| Tool | Purpose |
|------|---------|
| pdftotext | Extract raw text from PDFs |
| pdftohtml | Convert PDF to HTML |
| pdfimages | Extract images (JPEG/PNG) |
| pdftopng | Convert PDF pages to PNG |
| pdfinfo | Show metadata, page count, size |
| pdffonts | List fonts used in a PDF |
| pdfdetach | Extract embedded attachments |
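As one illustration of scripting against these tools, the metadata that pdfinfo prints can be read into a dictionary. This is a rough sketch that assumes pdfinfo is on PATH and prints one "Key: value" pair per line; the function names `parse_pdfinfo` and `pdf_info` are hypothetical, and the exact set of keys depends on the PDF.

```python
import subprocess


def parse_pdfinfo(output):
    """Turn 'Key:   value' lines into a dict; lines without a colon are skipped."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only, so values such as timestamps stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdf_info(pdf_path):
    """Run pdfinfo on one file and return its parsed fields (assumes pdfinfo is on PATH)."""
    result = subprocess.run(
        ["pdfinfo", str(pdf_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdfinfo exits with an error
    )
    return parse_pdfinfo(result.stdout)
```

All values come back as strings, so a field such as the page count needs an explicit `int(...)` conversion before it is used in arithmetic.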
Turning a PDF document into a series of image files for web viewing.
With the release of version 4.04, the project continues its tradition of delivering a purely command-line toolset for manipulating PDF files on Windows systems. Here is everything you need to know about this update.
To run these commands from any Command Prompt or PowerShell window without typing the full directory path, add the folder to your system environment variables:
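Once the folder has been added, a short script can confirm that the tools are actually reachable. The following is a minimal sketch using Python's standard `shutil.which`; the tool list is taken from the table in this article, and the names `find_tools` and `missing_tools` are hypothetical.

```python
import shutil

# Tool names as listed in this article; adjust to match the executables in your folder.
TOOLS = ["pdftotext", "pdftohtml", "pdfimages", "pdftopng",
         "pdfinfo", "pdffonts", "pdfdetach"]


def find_tools(names=TOOLS, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(found):
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)
```

Running `missing_tools(find_tools())` in a newly opened terminal should return an empty list if the PATH change took effect. Terminals that were already open before the change usually need to be restarted first.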