2

Question: How do I get Adobe Acrobat Pro DC to export all PDF files in a folder as text files?

Using the Action Wizard on the Tools menu of Adobe Acrobat Pro DC, I was able to create a custom command that allowed me to export (OCR) thousands of images as PDF files. I now want to export those PDFs, which are now searchable, as text files. However, I cannot seem to find a similar set of tools to do this.

Note: There is an export button that lets me export files one at a time as text files, but I cannot find anything that will run a command on an entire folder.

EDIT: I called customer support, and a possible workaround is to combine all the files into one giant PDF file and then export that file. However, I need a separate ID for each PDF file exported as text, so that is not an option.

user3195446

1 Answer

1

You may use PowerShell combined with Xpdf.

Xpdf will install a program called pdftotext, which can be invoked from a PowerShell script such as:

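# Convert every PDF in the current folder; each .txt is written next to its PDF.
$FILES = Get-ChildItem -Filter *.pdf
foreach ($f in $FILES) {
    # FullName avoids depending on how PowerShell stringifies the file object.
    & "C:\Program Files\xpdf\bin32\pdftotext.exe" -enc UTF-8 $f.FullName
}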

A similar batch script can be invoked from a .bat file without using PowerShell:

rem Loop over the PDFs in the current folder; the quotes keep names with spaces intact.
for %%G in (*.pdf) do (
  "C:\Program Files\xpdf\bin32\pdftotext.exe" -enc UTF-8 "%%G"
)

(Note: None of the scripts was tested.)
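
Since the question asks for a separate text file per PDF, it may help to name each output explicitly. pdftotext accepts an optional second argument for the output file. The following is a minimal sketch, not tested against a real install: it assumes the same install path as above, and the output folder name `txt` is a hypothetical choice.

```powershell
# Assumed locations; adjust both to your own install and folders.
$pdftotext = "C:\Program Files\xpdf\bin32\pdftotext.exe"
$outDir    = Join-Path (Get-Location).Path "txt"   # hypothetical output folder

# Create the output folder if it does not exist yet.
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

foreach ($f in Get-ChildItem -Filter *.pdf -File) {
    # Reuse the PDF's base name so each text file keeps its own ID.
    $txt = Join-Path $outDir ($f.BaseName + ".txt")
    & $pdftotext -enc UTF-8 $f.FullName $txt

    # pdftotext returns a non-zero exit code on failure.
    if ($LASTEXITCODE -ne 0) {
        Write-Warning "pdftotext failed on $($f.Name) (exit code $LASTEXITCODE)"
    }
}
```

Run it from the folder that holds the PDFs. A file such as `scan_0001.pdf` would then produce `txt\scan_0001.txt`, and any file that fails to convert is reported by name instead of being silently skipped.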

harrymc
  • I am assuming I should be installing [this](http://www.xpdfreader.com/download.html), e.g. Xpdf/XpdfReader and the Xpdf command line tools? @harrymc Do you think I can use a PowerShell script to simply save the files as .text? – user3195446 Mar 26 '19 at 21:32
  • 1
    Yes, I assume the same. I don't know of any support for PDF in pure PowerShell. – harrymc Mar 27 '19 at 07:22
  • Curiously I am getting an unexpected token error: -enc UTF-8 "$f" – user3195446 Mar 27 '19 at 12:41
  • See [here](https://superuser.com/questions/1418195/powershell-unexpected-token-error/1418255#1418255) for a correction to your solution. I accepted your answer; you were missing an ampersand. – user3195446 Mar 27 '19 at 16:50
  • 1
    I did say it was untested... Sorry for the problem. – harrymc Mar 27 '19 at 16:51
  • Solid answer and a good starting point, hence my accepting your answer. Around 10,000 PDF files ranging from 7 KB to over 40,000 KB, and it only took a few seconds. – user3195446 Mar 27 '19 at 16:57