Possible Duplicate:
Extracting text from a .PDF scanned book
How to do OCR on a PDF document?

I've got a 200+ page PDF manual that was produced by scanning a hard copy. I'd like to convert it to a searchable text format, but haven't had any success finding a tool to do so. Google's search results are heavily polluted with crippleware trial software that can only process the first few pages of a file. The only truly free application I found, FreeOCR, has a PDF renderer that fails on anything beyond the first few pages of the file.

Google's PDF viewer does OCR, but it doesn't appear to provide any export option other than copy/paste. Besides being very tedious, that only puts plain text on the clipboard, which means I'd lose all of the line art as well as any formatting conveyed by horizontal placement.

  • @DanielAndersson Unfortunately, none of those were helpful. Blowing the file apart into hundreds of image files and then gluing them back together would be a massive waste of time (1st and 3rd links). I've already got plenty of tools that claim they'd do the job if I gave them money, but I can't verify those claims because the problematic parts of the file are beyond what they'll process for free (2nd link) – Dan Is Fiddling By Firelight May 20 '12 at 17:46
  • Then put that info in your question as well so people know what you have tried and not. People aren't at this site because they like guessing :-) – Daniel Andersson May 20 '12 at 19:05


Upload your PDF to Google Drive (Docs) with the upload conversion settings set to convert images to text and to convert the document to a Google Doc (both options can be chosen at upload time). You should then be able to open the doc, click File > Download as, and select the format you want.

I just did this with a magazine page and it worked okay, though not all of the fonts were recognised.
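If clicking through the web interface is too tedious, the same upload-and-convert flow can, as far as I know, be scripted against the Google Drive API. Below is a minimal sketch assuming the Drive API v3 through google-api-python-client and an already-authorized `service` object (authentication is not shown). `ocr_pdf_to` and `conversion_metadata` are hypothetical helper names. Asking Drive to store the upload with the Google Docs MIME type is what requests the conversion, and `ocrLanguage` is only a language hint. I haven't checked that the result matches the web upload exactly.

```python
"""Sketch: OCR a scanned PDF by converting it to a Google Doc via the
Google Drive API v3, then export the result in another format.

Assumptions: google-api-python-client is installed and `service` is an
already-authorized Drive v3 client (authentication is not shown). The
helper names below are hypothetical.
"""
import io
import os

# MIME type that asks Drive to convert an upload into a Google Doc.
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"

# A few export formats Drive offers for Google Docs, keyed by extension.
EXPORT_MIME_TYPES = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "odt": "application/vnd.oasis.opendocument.text",
    "rtf": "application/rtf",
    "html": "text/html",
    "txt": "text/plain",
    "pdf": "application/pdf",
}


def conversion_metadata(pdf_path: str) -> dict:
    """File metadata requesting conversion; the Doc is named after the PDF."""
    name = os.path.splitext(os.path.basename(pdf_path))[0]
    return {"name": name, "mimeType": GOOGLE_DOC_MIME}


def ocr_pdf_to(service, pdf_path: str, out_path: str,
               fmt: str = "docx", ocr_language: str = "en") -> str:
    """Upload pdf_path as a Google Doc (OCR), export it to out_path, return the Doc id."""
    export_mime = EXPORT_MIME_TYPES[fmt]  # KeyError for unsupported formats

    # Imported here so the helpers above work without the client library.
    from googleapiclient.http import MediaFileUpload, MediaIoBaseDownload

    media = MediaFileUpload(pdf_path, mimetype="application/pdf", resumable=True)
    created = service.files().create(
        body=conversion_metadata(pdf_path),
        media_body=media,
        ocrLanguage=ocr_language,  # ISO 639-1 hint for the OCR pass
        fields="id",
    ).execute()

    # Stream the exported document to disk chunk by chunk.
    request = service.files().export_media(fileId=created["id"], mimeType=export_mime)
    with io.FileIO(out_path, "wb") as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            _, done = downloader.next_chunk()
    return created["id"]
```

For example, `ocr_pdf_to(service, "manual.pdf", "manual.docx")` would leave the converted Doc in Drive and write a .docx copy locally. The Drive API reference states that exported content is limited in size (10 MB for `files.export` at the time of writing), so a large scanned manual may still need to be downloaded from the web UI or split first.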

sgtbeano