Possible Duplicate:
How to extract text with OCR from a PDF on Linux?

I have a few documents in English and Hebrew that I scanned in and converted to PDF format.

Is there some free or cheap utility that can process a scanned PDF and do OCR, at least in English, preferably also in Hebrew?

Thanks!

Shaul Behr
  • A couple of similar questions. http://superuser.com/questions/28426/how-to-extract-text-with-ocr-from-a-pdf-on-linux/33203#33203 http://superuser.com/questions/64124/extracting-text-from-a-pdf-scanned-book http://superuser.com/questions/97470/scan-a4-doc-pdf-ocr-translate-to-english – heavyd Feb 16 '10 at 16:47
  • 6
    The author of this question did not specify that he is running Linux. The so-called possible duplicate question is too localized, and may not apply at all to the author of this question. – eleven81 Feb 16 '10 at 17:03
  • 3
    @eleven81 - Correct, I was asking for Windows. – Shaul Behr Jul 04 '10 at 08:34
  • Not only is this not a duplicate, it's still unanswered. All 3 answers only yield text extracts, not a text-selectable PDF document. – cregox Jun 28 '13 at 16:05

3 Answers

1

I found an interesting idea that lets Google do all the work of OCR'ing the PDF files for you.

eleven81
1

I found a list of free OCR software for Windows.

  1. FreeOCR
  2. Tesseract
  3. WeOcr Tesseract Web Interface
  4. GOCR
  5. Windows GUI for GOCR
  6. OCR Desktop
  7. Simple OCR
  8. TopOCR

However, these programs need an image input, not a PDF input. For this, try a PDF-to-JPG converter.
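
The conversion step could be scripted rather than done by hand. The sketch below uses Ghostscript as the PDF-to-JPG converter, which is an assumption, since no specific converter is named here. It also assumes the Ghostscript console program is on the PATH as `gswin64c` (the usual name in 64-bit Windows builds; `gs` on Linux and macOS). The file names in the usage note are hypothetical.

```python
import subprocess


def ghostscript_jpeg_command(pdf_path, output_pattern, dpi=300, exe="gswin64c"):
    """Build a Ghostscript command that renders each PDF page to a JPEG.

    output_pattern should contain a %d placeholder (e.g. "page-%03d.jpg")
    so that every page is written to its own file.
    exe is an assumption: "gswin64c" on 64-bit Windows, "gs" elsewhere.
    """
    return [
        exe,
        "-dNOPAUSE",        # do not pause between pages
        "-dBATCH",          # exit once the input has been processed
        "-sDEVICE=jpeg",    # JPEG output device
        f"-r{dpi}",         # rendering resolution in DPI
        f"-sOutputFile={output_pattern}",
        pdf_path,
    ]


def pdf_to_jpegs(pdf_path, output_pattern, dpi=300, exe="gswin64c"):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    command = ghostscript_jpeg_command(pdf_path, output_pattern, dpi, exe)
    subprocess.run(command, check=True)
```

Calling `pdf_to_jpegs("scan.pdf", "page-%03d.jpg")` would then write one JPEG per page, ready to feed to any of the OCR programs listed above. A resolution around 300 DPI is a common starting point for OCR, but the best value depends on the scan.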

eleven81
0

Personally, I would use Ghostview to convert them to images, then Tesseract to convert those images to text. This is a totally free, open-source, cross-platform solution that has given me very good results on plain text. I don't use it for complex documents with tables and such, but for plain text you can't beat the price.
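
The Tesseract half of that workflow can be sketched as a small wrapper around its command-line program. This is only an illustration, not a tested recipe: it assumes `tesseract` is on the PATH, that the page images already exist, and that the language data for every requested language is installed (`eng` for English, `heb` for Hebrew). The function names and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, languages=("eng",), exe="tesseract"):
    """Build a Tesseract command that OCRs one image into a text file.

    Tesseract takes an output base name and appends ".txt" itself, so the
    image's extension is stripped to form that base name.
    """
    image = Path(image_path)
    output_base = str(image.with_suffix(""))
    # Several languages are combined with "+", e.g. "eng+heb".
    return [exe, str(image), output_base, "-l", "+".join(languages)]


def ocr_images(image_paths, languages=("eng",), exe="tesseract"):
    """Run Tesseract on each image; raises CalledProcessError on failure."""
    for image_path in image_paths:
        subprocess.run(tesseract_command(image_path, languages, exe), check=True)
```

For example, `ocr_images(["page-001.png"], languages=("eng", "heb"))` would write `page-001.txt` next to the image. Newer Tesseract releases can also emit a searchable PDF instead of plain text when `pdf` is appended as a final argument; whether that is available depends on the installed version.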

Dennis