Questions tagged [ocr]

Optical character recognition (OCR) is the process of converting images of text into text that word processors and other programs can manipulate.

Optical character recognition (OCR) is the process of converting images of written or printed text into a standard text format.

It is used when scanning paper documents or books to create a searchable text representation.

Similar technologies include

186 questions
48
votes
11 answers

How to extract text with OCR from a PDF on Linux?

How do I extract text from a PDF that wasn't built with an index? It's all text, but I can't search or select anything. I'm running Kubuntu, and Okular doesn't have this feature.
agentofuser
  • 7,247
  • 11
  • 38
  • 34
34
votes
4 answers

How to create PDF with scanned pages but selectable text?

Today I received a PDF from our supplier and it contained several printed and scanned pages with signatures etc. I opened it in Acrobat Reader DC. But to my surprise the text from the evidently scanned images could be selected and copied as text.…
Vojtěch Dohnal
  • 3,740
  • 9
  • 25
  • 50
24
votes
6 answers

Batch-OCR many PDFs

This was discussed a year ago here: Batch OCR for many PDF files (not already OCRed)? Is there any way to batch OCR PDFs that haven't been already OCRed? This is, I think, the current state of things dealing with two issues: Batch OCR…
Joe
  • 442
  • 1
  • 5
  • 11
23
votes
3 answers

Blurry text in PDF

I have a PDF that has blurry text. The text itself is readable but causes lots of strain. This is an example of the text. Is there a way to clear it up?
user1255895
  • 241
  • 1
  • 2
  • 4
20
votes
11 answers

How to remove OCR from a PDF?

I have been searching Google for some time but cannot find an answer to my question. I have unwanted layers of OCR in a document that I recently scanned with Adobe Acrobat. It has not been OCRed properly, and I want to redact some information, but…
Sanoo
  • 495
  • 2
  • 9
  • 22
19
votes
8 answers

How can I convert scanned images as PDF to a searchable PDF file?

I have a PDF of a scanned book. I'm looking for free software that will perform OCR and then provide an option to save it as a PDF or document again. Is there one?
yuval
15
votes
3 answers

How can I identify fonts from an image?

Many times I come across bitmaps with nothing but text paragraphs, so I was looking for a way to identify the font used, the paragraph alignment, line spacing and color, bold, italics. Would an OCR package allow me to do that? If not, what other…
Robin Rodricks
  • 2,432
  • 6
  • 38
  • 50
13
votes
8 answers

Practical OCR solution for converting a large book to a digital format?

I was over by my grandparents' place this past weekend. My grandmother pulled out this giant (~1400 page) book of her family history going back to 1630 or so. Giant nerd that I am, I thought it would be slick to have all the information stored in a…
user11219
13
votes
6 answers

Extract OCR text from Evernote

Evernote does OCR on the images you save to it. Is there a way to get the full text equivalent for an image in Evernote, or is the OCR only for searching?
Leigh Riffel
  • 1,846
  • 4
  • 21
  • 26
13
votes
7 answers

Enable OCR in Greenshot

I run Windows 10 with Microsoft Office Professional Plus 2016 on my computer. It looks like MS OCR functionality is enabled on my system, since OneNote is able to copy text from an image. But how do I enable this functionality for Greenshot? Currently I…
vico
  • 2,379
  • 12
  • 39
  • 60
11
votes
5 answers

PDF has an extra blank in all words after running through Ghostscript

This PDF was produced by ABBYY FineReader 10: http://ebooks.zeitr.org/from_abbyy.pdf You can copy & paste the first sentence and get this (very good) text result: Der »Bund Deutscher Gymnastik-Schulleiter« wurde am 20. November 1955 anläßlich einer…
Erwin Jurschitza
10
votes
4 answers

Batch OCR for many PDF files (not already OCRed)?

I use Google Desktop Search (I am on Vista) and not all my PDF files are recognized in my archive folder. It is normal as "PDF files that contain scanned images" are not indexed ( http://desktop.google.com/support/bin/answer.py?hl=en&answer=90651…
Erb
  • 405
  • 2
  • 6
  • 16
9
votes
3 answers

Can Acrobat 11 be made to do OCR using multiple CPU cores?

OCR processing takes time. Using multiple CPU cores would speed up processing. Acrobat 10 was not a multithreaded application. How about Acrobat 11? Does 11 by default do OCR using multiple CPU cores (if available)? If not, are there any…
tarcman.
  • 151
  • 1
  • 4
9
votes
3 answers

Good free OCR with GUI for correcting mistakes? (for Windows)

I've used SimpleOCR, which has a nice GUI for correcting mistakes. Unfortunately it makes a lot of mistakes! (and suffers other bugs and limitations) On the other hand Tesseract is more accurate but has no GUI at all. My question is, is there a free…
Hugh Allen
  • 9,820
  • 6
  • 34
  • 42
8
votes
3 answers

How do I initiate an OCR scan in Microsoft Office Word 2010?

How do you kick off a scan and character recognition using Microsoft Office Word 2010 (Beta)? I can't seem to find an option to scan the document in my scanner directly into a 2010 Word document. I have checked the installation settings for Office…
grenade
  • 527
  • 1
  • 6
  • 18