
I have a relatively simple graphic: a technical diagram with various bits of text throughout.

The text is in another language (Czech/Slovak). It's printed clearly enough to be accurately distinguished and interpreted by (Android) Google Translate's OCR. Anyway, I'd like to replace the text with the English translation.

In this instance, it wouldn't be difficult to cut or obscure the current text and then add new text boxes with basic editing tools. But it would be nice if I could simply select and edit the text, the way some PDF editors allow.

I tried exporting the JPEG/PNG as a PDF and then editing it with LibreOffice. But it was no different from creating a blank document and inserting the image; it was all just a single object.

I understand why, and I don't expect magic. But does anyone have a technique for this?

voices
  • Basically, are you asking if there's a graphic editing program that kinda does OCR to recognize textual elements in a graphic, and converts them to actual, editable text? If so, there's none that I know of. Text is not just text, there's fonts, kerning, line-spacing, and all sorts of other variables involved. Sounds like it would be a really cool feature, though! – Sandwich Dec 22 '17 at 10:13
  • @Sandwich Most PDF editors that I've used have run into font compatibility issues at some stage. Especially when editing documents from foreign operating systems. It's never caused me any grief though; I've always managed to convert the text to a native font. – voices Dec 23 '17 at 09:15
  • PDFs are far more "rich" than JPEG, though. PDFs are intended to allow high-quality printing from a portable document. As such, not only do they often contain embedded JPEGs for imagery, but they also can store vector data for graphics, as well as textual data with the corresponding formatting directives and the actual font itself—all hidden within the PDF file itself. This is why many PDFs have copyable and searchable text. OCR is what we have to resort to when there IS no underlying textual data—it tries to figure out letters from pure pixel data. – Sandwich Dec 23 '17 at 10:59

1 Answer


The most straightforward way to do this (without knowing how complex the document is) is to use DTP software: place the image, lock it, and then add individual text boxes with a white background. You can place them very precisely and simply size each box to cover the old text. You can then export as PDF or a flat image format.
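If you would rather script the same idea than do it by hand, here is a rough sketch, not a tested workflow. It builds an SVG that references the original image, draws a white rectangle over each old label, and puts real text on top, so the new text stays editable when you open the file in something like Inkscape. The function name `build_overlay_svg`, the file name `diagram.png`, the image size, and all coordinates and label strings are made up for illustration; you would have to measure the real positions yourself.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def build_overlay_svg(image_href, width, height, labels):
    """Return SVG markup: the raster image plus white boxes with new text.

    labels is a list of (x, y, box_width, box_height, text) tuples in pixels,
    where (x, y) is the top-left corner of the old text to cover.
    """
    ET.register_namespace("", SVG_NS)
    ET.register_namespace("xlink", XLINK_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "width": str(width), "height": str(height),
        "viewBox": f"0 0 {width} {height}",
    })
    # The original diagram sits at the bottom of the stack, untouched.
    ET.SubElement(svg, f"{{{SVG_NS}}}image", {
        f"{{{XLINK_NS}}}href": image_href,
        "x": "0", "y": "0", "width": str(width), "height": str(height),
    })
    for x, y, w, h, text in labels:
        # White rectangle sized to hide the old text.
        ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": str(x), "y": str(y), "width": str(w), "height": str(h),
            "fill": "white",
        })
        # Replacement text, vertically centred in the box; it stays editable.
        label = ET.SubElement(svg, f"{{{SVG_NS}}}text", {
            "x": str(x + 2), "y": str(y + h / 2),
            "dominant-baseline": "middle",
            "font-family": "sans-serif", "font-size": str(round(h * 0.7)),
        })
        label.text = text
    return ET.tostring(svg, encoding="unicode")


# Hypothetical file name, size, coordinates, and labels, for illustration only.
markup = build_overlay_svg(
    "diagram.png", 800, 600,
    [(120, 40, 160, 24, "Inlet valve"), (420, 310, 140, 24, "Pump housing")],
)
```

Writing `markup` to a `.svg` file next to the image should give you something you can open, nudge into place, and export as PDF or a flat image. The image is only linked, not embedded, so the two files need to stay together.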

Scribus is a free open source DTP package along the lines of Quark or InDesign.

Word and similar programs often have a text box facility, but I find it cumbersome compared to DTP software.

If you clean up the image well enough, you could try using Inkscape (open source) or Illustrator to "Trace" or "Live Trace" the image. The text might then wind up being editable, but only as vector objects rather than as text. That is probably going to be problematic, and even then it will take longer than placing new text boxes.

Yorik