
I am trying to select text with the mouse in this Slovak document: https://fphil.uniba.sk/fileadmin/fif/katedry_pracoviska/sas/Publikacie/Foneticka_prirucka.pdf

In the browser (Chromium) and in Okular, the selection contains strange characters.

When I extract the text with Okular, I also get unrecognized characters, but different ones.
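To see exactly which characters come out wrong in extracted text, a small stdlib-only Python check can help. This is a minimal sketch, not something from the original question: the names `suspicious_chars` and `report` are hypothetical, and the set of "expected" characters (the Slovak alphabet plus ASCII digits, punctuation, whitespace, „“ quotes and the en dash) is an assumption about ordinary Slovak prose.

```python
import string
import unicodedata
from collections import Counter

# Lowercase Slovak letters; the digraphs ch, dz, dž are written with these letters.
SLOVAK_LOWER = "aáäbcčdďeéfghiíjklĺľmnňoóôpqrŕsštťuúvwxyýzž"

# Assumption: any character outside this set is suspicious in Slovak prose.
EXPECTED = set(
    SLOVAK_LOWER
    + SLOVAK_LOWER.upper()
    + string.digits
    + string.punctuation
    + string.whitespace
    + "„“–"  # Slovak quotation marks and the en dash
)


def suspicious_chars(text: str) -> Counter:
    """Count characters in extracted text that are not expected in Slovak prose."""
    # NFC composes a base letter plus a combining caron (e.g. "L" + U+030C)
    # into the precomposed letter ("Ľ"), so such pairs are not flagged.
    normalized = unicodedata.normalize("NFC", text)
    return Counter(ch for ch in normalized if ch not in EXPECTED)


def report(text: str) -> None:
    """Print each unexpected character with its code point, name and count."""
    for ch, count in suspicious_chars(text).most_common():
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X} {name:40} x{count}")


if __name__ == "__main__":
    report("Ľudmila Žigová")            # prints nothing: all characters expected
    report("\ufffdudmila \ue000igová")  # flags U+FFFD and a private-use code point
```

Pasting the text extracted from Okular (or the output of another tool) into `report()` shows which code points replaced letters such as Ľ and Ž. U+FFFD or private-use code points in the output often indicate that the PDF's font does not provide a usable Unicode mapping for those glyphs.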

EDIT: I have found this library/tool, https://pypi.org/project/multilingual-pdf2text/, which might help, but I don't know how to use it.

Is there a way to extract text from this document with the characters recognized correctly?

user545
  • You do not indicate which text to select, so other users cannot reproduce the issue. I see no problems copying and pasting text in Evince, but I may be testing the wrong spot. – vanadium Jun 26 '23 at 06:45
  • I can confirm the problem in Chromium: no characters are recognized. Firefox does as well as Evince, but now I see your problem: some characters, such as Ľ and Ž, are not recognized properly. – vanadium Jun 26 '23 at 07:11
  • I'm concerned about the misrecognition of the letters "Ľ" and "Ž", for example in the name Ľudmila Žigová on the front page. I have figured out how to use multilingual-pdf2text, and it gives good results. I'm waiting for a friend to confirm them. – user545 Jun 26 '23 at 12:55
  • The result is mediocre, as my friend told me when he looked at the generated text file. – user545 Jun 26 '23 at 17:00

0 Answers