Selecting text with the mouse in a PDF produces weird characters

Asked Jun 26 '23 at 03:25

Active Jun 26 '23 at 03:57

Viewed 28 times

When I select text with the mouse in a browser (Chromium) or in Okular, the selection contains weird characters.

When I extract text from this document in Okular, I also get unrecognized characters, although different ones.

EDIT: I found this library/tool, https://pypi.org/project/multilingual-pdf2text/, which would probably help me, but I don't know how to use it.

Is there a way to extract text from this document with correctly recognized characters?
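
To make "correctly recognized" checkable, a small script can scan any extracted text for code points that commonly indicate a failed glyph-to-Unicode mapping, and confirm that strings known to be in the document (such as a name containing Ľ or Ž) survived. This is only a sketch using the Python standard library: the function names `suspicious_characters` and `missing_expected` are made up for illustration, and the choice of which characters count as suspicious is an assumption, not a rule taken from any of the tools mentioned here.

```python
import unicodedata
from collections import Counter


def suspicious_characters(text):
    """Count code points that often signal a failed text extraction.

    The categories checked here are an assumption about what "weird
    characters" look like; adjust them for the document at hand.
    """
    counts = Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if ch == "\ufffd":
            counts["replacement"] += 1   # U+FFFD REPLACEMENT CHARACTER
        elif category == "Co":
            counts["private_use"] += 1   # private-use code points
        elif category == "Cc" and ch not in "\n\r\t\f":
            counts["control"] += 1       # stray control characters
        elif category == "Cn":
            counts["unassigned"] += 1    # code points with no assigned character
    return counts


def missing_expected(text, expected):
    """Return the expected strings that do not occur in the text.

    Both sides are NFC-normalized so that "Z" followed by a combining
    caron is treated the same as the precomposed "Ž".
    """
    normalized = unicodedata.normalize("NFC", text)
    return [s for s in expected
            if unicodedata.normalize("NFC", s) not in normalized]
```

To use it, read the text produced by whichever extractor is being tried (for example with `open(path, encoding="utf-8").read()`), pass it to both functions, and compare the counts between tools. An empty counter and an empty list of missing strings do not prove the extraction is perfect, but non-empty results point at the problem quickly.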

user545

You are not indicating which text to select so other users could reproduce the issue. I see no issues copy/pasting text using Evince but I may be testing the wrong spot. – vanadium Jun 26 '23 at 06:45
I can confirm with Chromium: no characters are recognized. With Firefox, it works as well as with Evince, but now I see your problem where some characters, like Ľ and Ž, are not properly recognized. – vanadium Jun 26 '23 at 07:11
I'm concerned about the misrecognition of the letters "Ľ" and "Ž", for example in the name Ľudmila Žigová on the front page. I figured out how the program multilingual-pdf2text works. It gives good results. I'm waiting for my friend to confirm the results. – user545 Jun 26 '23 at 12:55
The result is mediocre, as my friend told me when he looked at the generated text file. – user545 Jun 26 '23 at 17:00

0 Answers