Tesseract with "fl" and "fi" characters

Question

I started using Tesseract yesterday. It works very well, but apparently my original text (in the scanned image) has characters that combine "fi" into one single character and "fl" into another single character, and Tesseract converts those into special characters. How can I tell it to output the separate letters "f i" or "f l" instead?
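For context, the combined characters are most likely Unicode ligature code points from the Alphabetic Presentation Forms block, such as U+FB01 (ﬁ) and U+FB02 (ﬂ). One workaround, assuming you can post-process Tesseract's text output instead of changing how it recognizes the glyphs, is to replace those code points with plain letters. A minimal sketch in Python; the names `LIGATURES` and `expand_ligatures` are hypothetical and not part of Tesseract:

```python
# Hypothetical mapping from Unicode Latin ligature code points
# (Alphabetic Presentation Forms block) to the letters they stand for.
LIGATURES = {
    "\ufb00": "ff",   # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",   # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",   # LATIN SMALL LIGATURE FL
    "\ufb03": "ffi",  # LATIN SMALL LIGATURE FFI
    "\ufb04": "ffl",  # LATIN SMALL LIGATURE FFL
    "\ufb06": "st",   # LATIN SMALL LIGATURE ST
}

# str.maketrans accepts a dict with single-character keys and
# (possibly multi-character) string values.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace ligature characters with separate letters; leave all else unchanged."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    sample = "The \ufb01rst \ufb02oor o\ufb03ce"
    print(expand_ligatures(sample))  # The first floor office
```

Because the table lists only the ligatures, every other character in the OCR output (accents, punctuation, symbols) passes through untouched.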

This question does not seem to be about Ubuntu or any of its flavors. Questions about programming should be asked on [Stack Overflow](https://stackoverflow.com/). Maybe you should apply some image processing before you pass the image to Tesseract. — singrium, Jun 26 '20 at 14:09
Perhaps this will be helpful? [Ligatures in Tesseract OCR Output](https://mlichtenberg.wordpress.com/2015/09/11/ligatures-in-tesseract-ocr-output/) — steeldriver, Jun 26 '20 at 14:10
@singrium Questions about using software on Ubuntu are on topic, ([to the extent that *about using software* is a reasonable description](https://chat.stackexchange.com/transcript/201?m=51223939#51223939), which applies here). — Zanna, Jun 27 '20 at 01:16
@Zanna I agree with that. However, based on the description of the problem, it does not seem to be related to Ubuntu; it is more about image processing techniques and how to make the characters clear and readable (reduce the noise in the image). Hence, it is more related to *programming* than to *Ubuntu*. — singrium, Jun 27 '20 at 01:34
OK. I am an Ubuntu user, but I understand that the question might belong elsewhere. I am asking about using Tesseract from the command line. Is that a programming problem that belongs on Stack Overflow? Or is there an even better place for the question? — Georgie, Jun 28 '20 at 14:53
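Since the question is about the command line, one possible approach is a small filter placed after Tesseract in a pipeline. This sketch assumes Tesseract writes UTF-8 text to standard output (recent versions accept `stdout` as the output base) and a UTF-8 locale. It uses Python's `unicodedata.normalize` with the NFKC form, whose compatibility decomposition turns ligatures such as U+FB01 into "fi". The script name `unligature.py` and the function `normalize_ocr_text` are hypothetical.

```python
import sys
import unicodedata


def normalize_ocr_text(text: str) -> str:
    # NFKC = compatibility decomposition followed by canonical composition.
    # Ligatures like U+FB01/U+FB02 decompose to "fi"/"fl".
    return unicodedata.normalize("NFKC", text)


def main() -> None:
    # Read OCR output from stdin and write the normalized text to stdout.
    sys.stdout.write(normalize_ocr_text(sys.stdin.read()))


if __name__ == "__main__":
    main()
```

Usage might look like `tesseract scan.png stdout | python3 unligature.py > scan.txt`. Note that NFKC is broader than a ligature-only replacement: it also rewrites other compatibility characters (for example "²" becomes "2" and "…" becomes "..."), which may or may not be what you want.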

0 Answers