13

I have a DjVu file in which I can search for specific words. However, if I convert it to PDF (I tried CutePDF and the online djvu-pdf tool), the text is no longer searchable.

How can I convert a DjVu file to a PDF while preserving word searchability?

glS

7 Answers

13

I wrote a script to do this a long time ago. It is essentially glue code around a few utilities that do the heavy lifting. The difference between my script and the other tools at the time is that mine was the only one that did all of the following:

  • had a similar compression ratio to the original DjVu file (1.5-2x the size instead of 10-20x the size)
  • preserved bookmarks / table of contents metadata (for navigation in the pdf reader)
  • preserved the embedded text layer for searching

That being said, it is very primitive. I just made sure it worked well for all of my own files and haven't worked on it since.
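
To check whether a given conversion kept the text layer and the bookmarks, both files can be queried from the shell. This is a minimal sketch and not part of the script itself; it assumes `djvutxt` and `djvused` (DjVuLibre) and `pdftotext` (Poppler) are installed, and the function names are made up for illustration.

```shell
# Succeeds if the DjVu file has a non-empty embedded text layer.
djvu_has_text() {
    djvutxt "$1" 2>/dev/null | grep -q '[[:alnum:]]'
}

# Succeeds if the DjVu file carries an outline (bookmarks / table of contents).
djvu_has_outline() {
    djvused "$1" -e 'print-outline' 2>/dev/null | grep -q '(bookmarks'
}

# Succeeds if any text can be extracted from the PDF.
pdf_has_text() {
    pdftotext "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}
```

For example, `djvu_has_text book.djvu && pdf_has_text book.pdf && echo "text layer preserved"` prints the message only when both files yield text.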

vindvaki
  • Thank you for your script! The converted PDF works very well on Windows. However, Acrobat Reader on iOS and Android cannot open it. Could you suggest which part could be changed to make it compatible with Acrobat Reader on iOS and Android? – SOUser Dec 26 '20 at 18:17
  • Also works on macOS when BSD `readlink` is replaced by GNU `greadlink`, which is available via [GNU coreutils](https://formulae.brew.sh/formula/coreutils). – Stefan Schmidt Aug 31 '23 at 11:21
6

I packaged vindvaki's scripts into a Docker image with the required dependencies. You can try it with:

  docker run --rm -u "$(id -u):$(id -g)" -v "$(pwd):/opt/work" ilyabystrov/djvu2pdf filename.djvu filename.pdf

Check djvu2pdf-docker for details.

login
3

This DjVu to PDF converter preserves word searchability, provided the original DjVu is searchable. It also produces smaller output files than calibre.

Marc Aurel
2

Open the PDF file in PDF-XChange Viewer and perform OCR (I believe only four languages are supported). It takes time but it is damn good (even on two-column documents).

On Mac and Linux you will need Wine.

Marduk
1

The best converter site I have found is https://www.pdf2go.com/, used with its OCR option.

srghma
1

Have you tried Calibre? A Calibre contributor mentions that OCR'd text in DjVu files is supported, so it could probably convert such a file to a PDF with searchable text.
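
If Calibre is installed, the conversion can also be tried from the command line with its `ebook-convert` tool. This is a rough sketch: the function name is hypothetical, and `pdftotext` (Poppler) is an extra assumption, used only to check whether text can be extracted from the result. Whether page layout and images survive depends on Calibre's DjVu input support, so inspect the output.

```shell
# Convert a DjVu file with Calibre, then report whether the PDF is searchable.
# Assumes ebook-convert (Calibre) and pdftotext (Poppler) are on PATH.
djvu_to_pdf_calibre() {
    in=$1
    out=${2:-${in%.djvu}.pdf}   # default: same name with a .pdf extension
    ebook-convert "$in" "$out" || return 1
    # Any alphanumeric output from pdftotext means the PDF has extractable text.
    if pdftotext "$out" - 2>/dev/null | grep -q '[[:alnum:]]'; then
        echo "searchable: $out"
    else
        echo "no text layer: $out"
    fi
}
```

Calling `djvu_to_pdf_calibre book.djvu` writes `book.pdf` next to the input and prints one of the two status lines.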

beatcracker
0

All of these answers suggest just doing the OCR again!?

The best tool for the job (IMHO) is a freeware app called Djvutoy, available from: https://www.mediafire.com/folder/oajr60vu7zcls/MJ_Stronghorse_Apps

M2G