I recently scanned a book into a 600-page PDF file. However, the pages are randomly skewed/rotated clockwise or counterclockwise. Is there any software that can correct this automatically? I know Acrobat Pro can, but is there any free Ubuntu software or script?
2 Answers
Deskew
Deskew is a command-line tool for deskewing scanned text documents. It uses the Hough transform to detect "text lines" in the image and outputs the image rotated so that those lines are horizontal.
Installation: download the latest release. It seems well maintained.
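To illustrate the idea behind this approach, here is a simplified, hypothetical Hough-style skew estimator in plain Python. It is not Deskew's actual code: it assumes you already have the (x, y) coordinates of dark pixels, and it only searches a small range of near-horizontal angles. For each candidate angle, every pixel votes for the row offset it would lie on; at the true skew angle, pixels from the same text line pile into the same bin, which the sum-of-squares score rewards.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5, bin_size=1.0):
    """Return the angle (degrees) that best lines the points up into rows.

    points: iterable of (x, y) coordinates of dark ("ink") pixels.
    Only angles in [-max_angle, max_angle] are tried, in steps of `step`.
    """
    points = list(points)
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_angle / step))
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        theta = math.radians(angle)
        s, c = math.sin(theta), math.cos(theta)
        # Hough accumulator for this angle: each pixel votes for the offset
        # rho of the line (tilted by `angle`) that passes through it.
        acc = Counter(round((y * c - x * s) / bin_size) for x, y in points)
        # At the correct angle each text line falls into a single bin,
        # so squaring the bin counts favours concentrated votes.
        score = sum(v * v for v in acc.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Rotating the points by the negative of the returned angle (in the same coordinate system) makes the detected lines horizontal. A real tool works on the full image, uses a finer angle step, and has to cope with noise, pictures and page borders.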
pagetools: Page Layout Detection Tools
Automatic deskew and bounding box determination for scanned page images
sudo apt install pagetools
Last Update: 2013-03-22
Unfortunately, neither tool accepts PDF as input, so this involves splitting the file into images and reassembling it afterwards. – joseph_morris Mar 01 '22 at 20:10
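Since neither tool reads PDFs directly, the split/deskew/reassemble round trip could be scripted roughly as below. This is a sketch under assumptions: it assumes `pdftoppm` (from poppler-utils), `deskew` and `img2pdf` are installed and on the PATH, and that Deskew accepts `deskew -o OUTPUT INPUT` (run `deskew` without arguments to confirm the options of your build). The function names and the `pages`/`deskewed` directories are hypothetical.

```python
import glob
import os
import subprocess


def split_cmd(pdf_path, work_dir, dpi=300):
    # pdftoppm renders every page to work_dir/page-NNN.png; the page number
    # is zero-padded, so sorting the file names keeps the page order.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, os.path.join(work_dir, "page")]


def deskew_cmds(page_files, out_dir):
    # One deskew call per page image, written under the same name in out_dir.
    return [["deskew", "-o", os.path.join(out_dir, os.path.basename(f)), f]
            for f in sorted(page_files)]


def merge_cmd(image_files, pdf_out):
    # img2pdf losslessly wraps the page images into a single PDF.
    return ["img2pdf", *sorted(image_files), "-o", pdf_out]


def run(pdf_in, pdf_out, work_dir="pages", out_dir="deskewed"):
    os.makedirs(work_dir, exist_ok=True)
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(split_cmd(pdf_in, work_dir), check=True)
    pages = glob.glob(os.path.join(work_dir, "page-*.png"))
    for cmd in deskew_cmds(pages, out_dir):
        subprocess.run(cmd, check=True)
    fixed = glob.glob(os.path.join(out_dir, "page-*.png"))
    subprocess.run(merge_cmd(fixed, pdf_out), check=True)
```

For example, `run("book.pdf", "book-deskewed.pdf")`. Keeping the command builders separate from `run` makes it easy to print the commands first and check them before processing all 600 pages.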
Do you mean skewed (as in stretched in some way), or rotated?
I'm assuming you mean rotated, since I honestly don't think it's possible for your scanner to mess the image up that badly!
If you just need to rotate, I would recommend PDF-Shuffler, a GUI program that makes going through each page and rotating it as necessary a lot less painful. Have a look; I'm sure there are other programs that can do the same thing.
Unfortunately, I don't know of any software that can look over all the pages in your PDF and decide for you which ones need to be rotated, let alone transformed in some more complex way.
EDIT: If your file were a native PDF that could be converted to PostScript (.ps), I think there may be a way to auto-rotate pages using Ghostscript. However, as far as I know, this doesn't work with scanned pages, because the auto-rotate feature relies on the direction of the text, which can only be read from a native PDF or PS document. I'm not completely sure, though; I will look into this a little more.
Should actually be a fairly solvable problem. Text typically has a right margin, which is mostly straight, with few figures outside this. – vidarlo Apr 27 '18 at 15:03
@vidarlo, that's a good point. Sadly, I don't know how to write that script myself :(. It makes me think, though, that if the asker didn't mind exporting all the pages to individual files, they could use an existing GIMP script to auto-rotate each page, then merge the pages back into a single PDF with pdftk, or even export from GIMP as MNG and convert the MNG to PDF on the command line. – Hee Jin Apr 27 '18 at 15:37
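vidarlo's margin idea can be sketched in a few lines of Python. This is a hypothetical illustration, not an existing GIMP script: it assumes each page has already been binarized into rows of 0/1 values (1 = ink), takes the rightmost ink pixel of each row as a margin sample, and fits a straight line x = a*y + b through those samples with least squares. The tilt of that line is the page's skew.

```python
import math


def right_margin(rows):
    """Return (y, x) samples of the rightmost ink pixel in every row that has ink.

    rows: list of rows of a binarized page, 1 = ink, 0 = paper.
    """
    samples = []
    for y, row in enumerate(rows):
        ink = [x for x, v in enumerate(row) if v]
        if ink:
            samples.append((y, max(ink)))
    return samples


def margin_skew(samples):
    """Least-squares fit of x = a*y + b; return the margin's tilt in degrees."""
    n = len(samples)
    mean_y = sum(y for y, _ in samples) / n
    mean_x = sum(x for _, x in samples) / n
    syy = sum((y - mean_y) ** 2 for y, _ in samples)
    syx = sum((y - mean_y) * (x - mean_x) for y, x in samples)
    slope = syx / syy  # columns of horizontal drift per row
    return math.degrees(math.atan(slope))
```

Plain least squares is easily thrown off by figures sticking out past the margin and by short lines that stop before it (such as the last line of a paragraph), so a practical script would want a robust fit. For ragged-right text, fitting the leftmost ink pixel instead is a one-line change and usually gives a straighter edge.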
