Does anyone have any recommendations or procedures for repairing a corrupt PDF? When I open the file I get "There was an error opening this document. the file is damaged and cannot be repaired." There seems to be a myriad of tools out there, but none that I could describe as reputable. Are there possibly any open-source, Linux-based solutions for this?
Open-source PDF tools tend to be pretty crappy, I'm afraid. What are you using? – Satanicpuppy May 03 '11 at 14:38
Also see: http://superuser.com/questions/166999/rescuing-a-possibly-corrupt-pdf-in-acrobat – slhck May 03 '11 at 14:39
Didn't like the look of any of the tools, as they looked like the myriad of useless "registry cleaners" out there. I have been trying Adobe Pro and have just started looking into whether Ghostscript or PDFForge has any repair switches. – May 03 '11 at 14:48
Ghostscript is okay, but it's certainly not better than Acrobat. It's completely bare bones. – Satanicpuppy May 03 '11 at 18:41
@Satanicpuppy I disagree: I use Ghostscript to rebuild damaged or low-quality PDFs quite often, and it performs very well. – Edward J Beckett Feb 05 '13 at 20:16
I've used qpdf to repair forms that pdftk couldn't open. – Jon Hulka Sep 08 '13 at 17:26
6 Answers
Ghostscript will repair your corrupted PDF automatically... if it can open it in the first place (that is, if it is not damaged beyond repair). But afterwards you'll still need to double-check the result...
On Linux, try this command:
```
gs \
  -o repaired.pdf \
  -sDEVICE=pdfwrite \
  -dPDFSETTINGS=/prepress \
  corrupted.pdf
```
On Windows, try this one (with 64-bit Ghostscript the executable is `gswin64c.exe` instead of `gswin32c.exe`):
```
gswin32c.exe ^
  -o repaired.pdf ^
  -sDEVICE=pdfwrite ^
  -dPDFSETTINGS=/prepress ^
  corrupted.pdf
```
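If you have many files to repair, the same Ghostscript call can be scripted. A minimal Python sketch, assuming `gs` (or `gswin64c.exe` on Windows) is on the PATH; `ghostscript_repair_argv` and `ghostscript_repair` are hypothetical helper names, and the success check only looks at the exit status, so you still need to inspect the result as the answer says.

```python
import subprocess
import sys


def ghostscript_repair_argv(src, dst, settings="/prepress"):
    """Build the same Ghostscript command line as shown above."""
    # Assumption: a 64-bit Windows install provides gswin64c.exe; use
    # gswin32c.exe (as in the answer) if you have the 32-bit build.
    exe = "gswin64c.exe" if sys.platform == "win32" else "gs"
    return [
        exe,
        "-o", dst,                    # output file; implies -dBATCH -dNOPAUSE
        "-sDEVICE=pdfwrite",          # write a new PDF from what gs could parse
        f"-dPDFSETTINGS={settings}",  # e.g. "/prepress" or "/default"
        src,
    ]


def ghostscript_repair(src, dst, settings="/prepress"):
    """Run Ghostscript and return True if it exited with status 0.

    Raises FileNotFoundError if the Ghostscript executable is not on PATH.
    """
    result = subprocess.run(
        ghostscript_repair_argv(src, dst, settings),
        capture_output=True,
        text=True,
        errors="replace",  # gs output is not guaranteed to be valid UTF-8
    )
    # gs reports recovery problems (e.g. XREF errors) in its output,
    # so show that output whenever the run failed.
    if result.returncode != 0:
        print(result.stdout, result.stderr, file=sys.stderr)
    return result.returncode == 0
```

For example, `ghostscript_repair("corrupted.pdf", "repaired.pdf", "/default")` runs the same command with the `/default` preset mentioned in the comments below.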
Ghostscript does a fantastic job of rendering PDFs ... I regularly use gs to rebuild PDFs to improve font quality. – Edward J Beckett Feb 05 '13 at 20:14
The /prepress setting makes the quality really good compared to /screen. Thanks. – Dolanor Sep 13 '15 at 22:17
I get "An error occurred while reading an XREF table." What does that mean? – Geremia Jun 18 '19 at 15:26
It means the internal *table of contents* (the *XREF* table that every PDF must contain) had an error, pointing to a wrong byte offset for a PDF object. Ghostscript very likely repaired that error and inserted a correct XREF table into the output. You can check this by running the output through Ghostscript once more and seeing whether the message still appears. – Kurt Pfeifle Jun 18 '19 at 18:13
The quality of the book cover (which is a color image) got much worse, any idea why? – Yan King Yin Feb 12 '21 at 08:53
@YanKingYin: No idea. To answer this question, I'd need the original file, the command you used, your version of GS and the resulting file. I currently do not have the time to analyse that in detail, sorry. – Kurt Pfeifle Feb 12 '21 at 15:01
**Note:** According to the [Ghostscript v9.54 documentation](https://www.ghostscript.com/doc/9.54.0/VectorDevices.htm), it is better to use `-dPDFSETTINGS=/default` instead of `-dPDFSETTINGS=/prepress`. – Old Pro Mar 18 '22 at 20:48
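The XREF explanation in the comments above can be checked by hand: the `startxref` keyword near the end of a PDF gives the byte offset where the cross-reference table (or, in PDF 1.5+, a cross-reference stream object) should begin. A minimal Python sketch, assuming the file fits in memory; `check_startxref` is a hypothetical helper, not part of Ghostscript or any other tool mentioned here, and it only validates the final pointer, not every entry in the table.

```python
import re


def check_startxref(data: bytes):
    """Return (offset, ok) for the last 'startxref' pointer in PDF bytes.

    ok is True if the offset lands on an 'xref' keyword (classic table)
    or on an 'N G obj' header (a PDF 1.5+ cross-reference stream).
    """
    # Heuristic: startxref is written near the very end of the file.
    tail = data[-2048:]
    matches = re.findall(rb"startxref\s+(\d+)", tail)
    if not matches:
        return None, False
    offset = int(matches[-1])  # the last pointer is the one readers use
    target = data[offset:offset + 32]
    ok = (target.startswith(b"xref")
          or re.match(rb"\d+\s+\d+\s+obj\b", target) is not None)
    return offset, ok
```

Usage: read the file with `open("corrupted.pdf", "rb").read()` and pass the bytes in. A result like `(123456, False)` means the pointer is wrong, which is the kind of damage Ghostscript reports and rebuilds.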
I had a corrupted PDF file, print.pdf, that Ghostscript couldn't open, but the usual graphical Linux PDF viewers (Okular, Evince) opened it fine. (Viewed in a hex editor, the file had garbage at the start instead of a PDF header.)
These PDF viewers use Poppler as a back-end PDF renderer. So you can repair the PDF using Poppler's command-line tools. In Ubuntu these are in the poppler-utils package. I used:
```
pdftocairo -pdf print.pdf print_repaired.pdf
```
which generated a PDF file with correct headers, which tools like Ghostscript now accepted.
+1, this read my Quartz-generated PDF without complaints and immediately started generating output. Ghostscript, Adobe Acrobat Pro and others insisted on rebuilding my 120GB PDF first. – Orwellophile Dec 14 '13 at 14:17
This didn't work for at least one weird PDF I came across, but it seems like a good start. – Brian Peterson Nov 11 '14 at 20:00
Works perfectly on a PDF from which Ghostscript wanted to remove some arbitrary page elements. – Andrea Lazzarotto Nov 22 '14 at 16:14
Ghostscript failed to read the document, but this worked like a charm. BTW, I did this on Windows using the new Linux subsystem, so cool! – HyLian Jun 05 '16 at 17:44
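When the garbage is only *prepended* to an otherwise intact `%PDF-` header (for example, a download script that printed warnings before streaming the file), you can also cut it off directly. A minimal Python sketch under that assumption; the function names are hypothetical. The cross-reference offsets count from the start of the PDF as originally written, so removing a pure prefix should make them line up again. If the header itself is gone, this does not apply, and rebuilding with pdftocairo as above is the way to go.

```python
def strip_leading_garbage(data: bytes) -> bytes:
    """Return data starting at the first b'%PDF-' header marker."""
    start = data.find(b"%PDF-")
    if start == -1:
        # Header missing entirely: not a simple prefix problem.
        raise ValueError("no %PDF- header found")
    return data[start:]


def strip_leading_garbage_file(src: str, dst: str) -> int:
    """Write src to dst without the leading junk; return bytes removed."""
    with open(src, "rb") as f:
        data = f.read()
    fixed = strip_leading_garbage(data)
    with open(dst, "wb") as f:
        f.write(fixed)
    return len(data) - len(fixed)
```

It is still worth passing the result through pdftocairo, `mutool clean` or Ghostscript afterwards to confirm that it parses.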
mutool (project page, manpage) will repair broken PDFs without printing them.
- Installation, e.g. on Ubuntu: `sudo apt-get install mupdf-tools`
- Run it like this: `mutool clean input.pdf output.pdf`
`mutool clean [options] input.pdf [output.pdf] [pages]`
The clean command pretty prints and rewrites the syntax of a PDF file. It can be used to repair broken files, expand compressed streams, filter out a range of pages, etc. If no output file is specified, it will write the cleaned PDF to "out.pdf" in the current directory.
Alternatively, there are a few tools and frameworks that can decompose/decompile PDFs into their components without rendering them. These can be useful for extracting text, scripts, and images. See this answer for a list of such tools: https://reverseengineering.stackexchange.com/q/1526/8210. For example, you can try Origami, the current top answer there; it has a GTK-based viewer.
This solution works "better" than the solutions offered above or ranked higher, as it does not "print" the PDF file and keeps links, clickable items, etc. active. To me, it seems a more elegant solution than using Ghostscript or Cairo. – xtr Jun 05 '15 at 15:21
Unfortunately, `mutool clean` doesn't fix all possible errors. I have a file that has various errors in the font and content streams, and mutool will keep those errors. – Dominik Honnef Jun 09 '16 at 20:52
@DominikHonnef You can always try tools/frameworks that decompose the PDF and allow you to view all the parts without rendering them. That should enable you to get text, scripts, images, etc. directly. See this answer for a list of tools: http://reverseengineering.stackexchange.com/q/1526/8210 – jmiserez Jun 24 '16 at 10:29
This worked better, since it does not render the PDF; it examines the document. – riccs_0x Oct 04 '17 at 00:28
BTW, if you have macOS you can install this with `brew install mupdf` (and then run `mutool clean input.pdf output.pdf` like the answer says) – Aaron Brager Nov 19 '20 at 02:30
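Since different tools in this thread fix different kinds of damage, a small fallback chain saves trying them by hand. A sketch in Python, assuming the tools are on the PATH under the names shown; `try_repair` and `DEFAULT_CANDIDATES` are hypothetical names, and the order (tools that rewrite the structure without re-rendering first, as the comments above recommend) is just one reasonable choice. qpdf is included because a comment on the question reports it repairing files that pdftk could not open; it exits with status 3 when it recovered a file but issued warnings.

```python
import os
import shutil
import subprocess

# Each entry: (command template, exit codes that count as success).
# The commands are the ones used in this thread; the order is an assumption.
DEFAULT_CANDIDATES = [
    (["mutool", "clean", "{src}", "{dst}"], {0}),
    # qpdf exits with 3 when it repaired the file but printed warnings
    (["qpdf", "{src}", "{dst}"], {0, 3}),
    (["pdftocairo", "-pdf", "{src}", "{dst}"], {0}),
    (["gs", "-o", "{dst}", "-sDEVICE=pdfwrite",
      "-dPDFSETTINGS=/prepress", "{src}"], {0}),
]


def try_repair(src, dst, candidates=DEFAULT_CANDIDATES):
    """Try each installed tool in turn; return the first one that succeeded.

    Returns None if no tool is installed or every tool failed.
    """
    for template, ok_codes in candidates:
        if shutil.which(template[0]) is None:
            continue  # tool not installed on this machine
        if os.path.exists(dst):
            os.remove(dst)  # don't mistake a stale file for fresh output
        argv = [arg.format(src=src, dst=dst) for arg in template]
        result = subprocess.run(argv, capture_output=True)
        produced = os.path.exists(dst) and os.path.getsize(dst) > 0
        if result.returncode in ok_codes and produced:
            return template[0]
    return None
```

For example, `try_repair("input.pdf", "output.pdf")` returns `"mutool"` if `mutool clean` succeeded. A successful exit status does not guarantee the content survived, so open the output and check it.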
I had a corrupted PDF file because the PHP script used to download it echoed some errors (in HTML) and NUL characters at the end.
The solution was to open the PDF in Notepad++ and remove everything after the line `%%EOF`.
Had the same: Adobe Reader didn't open it, but the native Mac viewer and the Chrome and Firefox PDF plugins displayed the file fine. The cause was also an extra "NUL" on the last line, added during the upload. – Tilo Apr 08 '14 at 19:23
I had a PDF with two `%%EOF`. I deleted everything after the first `%%EOF` using a hex editor. Now everything works fine. – adjan Jun 17 '17 at 08:21
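The manual Notepad++ / hex-editor fix described in this answer can be scripted. A minimal Python sketch; `trim_after_last_eof` is a hypothetical helper. It cuts after the *last* `%%EOF`, because a PDF saved with incremental updates legitimately contains one `%%EOF` per revision. Cutting at the first one, as in the comment above, discards later revisions, which is only what you want when everything after it is junk.

```python
def trim_after_last_eof(data: bytes) -> bytes:
    """Drop everything after the last %%EOF marker, keeping one line ending."""
    end = data.rfind(b"%%EOF")
    if end == -1:
        raise ValueError("no %%EOF marker found")
    end += len(b"%%EOF")
    # Keep a single end-of-line after the marker if present: \r\n, \n or \r.
    if data[end:end + 2] == b"\r\n":
        end += 2
    elif data[end:end + 1] in (b"\n", b"\r"):
        end += 1
    return data[:end]
```

To apply it to a file, read the bytes in binary mode, pass them through `trim_after_last_eof`, and write the result to a new file so the original stays untouched.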
Chrome, Chromium and Firefox can open PDFs and can also print to PDF, so this may work if they render the file correctly. The same approach can also be used to change the page format, the number of pages, etc.
LibreOffice can also read and write PDF.
GIMP can also read and write PDF, although it's not the most practical application for multi-page documents.
Generally speaking, if any of your installed applications can open the corrupt PDF file and you have a "Print to PDF" printer installed, you are good to go.
There is a Windows freeware tool, PDF Fixer, which runs under Wine. I was able to get a preview of some content of a partially downloaded PDF when the other tools mentioned here failed. But I was not able to combine its output files into a valid PDF file (I had expected it to produce one automatically, but that was not the case with my specific file).