How can I find the differences in visually identical PDF files?

Question

I have two PDF files that look the same when viewed or printed, but diff reports them as binary files that differ. How can I find out what the differences are?

I prefer an answer that will run on Mac OS X or on OpenSUSE.

Since you mention `diff` I assume you are using some kind of Unix? — terdon, Aug 28 '13 at 17:18
Mac OS X, with MacPorts. I found the `exiftool` part of my answer while I was composing the question, thought I might add that detail in my answer, and then got the message that I can't answer my own question for 8 hours. — ShadSterling, Aug 30 '13 at 23:22
I can use any answer that will run on Mac OS X or on OpenSUSE, but in the spirit of making answers that are useful for everyone I would welcome answers that only work on other systems. — ShadSterling, Aug 30 '13 at 23:27
That question appears to be about visible differences, not invisible differences, and asks for a platform-specific solution. — ShadSterling, Sep 02 '13 at 00:32

score 0 · Answer 1 · answered Aug 28 '13 at 17:26

For starters, I would run strings on each file and pipe the output through grep "rdf" to see what comes up.

strings x.pdf | grep "rdf"

Stefan Ludwig

Please explain why you would do this. The answer you gave is kind of for insiders only. Check out [answer]. – user 99572 is fine Aug 28 '13 at 19:24
I'm not sure I follow. @Polyergic asked for ways to find differences. If it's something like creation or modification dates embedded in the PDF (rdf metadata), the commands above would find that. Without more context about what we need to find out, it's difficult to give a more specific answer. And I assume that using Acrobat is out of the question. – Stefan Ludwig Aug 28 '13 at 22:00
`strings` doesn't find differences, it extracts strings. I'd upvote this because it's useful, but apparently my reputation is too low to acknowledge useful-but-incomplete answers on my own question. – ShadSterling Aug 30 '13 at 23:18
It would be nice to know that it can be done with Acrobat, but I would not be able to make use of that answer. – ShadSterling Aug 30 '13 at 23:20

score 0 · Accepted Answer · edited Mar 20 '17 at 10:17

In shells that support Process Substitution (seen in Q317819), diff can be given the output of any command that generates a text representation, for example exiftool:

diff -u <(exiftool -a -v one.pdf) <(exiftool -a -v two.pdf)

Stefan's suggestion of strings also generates a text representation, and can be used the same way:

diff -u <(strings one.pdf) <(strings two.pdf)

The output from exiftool or strings is relatively readable, but does not represent the entire file. exiftool only shows metadata, and strings only shows runs of 4 or more printable ASCII characters; differences that are recognized neither as metadata nor as ASCII strings will not be found. An unreadable but complete text representation can be made with od:

diff -u <(od -vcw one.pdf) <(od -vcw two.pdf)

(If od is not available, an even less readable but still complete text representation can be made with hexdump or hexcat. In MacPorts, the GNU implementation of od may be installed as god. Not all implementations support the same options; -w, for example, is not among the options POSIX specifies for od.)
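
Because option support varies between od implementations, one possible workaround is to stick to options that POSIX specifies. The sketch below is an illustration and not part of the original answer: `bytes_per_line` is a hypothetical helper name, and printing one byte per line is a design choice made here so that an inserted or deleted byte appears as a single changed line instead of shifting every later row of a fixed-width dump.

```shell
# bytes_per_line FILE
# Hypothetical helper: print every byte of FILE as two hex digits, one per line.
# Uses only POSIX od options: -v (no duplicate suppression), -A n (no offsets),
# and -t x1 (one-byte hexadecimal units).
bytes_per_line() {
    od -v -A n -t x1 "$1" |
        tr -s ' ' '\n' |   # turn the space-separated columns into one value per line
        sed '/^$/d'        # drop the empty line produced by od's leading padding
}
```

In a shell with Process Substitution this could be used as `diff -u <(bytes_per_line one.pdf) <(bytes_per_line two.pdf)`. The output has one line per byte, so it is best suited to files of modest size.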

The best available method appears to be to use each of these on the same two files: together they show all differences and reveal the meaning of as many of them as possible.

I believe every modern desktop OS other than Windows has a shell that supports Process Substitution installed by default; several such shells are available for Windows, but you'll have to jump through some hoops to get them working.
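
Where Process Substitution is not available, roughly the same comparison can be made by writing each text representation to a temporary file first. The following is a minimal sketch under stated assumptions: `diff_via_tempfiles` is a hypothetical name, `mktemp -d` is assumed to exist (it is common but not required by POSIX), and the variables are deliberately prefixed because plain sh has no local variables.

```shell
# diff_via_tempfiles FILE_A FILE_B COMMAND [ARGS...]
# Hypothetical helper: run COMMAND on each file, save the output to temporary
# files, and diff the two outputs. Returns diff's exit status
# (0 = same, 1 = different, 2 = trouble).
diff_via_tempfiles() {
    dvt_a=$1
    dvt_b=$2
    shift 2                              # the remaining arguments are the command to run
    dvt_dir=$(mktemp -d) || return 2     # assumption: mktemp -d is available
    "$@" "$dvt_a" > "$dvt_dir/a.txt"
    "$@" "$dvt_b" > "$dvt_dir/b.txt"
    diff -u "$dvt_dir/a.txt" "$dvt_dir/b.txt"
    dvt_status=$?
    rm -rf "$dvt_dir"                    # clean up before returning diff's status
    return "$dvt_status"
}
```

For example, `diff_via_tempfiles one.pdf two.pdf strings` or `diff_via_tempfiles one.pdf two.pdf exiftool -a -v` would mirror the commands above. The file names in the diff header will be the temporary paths, not the original PDFs.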
