I have two PDF files that look the same when viewed or printed, but diff reports them as binary files that differ. How can I find out what the differences are?
I prefer an answer that will run on Mac OS X or on OpenSUSE.
For starters, I would try strings on those files and pipe it through grep "rdf" to see what comes up.
strings x.pdf | grep "rdf"
In shells that support Process Substitution (seen in Q317819), diff can be given the output of any command that generates a text representation, for example exiftool:
diff -u <(exiftool -a -v one.pdf) <(exiftool -a -v two.pdf)
Stefan's suggestion of strings also generates a text representation, and can be used the same way:
diff -u <(strings one.pdf) <(strings two.pdf)
The output from exiftool or strings is relatively readable, but does not represent the entire file. exiftool only shows metadata, and strings only shows runs of 4 or more printable characters; differences that are recognized neither as metadata nor as printable strings will not be found. An unreadable but complete text representation can be made with od:
diff -u <(od -vcw one.pdf) <(od -vcw two.pdf)
(If od is not available, an even less readable but still complete text representation can be made with hexdump or hexcat; in MacPorts, the GNU implementation of od may be installed as god. Not all implementations support the same options.)
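To illustrate the blind spot described above, here is a small sketch, assuming bash and two throwaway sample files (the names one.bin and two.bin are made up for the demonstration) that differ only in one non-printable byte. It uses od -v -c without the -w option, since not every od accepts it.

```shell
#!/usr/bin/env bash
# Hypothetical sample files: identical except for one non-printable byte.
dir=$(mktemp -d)
printf 'header\001trailer\n' > "$dir/one.bin"
printf 'header\002trailer\n' > "$dir/two.bin"

# strings extracts the same printable runs from both files,
# so this diff is expected to be empty.
strings_diff=$(diff -u <(strings "$dir/one.bin") <(strings "$dir/two.bin") || true)

# od dumps every byte, so the changed byte shows up as 001 versus 002.
od_diff=$(diff -u <(od -v -c "$dir/one.bin") <(od -v -c "$dir/two.bin") || true)

echo "strings sees: ${strings_diff:-no difference}"
echo "od sees:"
echo "$od_diff"
```

The strings comparison stays silent while the od comparison reports the changed byte, which is why the complete dump is worth running even though it is harder to read.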
The best available way to see every difference, and to understand the meaning of as many of them as possible, appears to be running all three of these comparisons on the same two files.
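Running all three comparisons on the same pair of files could be wrapped up roughly like this. It is only a sketch, assuming bash; compare_binary is a made-up helper name, and tools that are not installed are simply skipped.

```shell
#!/usr/bin/env bash
# compare_binary is a hypothetical helper, not an existing tool.
# Usage: compare_binary one.pdf two.pdf
compare_binary() {
    local a=$1 b=$2

    # Metadata view, only if exiftool is installed.
    if command -v exiftool >/dev/null 2>&1; then
        echo "== exiftool =="
        diff -u <(exiftool -a -v "$a") <(exiftool -a -v "$b") || true
    fi

    # Printable-text view, only if strings is installed.
    if command -v strings >/dev/null 2>&1; then
        echo "== strings =="
        diff -u <(strings "$a") <(strings "$b") || true
    fi

    # Complete byte-level view; -w is left out because not every od accepts it.
    echo "== od =="
    diff -u <(od -v -c "$a") <(od -v -c "$b") || true
}
```

Calling compare_binary one.pdf two.pdf prints the most readable differences first and the complete but least readable dump last.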
I believe every modern desktop OS other than Windows has a shell that supports Process Substitution installed by default; several such shells are available for Windows, but you'll have to jump through some hoops to get them working.