106

On Linux, how could I generate a diff between two XML files?

Ideally, I would like to be able to configure it to be strict about some things and to loosen others, such as whitespace or attribute order.

I often care only that the files are functionally the same, but plain diff would be annoying to use for that, especially if the XML files don't have many line breaks.

For example, the following should really count as the same to me:

<tag att1="one" att2="two">
  content
</tag>

<tag att2="two" att1="one">
  content
</tag>
slhck
qedi

10 Answers

120

One approach would be to first turn both XML files into Canonical XML, and compare the results using diff. For example, xmllint can be used to canonicalize XML.

$ xmllint --c14n one.xml > 1.xml
$ xmllint --c14n two.xml > 2.xml
$ diff 1.xml 2.xml

Or, as a one-liner:

$ diff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)
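For comparison, here is a sketch of the same canonicalize-then-diff idea in Python, using only the standard library. It assumes Python 3.9+ (`xml.etree.ElementTree.canonicalize()` itself appeared in 3.8). It produces C14N 2.0 rather than the C14N 1.0 output of `xmllint --c14n`, so the output is not byte-for-byte identical to xmllint's. The function names are illustrative only.

```python
# Assumes Python 3.9+; function names are illustrative, not from any tool.
import difflib
import xml.etree.ElementTree as ET


def canonical_lines(xml_text: str) -> list[str]:
    """Canonicalize xml_text (C14N 2.0) and split it into lines for diffing."""
    # canonicalize() sorts attributes, uses double quotes, expands empty
    # elements and drops the XML declaration; whitespace in text content
    # is left untouched.
    return ET.canonicalize(xml_text).splitlines(keepends=True)


def xml_diff(a: str, b: str) -> str:
    """Unified diff of the canonical forms; "" means they are equivalent."""
    return "".join(
        difflib.unified_diff(
            canonical_lines(a),
            canonical_lines(b),
            fromfile="one.xml",
            tofile="two.xml",
        )
    )


if __name__ == "__main__":
    one = '<tag att1="one" att2="two">\n  content\n</tag>'
    two = '<tag att2="two" att1="one">\n  content\n</tag>'
    print(repr(xml_diff(one, two)))  # prints '' (only attribute order differs)
```

As with `diff`, an empty result means the canonical forms match.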
Stephen Rauch
Jukka Matilainen
  • Never knew about the --c14n switch in xmllint. That's handy. – qedi Dec 10 '09 at 20:21
  • You can do it in one line too: `vimdiff <(xmllint --c14n one.xml) <(xmllint --c14n two.xml)` – Nathan Villaescusa Mar 03 '13 at 01:53
  • And xmllint ships with OS X. – ClintM Sep 20 '16 at 16:17
  • In case it wasn't obvious, c14n is an abbreviation for _canonicalization_. – Brandin Nov 08 '16 at 17:53
  • It is better to add a step before diff: format both XML files with xmllint --format. I've noticed that without this step, diff shows more differences than necessary. – ka3ak Dec 09 '16 at 12:07
  • Another pass with `xmllint --format` may be helpful (see other answers). – törzsmókus Jun 03 '19 at 06:43
  • I have machine-generated XML files to compare manually and tried `--format` to reduce the effort, but it actually had trouble with indenting and increased the output size. This is libxml2-utils version 2.9.4+dfsg1-7+deb10u2 amd64; maybe newer versions don't trip on this. – Bill McGonigle Jan 20 '22 at 16:46
  • I also needed `XMLLINT_INDENT=" " xmllint --format ...` because --c14n does NOT unify tabs versus spaces at the beginning of lines, so the files were still different. – masterxilo Mar 08 '22 at 14:33
34

Jukka's answer did not work for me, but it did point to Canonical XML. Neither --c14n nor --c14n11 sorted the attributes, but I found that the --exc-c14n switch did. --exc-c14n is not listed in the man page, but the command-line help describes it as "W3C exclusive canonical format".

$ xmllint --exc-c14n one.xml > 1.xml
$ xmllint --exc-c14n two.xml > 2.xml
$ diff 1.xml 2.xml

$ xmllint | grep c14
    --c14n : save in W3C canonical format v1.0 (with comments)
    --c14n11 : save in W3C canonical format v1.1 (with comments)
    --exc-c14n : save in W3C exclusive canonical format (with comments)

$ rpm -qf /usr/bin/xmllint
libxml2-2.7.6-14.el6.x86_64
libxml2-2.7.6-14.el6.i686

$ cat /etc/system-release
CentOS release 6.5 (Final)

Warning: --exc-c14n strips out the XML header, whereas --c14n prepends the XML header if it is not there.
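You can check what a canonicalizer does with attribute order and the XML declaration without depending on a particular libxml2 version. Here is a minimal sketch, assuming Python 3.9+. The standard library's `ET.canonicalize()` implements C14N 2.0, not the exclusive C14N behind `--exc-c14n`. Treat it as an approximation of the behaviour described above, not a reproduction of it.

```python
# Assumes Python 3.9+. ET.canonicalize() implements C14N 2.0, which is not
# the same algorithm as xmllint's --exc-c14n; this only approximates it.
import xml.etree.ElementTree as ET

doc = '<?xml version="1.0"?>\n<tag att2="two" att1="one"/>'

canon = ET.canonicalize(doc)

# Attributes come out sorted, the XML declaration is dropped, and the empty
# element gets an explicit end tag.
print(canon)  # <tag att1="one" att2="two"></tag>
```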

rjt
23

I tried @Jukka Matilainen's answer but had problems with whitespace (one of the files was a huge one-liner). Using --format helps skip whitespace differences.

xmllint --format one.xml > 1.xml
xmllint --format two.xml > 2.xml
diff 1.xml 2.xml

Note: use the vimdiff command for a side-by-side comparison of the XML files.
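The comments below suggest combining canonicalization with `--format`. That combination can be approximated in Python, assuming Python 3.9+ for `ET.indent()`. This sketch canonicalizes with whitespace stripping, re-indents one element per line, and then diffs the result, so a huge one-liner and a pretty-printed file line up. Note that `strip_text=True` also trims whitespace around real text content. That is an assumption about what you want to ignore. The function names are illustrative only.

```python
# Assumes Python 3.9+ (ET.indent); names are illustrative only.
import difflib
import xml.etree.ElementTree as ET


def formatted_lines(xml_text: str) -> list[str]:
    """Canonicalize, then pretty-print one element per line (like --format)."""
    # strip_text=True removes whitespace-only text and trims text content,
    # so one-line and indented inputs canonicalize to the same string.
    canon = ET.canonicalize(xml_text, strip_text=True)
    root = ET.fromstring(canon)
    ET.indent(root, space="  ")
    # Attributes stay in the sorted order produced by canonicalize().
    return ET.tostring(root, encoding="unicode").splitlines(keepends=True)


def xml_diff(a: str, b: str) -> list[str]:
    """Unified-diff lines between the formatted forms (empty if equivalent)."""
    return list(
        difflib.unified_diff(
            formatted_lines(a), formatted_lines(b), fromfile="1.xml", tofile="2.xml"
        )
    )


if __name__ == "__main__":
    one_liner = '<root><item b="2" a="1">x</item><item a="3"/></root>'
    pretty = '<root>\n  <item a="1" b="2">y</item>\n  <item a="3"></item>\n</root>\n'
    # Only the changed text content (x vs y) shows up in the diff.
    print("".join(xml_diff(one_liner, pretty)))
```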

GuruM
  • In my case `two.xml` was generated from `one.xml` by a script. So I just needed to check what was added/removed by the script. – GuruM Aug 08 '12 at 10:36
  • This was the option I needed. Supposedly the most canonical version can be obtained by combining `--format` with `--exc-c14n`; it will probably be slower still to process :( – ᴠɪɴᴄᴇɴᴛ Nov 27 '14 at 14:05
  • It's been quite some time since I wrote the answer, but I faintly remember using the --exc-c14n flag. However, diffing the output with and without the flag showed no differences, so I just stopped using it. Dropping unnecessary or unused flags might make the process faster. – GuruM Dec 21 '14 at 06:49
  • The `--exc-c14n` option specifies sorting of the attributes. In your specific files the attributes were probably already sorted, but the general advice would be to use the combination `--format --exc-c14n`. – ᴠɪɴᴄᴇɴᴛ Dec 22 '14 at 14:33
8

If you also want to ignore the order of child elements, I wrote a simple Python tool for this called xmldiffs:

Compare two XML files, ignoring element and attribute order.

Usage: xmldiffs [OPTION] FILE1 FILE2

Any extra options are passed to the diff command.

Get it at https://github.com/joh/xmldiffs
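One way to ignore child-element order is to sort siblings into a deterministic order before comparing. The sketch below shows that idea using Python's standard library, assuming Python 3.9+. It is not the xmldiffs implementation, and the helper names are hypothetical. It also assumes data-style XML: any text between sibling elements (mixed content) moves along with the reordered elements.

```python
# Assumes Python 3.9+. Not the xmldiffs code; helper names are hypothetical.
import xml.etree.ElementTree as ET


def _subtree_key(elem: ET.Element) -> str:
    # Serialize the element without its tail text, so the key reflects only
    # the element's tag, attributes, text and (already sorted) children.
    tail, elem.tail = elem.tail, None
    try:
        return ET.tostring(elem, encoding="unicode")
    finally:
        elem.tail = tail


def sort_children(elem: ET.Element) -> None:
    """Recursively reorder children into a deterministic order."""
    for child in elem:
        sort_children(child)
    elem[:] = sorted(elem, key=_subtree_key)


def normalized(xml_text: str) -> str:
    # Canonicalize first: sorted attributes, whitespace-only text removed.
    root = ET.fromstring(ET.canonicalize(xml_text, strip_text=True))
    sort_children(root)
    return ET.canonicalize(ET.tostring(root, encoding="unicode"))


def same_ignoring_order(a: str, b: str) -> bool:
    """True if a and b differ only in element/attribute order or whitespace."""
    return normalized(a) == normalized(b)


if __name__ == "__main__":
    a = '<root><b x="2"/><a>1</a></root>'
    b = '<root>\n  <a>1</a>\n  <b x="2"></b>\n</root>'
    print(same_ignoring_order(a, b))  # True
```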

joh
7

Diffxml gets the basic functionality correct, though it doesn't seem to offer many options for configuration.

Edit: The Diffxml project has been on GitHub since 2013.

stefan123t
dsolimano
1

My Python script xdiff.py compares XML files while ignoring differences in whitespace and attribute order (but not in element order).

In order to compare two files 1.xml and 2.xml, you would run the script as follows:

xdiff.py 1.xml 2.xml

In the OP's example, it would output nothing and return exit status 0 (for no structural or textual differences).

In cases where 1.xml and 2.xml differ structurally, it mimics the unified output of GNU diff and returns exit status 1. There are various options for controlling the output, such as -a for outputting all context, -n for outputting no context, and -q for suppressing output altogether (while still returning the exit status).
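The comparison described here can be sketched as a recursive tree walk: whitespace and attribute order are ignored, element order is respected, and the exit status follows diff's convention. This is not xdiff.py's code and produces no unified output. It is a minimal illustration assuming Python 3.9+, and the function and script names are hypothetical.

```python
# Assumes Python 3.9+. Not xdiff.py; function and script names are hypothetical.
import sys
import xml.etree.ElementTree as ET


def _norm(text):
    # Treat None and whitespace-only text the same; trim surrounding space.
    return (text or "").strip()


def same_structure(a: ET.Element, b: ET.Element) -> bool:
    """True if a and b match, ignoring attribute order and surrounding whitespace."""
    if a.tag != b.tag or a.attrib != b.attrib:  # dict comparison: order-free
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    # Children are compared pairwise, so element order still matters.
    return all(same_structure(x, y) for x, y in zip(a, b))


def main(path1: str, path2: str) -> int:
    equal = same_structure(ET.parse(path1).getroot(), ET.parse(path2).getroot())
    return 0 if equal else 1  # 0 = no differences, 1 = differences, as in diff


if __name__ == "__main__":
    if len(sys.argv) == 3:
        sys.exit(main(sys.argv[1], sys.argv[2]))
    print("usage: python3 same_xml.py FILE1 FILE2", file=sys.stderr)
```

Usage: `python3 same_xml.py 1.xml 2.xml; echo $?` prints 0 for the OP's example.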

1

If you are a developer, you can also check XML differences programmatically using XMLUnit in C# or Java.

To see how it reports differences, you can try this online XML difference checker tool.

C# sample code to check for differences:

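// Assumes the XMLUnit.NET package (Org.XmlUnit) and NUnit for Assert
using NUnit.Framework;
using Org.XmlUnit.Builder;

string control = "<a><b attr=\"abc\"></b></a>";
string test = "<a><b attr=\"xyz\"></b></a>";

// Compare the test document against the control document
var myDiff = DiffBuilder.Compare(Input.FromString(control))
          .WithTest(Input.FromString(test))
          .Build();

// Fails here because attr differs; the message lists each difference found
Assert.IsFalse(myDiff.HasDifferences(), myDiff.ToString());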
Jyoti
-1

Not sure whether depending on an online tool counts as a solution, but, for what it's worth, I got good results with this online XML comparison tool. It simply works.

RayLuo
-1

Our SD Smart Differencer compares documents based on structure as opposed to actual layout.

There's an XML Smart Differencer. For XML, that means matching the order of tags and content. It should report that the text string in the specific fragment you gave was different. It currently doesn't understand XML attributes (such as xml:space) that indicate whether whitespace is normalized or significant.

Ira Baxter
  • In your SO profile you provide full disclosure about your employer; I'd have preferred a short disclaimer inside your answer as well :) BTW, I tried to download an evaluation copy, but the request form is 'smart' enough (via JS) to disable the combination of XML with Smart Differencer (and also Smart Differencer with Python, although the SD product page says that is possible)? – ᴠɪɴᴄᴇɴᴛ Nov 27 '14 at 14:03
  • Ah. Thanks for the reminder. This answer is from a time before there was a clear SO policy on this. I'm revising it to disclose the relationship in an SO-policy-compliant way. – Ira Baxter Nov 27 '14 at 15:52
-1

I use Beyond Compare to compare all types of text-based files. They produce versions for Windows and Linux.

Alan
  • Plain text comparisons would say the two lines differed, whereas the OP wants them to be reported as the same. – ChrisF Dec 07 '09 at 16:33
  • I.e., **canonically compare** the XML. – Chris W. Rea Dec 09 '09 at 20:08
  • Beyond Compare really sucks for this. It just doesn't seem to be aware of XML elements and mostly does plain text comparison. – Rob K May 23 '16 at 17:54
  • Beyond Compare has an XML plugin, but I was never able to install it properly, so... Nyeah... I came to this page and got wiser... – Erk Mar 14 '19 at 15:02