I need to compare two XML files, each of which is about 13,000 lines long.

Sadly the code that generates these files doesn't generate the data in the same order each time (the data comes from a database).

Therefore, I get false positives when using a standard line-by-line diff utility (WinMerge), even after canonicalising the XML file.

As an example of my problem:


  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">pineapple</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>


  <b key="fruit.available">pineapple</b>
  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>

These files have the same content, but the position of the banana line means that a traditional line-by-line diff considers them different. Are there any tools that can sort the elements so that the files are considered the same?

By the way, the XML file structures are more complicated than the examples above!

  • Why don't you sort the data you're getting from the database before you write the file?
    – Ramhound
    Commented Sep 13, 2011 at 12:07
  • I don't have access to the database, just the application's front end. I have one instance of the application which works, one which doesn't. I'm trying to compare their configuration, and the only way I can do that is to output a dump of their configuration and compare them :(
    – Rich
    Commented Sep 13, 2011 at 13:17
  • What do you want the output to look like? Is it sufficient to say that (for example) file1 has mango and file2 does not? Or do you need line numbers, xml attributes, etc?
    – jdigital
    Commented Sep 13, 2011 at 21:35
  • I asked this on softwarerecs.se
    – Jan Doggen
    Commented May 13, 2018 at 20:15

1 Answer


I think you can use a tool such as xmldiff for this purpose.


The tool's web page states:

The standard Unix tools diff and patch are used to find the differences between text files and to apply the differences. These tools operate on a line by line basis using well-studied methods for computing the longest common subsequence (LCS).

Using these tools on hierarchically structured data (XML etc) leads to sub-optimal results, as they are incapable of recognizing the tree-based structure of these files.
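To illustrate what comparing the tree rather than the lines can look like, here is a minimal sketch using only Python's standard library. It is not how xmldiff works internally; it simply sorts sibling elements into a canonical order so that two files differing only in element order serialise identically. The function names `normalise` and `canonical_form`, and the `<root>` wrapper around the question's fragments, are made up for the example. It also assumes that whitespace around text is insignificant and that sibling order carries no meaning, which may not hold for every XML format.

```python
import xml.etree.ElementTree as ET


def normalise(elem):
    """Recursively put an element into an order-independent form, in place."""
    # Assumption: leading/trailing whitespace in text is not significant.
    elem.text = (elem.text or "").strip() or None
    elem.tail = None
    # Fix the attribute order so it cannot affect the serialised string.
    attrs = sorted(elem.attrib.items())
    elem.attrib.clear()
    elem.attrib.update(attrs)
    # Normalise the children first, then sort them by their serialised form.
    for child in elem:
        normalise(child)
    elem[:] = sorted(elem, key=lambda c: ET.tostring(c, encoding="unicode"))


def canonical_form(xml_text):
    """Return a string that is equal for XML documents differing only in sibling order."""
    root = ET.fromstring(xml_text)
    normalise(root)
    return ET.tostring(root, encoding="unicode")


# The two fragments from the question, wrapped in a hypothetical <root> element.
file_a = """<root>
  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">pineapple</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>
</root>"""

file_b = """<root>
  <b key="fruit.available">pineapple</b>
  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>
</root>"""

print(canonical_form(file_a) == canonical_form(file_b))  # prints True
```

If a report of the actual differences is needed rather than a yes/no answer, one option is to write each canonical form to a file with one element per line and then run an ordinary line-based diff tool over the two results.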
