Finding differences in two Open-Office-Writer documents

If you write documents and get feedback from different persons on different versions it is a great pain to merge the documents and changes together. Microsoft Word has a functionality that works quite well. But the function to compare documents in Open Office Writer has  never work for me the way I expected.

Fortunately OO stores documents in a zip file, containing xml files. The main content of the document is the file content.xml. After changing the extension of the OO Writer document to zip it is possible to open the file with the favorite zip application and extracting the content.xml file. If you do this for both versions you can compare the both files with your favorite text compare tool and you will see … hmmm yes… thousands of changes. This happens especially if the documents have been edited with different versions of Open Office or Libre Office. Most of the changes are not relevant for your comparison.

So we would like to eliminate the changes not interested in to get an overview of the real changes.

We will do this using Notepad++, the tool I use most for work. Additionally we need for formation the document the XML Tools Plugin. Both are free.

We open both versions of content.xml with Notepad++ and do a “Linarize XML” with XML Tools first on both files.

In the next step we replace these six regular expressions with an empty string. This is done recursively until no further replace is possible:

1 [a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+="[^"]*" 2 <([a-zA-Z0-9\-]+:)?[a-zA-Z0-9\-]+\s*/> 3 <([a-zA-Z0-9\-]+:)?[a-zA-Z0-9\-]+\s*>\s*</([a-zA-Z0-9\-]+:)?[a-zA-Z0-9\-]+> 4 <text:changed\-region\s*>.*?<\/text:changed\-region> 5 <office:annotation\s*>.*?<\/office:annotation> 6 <text:bookmark-ref\s*>.*?<\/text:bookmark-ref>

Finally we use the “Pretty print (libxml)” function of XML Tools to get the XML files formatted. Now it is possible to compare the two files with tool for comparing text files and you will see the real text changes.

Bernhard Mähr @ OPITZ-CONSULTING published at http://thecattlecrew.wordpress.com/

Leave a Reply