Found at: http://publish.ez.no/article/articleprint/28/

Parsing XML with Qt's DOM classes



The Document Object Model (DOM) by the World Wide Web Consortium specifies a simple way of interacting with various document formats, including XML. Trolltech's Qt implements some very handy DOM classes. In this tutorial I'll demonstrate the basic principles of DOM, and give some pointers on how to use the Qt DOM classes.

DOM introduction


DOM is an object-oriented approach to document handling. It breaks the document up into a tree of nodes, where each node can carry a set of attributes. We start with a simple XML document:

<xml>
  <smith>
    <john/>
    <susan/>
  </smith>
  <throckmorton>
    <baldrick/>
    <evangeline/>
    <waldemar/>
  </throckmorton>
</xml>



A simple DOM tree

When this document is parsed using a DOM-compliant parser, you get a DOM tree as shown in the figure. Each circle represents a node, and the arrows show the relations between them. There are different types of nodes; the most common are element nodes. (All the nodes in this first example are element nodes.) More about the different types of nodes later.

The node named "xml" is the first node of the document and is called the document element. The nodes "smith" and "throckmorton" are children of "xml". "xml" is the parent of "smith" and "throckmorton". And, logically, "smith" and "throckmorton" are siblings. Using this concept, we can navigate the document as follows:

#include <qdom.h>

QDomDocument doc( "myDocument" );
doc.setContent( &myFile );                        // myFile is a QFile opened for reading; returns false if parsing fails

QDomElement docElement = doc.documentElement();   // docElement now refers to the node "xml"
QDomNode node;

node = docElement.firstChild();                   // node now refers to the node "smith"
node = node.firstChild();                         // node now refers to the node "john"
node = node.parentNode();                         // node now refers to "smith" again
node = node.nextSibling();                        // node now refers to "throckmorton"
node = node.firstChild().nextSibling();           // node now refers to "evangeline"


As you can see, the functions you'll use most often are firstChild(), lastChild(), parentNode(), nextSibling() and previousSibling(). There are also functions for accessing the DTD (Document Type Definition) of a document.
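To make the navigation calls concrete, here is a minimal sketch in standard C++ that does not depend on Qt. The ToyNode struct and the helpers collectNames() and buildFamilyTree() are hypothetical stand-ins invented for this illustration, not Qt classes; they only mirror the parent, child and sibling links described above. Starting at a node's first child and repeatedly following the next sibling visits all of its children, and doing that recursively walks the whole tree depth-first.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; not part of Qt.
struct ToyNode {
    std::string name;
    ToyNode* parent = nullptr;
    std::vector<std::unique_ptr<ToyNode>> children;

    explicit ToyNode(std::string n) : name(std::move(n)) {}

    // Append a new child element and return a pointer to it.
    ToyNode* appendChild(const std::string& childName) {
        children.push_back(std::make_unique<ToyNode>(childName));
        children.back()->parent = this;
        return children.back().get();
    }

    // First child, or nullptr if this node has no children.
    ToyNode* firstChild() const {
        return children.empty() ? nullptr : children.front().get();
    }

    // The node that follows this one under the same parent, or nullptr.
    ToyNode* nextSibling() const {
        if (!parent) return nullptr;
        const auto& sibs = parent->children;
        for (std::size_t i = 0; i + 1 < sibs.size(); ++i)
            if (sibs[i].get() == this) return sibs[i + 1].get();
        return nullptr;
    }
};

// Depth-first walk: record a node, then each of its children in order.
void collectNames(const ToyNode* node, std::vector<std::string>& out) {
    if (!node) return;
    out.push_back(node->name);
    for (ToyNode* c = node->firstChild(); c; c = c->nextSibling())
        collectNames(c, out);
}

// Build the smith/throckmorton tree from the example document.
std::unique_ptr<ToyNode> buildFamilyTree() {
    auto root = std::make_unique<ToyNode>("xml");
    ToyNode* smith = root->appendChild("smith");
    smith->appendChild("john");
    smith->appendChild("susan");
    ToyNode* throckmorton = root->appendChild("throckmorton");
    throckmorton->appendChild("baldrick");
    throckmorton->appendChild("evangeline");
    throckmorton->appendChild("waldemar");
    return root;
}
```

With a real QDomNode the same loop shape applies, except that the end of a sibling chain is signalled by a null node (checked with isNull()) rather than a null pointer.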

More about nodes


As I said earlier, there are different kinds of nodes. This simple document shows three different types:

<xml>
  <throckmorton>
    <!-- text describing Mr. Throckmorton -->
    Throckmorton is a really nice guy.
  </throckmorton>
</xml>

"throckmorton" is an element node. The string "text describing Mr. Throckmorton" is a comment node. The string "Throckmorton is a really nice guy." is a text node. Other types of nodes are CDATASection, which is used to escape blocks of text containing characters that would otherwise be regarded as markup, and DocumentFragment, which contains a tree of nodes, and is useful when moving and copying parts of a document.

A node has a node name and, possibly, a node value. In the previous example, the node name of the "xml" node is "xml". Element nodes have no value. Comment nodes always have the node name "#comment", and the node value is the string within the comment ("text describing Mr. Throckmorton", in this case). Similarly, text nodes always have the node name "#text" and the node value is the actual text.
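The naming rules above can be summarised in a few lines of standard C++. This is only a sketch: ToyKind and ToyTypedNode are hypothetical names made up for the illustration and are not Qt types. It shows that an element reports its tag name and no value, while text and comment nodes report a fixed name and carry their content as the value.

```cpp
#include <cassert>
#include <string>

// Hypothetical node kinds for illustration; not Qt's own enum.
enum class ToyKind { Element, Text, Comment };

// Hypothetical node type; not a Qt class.
struct ToyTypedNode {
    ToyKind kind;
    std::string content;  // tag name for elements, character data otherwise

    // Elements report their tag name; text and comment nodes use fixed names.
    std::string nodeName() const {
        switch (kind) {
            case ToyKind::Element: return content;
            case ToyKind::Text:    return "#text";
            case ToyKind::Comment: return "#comment";
        }
        return std::string();
    }

    // Elements have no value; text and comment nodes return their content.
    std::string nodeValue() const {
        return kind == ToyKind::Element ? std::string() : content;
    }
};
```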

In addition to a name and a value, a node may also have a number of attributes. Consider this XML line:

<img src="images/img42.png" border="0" width="30" height="75" />


In this case, we have an element node named "img", which has four attributes. Attributes also come in name/value pairs. The attribute named "src" has the value "images/img42.png", the attribute named "border" has the value "0" and so on. Attributes can be addressed by name or by index.
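As a rough illustration of the two ways of addressing attributes, here is a small sketch in standard C++. ToyAttributes and makeImgAttributes() are hypothetical names invented for this example and are not Qt's attribute map. This toy version keeps attributes in insertion order, which is an assumption of the sketch: a real DOM implementation need not return attributes in document order when they are addressed by index, so lookup by name is the safer habit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical attribute collection for illustration; not a Qt class.
class ToyAttributes {
public:
    // Add an attribute, or overwrite the value if the name already exists.
    void set(const std::string& name, const std::string& value) {
        for (auto& a : items_)
            if (a.first == name) { a.second = value; return; }
        items_.emplace_back(name, value);
    }

    // Look up a value by attribute name; empty string if absent.
    std::string namedItem(const std::string& name) const {
        for (const auto& a : items_)
            if (a.first == name) return a.second;
        return std::string();
    }

    // Look up a value by zero-based index; empty string if out of range.
    std::string item(std::size_t index) const {
        return index < items_.size() ? items_[index].second : std::string();
    }

    std::size_t count() const { return items_.size(); }

private:
    std::vector<std::pair<std::string, std::string>> items_;
};

// The four attributes of the <img> element from the example line.
ToyAttributes makeImgAttributes() {
    ToyAttributes attrs;
    attrs.set("src", "images/img42.png");
    attrs.set("border", "0");
    attrs.set("width", "30");
    attrs.set("height", "75");
    return attrs;
}
```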

Accessing document information


Now that you know how to navigate a document using DOM nodes, you will want to access the information it holds. Here's how:

First, traverse the document using the tree traversal algorithm of your choice. When you reach the node you want a closer look at, you can access it as in this example:

#include <qdom.h>
#include <qstring.h>

// Node is a QDomNode which refers to the node we are looking at
QString NodeName = Node.nodeName();
QString NodeValue = Node.nodeValue();

// this gives you the value of the attribute named "src"
QString srcValue = Node.attributes().namedItem( "src" ).nodeValue();

// this gives you the value of the attribute at index 1 (indices start at 0;
// the order of the attributes in the map is not guaranteed)
QString secondValue = Node.attributes().item( 1 ).nodeValue();


This is a very simple example. In a real-world application you'd probably want to check which node type you are dealing with (e.g. use QDomNode::isElement() to check whether it is an element node), then convert it to that type (e.g. use QDomNode::toElement() to convert it) and finally use the special functions related to that node type. For example, the QDomElement class has a function setTagName(QString) that allows you to change the name of an element (which means changing the tag name). The QDomText class has no such function, since the name of a text node is always "#text".

An important note: When you need to create new nodes, use QDomDocument's createElement(), createTextNode(), createComment() etc. They create nodes that have ownerDocument() set correctly and can be inserted directly into the DOM tree using QDomNode functions such as insertBefore() and appendChild().

After you have edited the document, you might want to save it back to disk. These few lines of code will do the trick:

#include <qdom.h>
#include <qtextstream.h>

// File is a pointer to a valid QFile object,
// Doc is a pointer to the QDomDocument
if ( File->open( IO_WriteOnly ) )
{
    QTextStream stream( File );
    stream << Doc->toString();
    File->close();
}


I hope this is enough information to get you started. Later I might publish a simple XML tree viewer/editor I've been working on. It's not a very useful application, but it demonstrates the DOM principles quite well. Stay tuned!
