XML was designed as a way to transport and store data. XML files are platform-independent because the data is stored as plain text. Although XML resembles HTML, the two serve different purposes: HTML is designed for display, whereas XML is designed to describe and carry data. XML files are sometimes used as an interchange format for GIS data that moves between software systems.
XML documents have a tree-like structure composed of a root element, child elements, and element attributes. Elements are also called nodes. Every XML file contains exactly one root element, which is the parent of all other elements, or child nodes. Unlike HTML files, XML files are case sensitive. The following code example illustrates the structure of an XML document:
<root>
    <child att="value">
        <subchild>.....</subchild>
    </child>
</root>
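To make the structure concrete, the following minimal sketch parses the same fragment from a string with minidom.parseString() instead of from a file. The helper name is_well_formed is hypothetical and exists only for this illustration, and the placeholder dots in <subchild> are replaced with the word "text".

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# The same structure as the example above, held in a string for illustration.
SAMPLE = '<root><child att="value"><subchild>text</subchild></child></root>'

doc = minidom.parseString(SAMPLE)
root = doc.documentElement            # the single root element
child = root.firstChild               # <child>, the first node under <root>

root_name = root.tagName
child_att = child.getAttribute("att")
subchild_text = child.firstChild.firstChild.data


def is_well_formed(xml_text):
    """Hypothetical helper: return True if xml_text parses without error."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# Tag names are case sensitive, so <Root> cannot be closed by </root>.
mismatched_case_ok = is_well_formed("<Root></root>")

print(root_name, child_att, subchild_text, mismatched_case_ok)
```

Because the parser treats <Root> and </root> as different tags, the second document is rejected rather than silently accepted.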
In this recipe, you will learn how to read data from an XML file using the nodes and element attributes that are a part of the document.
There are a number of ways to access nodes within an XML document. Perhaps the easiest is to find nodes by tag name and then walk through the resulting list of nodes. Before doing so, you'll need to parse the XML document with the minidom.parse() function. Once the document is parsed, you can use the childNodes attribute to obtain a list of the child nodes, starting at the root of the tree. Finally, you can search the nodes by tag name with the getElementsByTagName(tag) method, which accepts a tag name as an argument and returns a list of all the descendant elements that have that tag.
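The sequence just described can be sketched as follows. This is a minimal illustration, assuming a two-record excerpt in the shape of the wildfire file used in this recipe and parsing it from a string with minidom.parseString() so that it runs without the data file.

```python
from xml.dom import minidom

# Two-record excerpt in the shape of the wildfire file (for illustration).
SAMPLE = """<fires>
  <fire address="11389 Pajaro Way" city="San Diego" />
  <fire address="18157 Valladares Dr" city="San Diego" />
</fires>"""

doc = minidom.parseString(SAMPLE)

# The document's childNodes holds the top-level nodes; here, just <fires>.
root = doc.childNodes[0]

# The root's childNodes includes whitespace text nodes between elements,
# so filter on nodeType to keep only the element nodes.
element_children = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

# getElementsByTagName() returns matching descendant elements only.
fires = root.getElementsByTagName("fire")

print(root.tagName, len(root.childNodes), len(element_children), len(fires))
```

Note that childNodes on an element can contain more entries than there are child elements, because the whitespace used for indentation is stored as text nodes. Searching by tag name avoids having to filter these out.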
You can also determine whether a node has a particular attribute by calling hasAttribute(name), which returns True or False. Once you've determined that the attribute exists, a call to getAttribute(name) returns its value.
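The following short sketch shows why the hasAttribute() check is worth making. It assumes a hypothetical two-element sample in which the second <fire> element has no address attribute.

```python
from xml.dom import minidom

# Hypothetical sample: the second <fire> element has no address attribute.
doc = minidom.parseString(
    '<fires><fire address="11389 Pajaro Way" /><fire city="San Diego" /></fires>'
)
first, second = doc.getElementsByTagName("fire")

# hasAttribute() reports whether the attribute is present on the element.
first_has = first.hasAttribute("address")
second_has = second.hasAttribute("address")

# getAttribute() returns the value as a string; in minidom a missing
# attribute yields an empty string rather than raising an error.
first_value = first.getAttribute("address")
second_value = second.getAttribute("address")

print(first_has, first_value)
print(second_has, repr(second_value))
```

Since a missing attribute and an attribute that is present but empty both come back as an empty string, hasAttribute() is the reliable way to tell the two cases apart.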
In this exercise, you will parse an XML file and pull out values associated with a particular element (node) and attribute. We'll load an XML file containing wildfire data. In this file, we'll look for the <fire> nodes and the address attribute of each of these nodes. The addresses will be printed out.
Create a new script and save it as C:\ArcpyBook\Appendix2\XMLAccessElementAttribute.py.
The WitchFireResidenceDestroyed.xml file will be used. The file is located in your C:\ArcpyBook\Appendix2 folder. You can see a sample of its contents, as follows:
<fires>
    <fire address="11389 Pajaro Way" city="San Diego" state="CA" zip="92127" country="USA" latitude="33.037187" longitude="-117.082299" />
    <fire address="18157 Valladares Dr" city="San Diego" state="CA" zip="92127" country="USA" latitude="33.039406" longitude="-117.076344" />
    <fire address="11691 Agreste Pl" city="San Diego" state="CA" zip="92127" country="USA" latitude="33.036575" longitude="-117.077702" />
    <fire address="18055 Polvera Way" city="San Diego" state="CA" zip="92128" country="USA" latitude="33.044726" longitude="-117.057649" />
</fires>
Import minidom from xml.dom:
    from xml.dom import minidom
Parse the XML file:
    xmldoc = minidom.parse("WitchFireResidenceDestroyed.xml")
Get the list of child nodes from the document:
    childNodes = xmldoc.childNodes
Generate a list of all the <fire> nodes:
    eList = childNodes[0].getElementsByTagName("fire")
Loop through the list, check each element for the address attribute and print the value of the attribute, if it exists:
    for e in eList:
        if e.hasAttribute("address"):
            print(e.getAttribute("address"))
You can check your work against the C:\ArcpyBook\code\Appendix2\XMLAccessElementAttribute.py solution file. When the script runs, the addresses are printed:
11389 Pajaro Way
18157 Valladares Dr
11691 Agreste Pl
18055 Polvera Way
18829 Bernardo Trails Dr
18189 Chretien Ct
17837 Corazon Pl
18187 Valladares Dr
18658 Locksley St
18560 Lancashire Way
Loading an XML document into your script is probably the most basic thing you can do with XML files. You can do this with the minidom module from the xml.dom package. The minidom module provides a parse() function, which accepts a path to an XML document and creates a document object model (DOM) tree object, in this case from the WitchFireResidenceDestroyed.xml file.
The childNodes property of the DOM tree returns a list of the document's top-level nodes; in this file, the first entry is the root <fires> element. Calling the getElementsByTagName() method on that element returns all of the <fire> nodes, which are stored in the eList variable. The final step is to loop through each of the <fire> nodes in eList. For each node, we check for the address attribute with the hasAttribute() method, and if it exists, we call the getAttribute() method and print the address to the screen.
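The same pattern extends to reading several attributes at once. The sketch below is an illustration rather than part of the recipe's solution file: the helper name read_fire_locations is hypothetical, and it parses two records copied from the sample file contents from a string instead of opening the file.

```python
from xml.dom import minidom

# Two records copied from the sample of WitchFireResidenceDestroyed.xml.
SAMPLE = """<fires>
  <fire address="11389 Pajaro Way" city="San Diego" state="CA" zip="92127" country="USA" latitude="33.037187" longitude="-117.082299" />
  <fire address="18157 Valladares Dr" city="San Diego" state="CA" zip="92127" country="USA" latitude="33.039406" longitude="-117.076344" />
</fires>"""

REQUIRED = ("address", "latitude", "longitude")


def read_fire_locations(xml_text):
    """Hypothetical helper: return (address, latitude, longitude) tuples."""
    doc = minidom.parseString(xml_text)
    locations = []
    for e in doc.getElementsByTagName("fire"):
        # Skip records that lack any of the attributes we need.
        if not all(e.hasAttribute(name) for name in REQUIRED):
            continue
        # Attribute values are always strings, so convert the coordinates.
        locations.append((
            e.getAttribute("address"),
            float(e.getAttribute("latitude")),
            float(e.getAttribute("longitude")),
        ))
    return locations


locations = read_fire_locations(SAMPLE)
for address, lat, lon in locations:
    print(address, lat, lon)
```

To run this against the real file, minidom.parse() with the file path could be used in place of minidom.parseString().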
There will be times when you need to search an XML document for a specific text string. One way to do this is with the xml.parsers.expat module. You'll need to define a search class that creates an expat parser with ParserCreate() and registers handler methods on it, and then create an object from this class. Once created, you can call a parse() method on the search object, which passes the document to the parser, to search for data. If you only need to find elements by tag name, the getElementsByTagName(tag) method described earlier remains the simpler option.
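A text search with expat might look like the following sketch. The class name TextSearch, its parse() method, and its matches attribute are hypothetical names chosen for this illustration; the expat calls it relies on are ParserCreate(), the StartElementHandler, EndElementHandler, and CharacterDataHandler callbacks, and Parse(). Because the wildfire file keeps its data in attributes, the sketch checks attribute values as well as element text.

```python
from xml.parsers import expat


class TextSearch:
    """Hypothetical search class: records where a string occurs in
    attribute values or element text."""

    def __init__(self, text):
        self.text = text
        self.matches = []      # (tag name, matching value) pairs
        self._stack = []       # tags that are currently open
        self._parser = expat.ParserCreate()
        # Deliver each run of character data in one piece where possible.
        self._parser.buffer_text = True
        # expat calls these handlers as it streams through the document.
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._chars

    def _start(self, name, attrs):
        self._stack.append(name)
        for value in attrs.values():
            if self.text in value:
                self.matches.append((name, value))

    def _end(self, name):
        self._stack.pop()

    def _chars(self, data):
        if self.text in data:
            self.matches.append((self._stack[-1], data))

    def parse(self, xml_text):
        # Parse() takes the document text; True marks it as the final chunk.
        self._parser.Parse(xml_text, True)
        return self.matches


# Two-record excerpt in the shape of the wildfire file (for illustration).
SAMPLE = (
    '<fires>'
    '<fire address="11389 Pajaro Way" city="San Diego" />'
    '<fire address="18157 Valladares Dr" city="San Diego" />'
    '</fires>'
)

search = TextSearch("Pajaro")
matches = search.parse(SAMPLE)
print(matches)
```

Unlike minidom, expat does not build a tree in memory; it reports each element as it is read, which suits a one-pass search. An expat parser object can parse only one document, so a new TextSearch object is needed for each search.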