Reading an XML file

The xml.etree.ElementTree module provides the Element class, which lets you inspect an XML document through its methods and attributes and by indexing into its child elements.

In this example, we are reading an XML file called books.xml:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <book id="book1" name="Learning Python 2">
    <title>Learning Python 2</title>
    <publisher>Packt Publishing</publisher>
    <numberOfChapters>13</numberOfChapters>
    <pageCount>500</pageCount>
    <author>Author1</author>
    <chapters>
      <chapter>
        <chapterNumber>1</chapterNumber>
        <chapterTitle>Chapter1</chapterTitle>
        <pageCount>30</pageCount>
      </chapter>
      <chapter>
        <chapterNumber>2</chapterNumber>
        <chapterTitle>Chapter2</chapterTitle>
        <pageCount>25</pageCount>
      </chapter>
    </chapters>
  </book>
  <book id="book2" name="Learning Python 3">
    <title>Learning Python 3</title>
    <publisher>Packt Publishing</publisher>
    <numberOfChapters>10</numberOfChapters>
    <pageCount>400</pageCount>
    <author>Author2</author>
    <chapters>
      <chapter>
        <chapterNumber>1</chapterNumber>
        <chapterTitle>Chapter1</chapterTitle>
        <pageCount>30</pageCount>
      </chapter>
      <chapter>
        <chapterNumber>2</chapterNumber>
        <chapterTitle>Chapter2</chapterTitle>
        <pageCount>25</pageCount>
      </chapter>
    </chapters>
  </book>
</root>

We can read an XML file with the parse function of the ElementTree module, passing the path of the XML file as an argument. This is the general form of the call:

ElementTree.parse('<path_xml_file>')

In this example we are using the parse method to process the books.xml file:

>>> import xml.etree.ElementTree as ET
>>> books = ET.parse("books.xml")

With the getroot() method, we can access the root node:

>>> root = books.getroot()
>>> print(root)
<Element 'root' at 0x02F5DA20>
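
To try parse() and getroot() without a file on disk, the following minimal sketch parses from an in-memory file object instead. The SAMPLE_XML string is a hypothetical, trimmed-down stand-in for books.xml, and the comparison with fromstring() is added here only for illustration:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for books.xml.
SAMPLE_XML = """<root>
  <book id="book1" name="Learning Python 2">
    <title>Learning Python 2</title>
  </book>
</root>"""

# ET.parse() accepts a file name or a file object and returns an ElementTree.
tree = ET.parse(io.StringIO(SAMPLE_XML))
root = tree.getroot()

# ET.fromstring() parses a string and returns the root Element directly.
same_root = ET.fromstring(SAMPLE_XML)

print(type(tree).__name__, root.tag, same_root.tag)
```

Here parse() returns an ElementTree object that wraps the document, while fromstring() skips that wrapper and hands back the root element itself.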

With the tag property, we can access the string identifying what kind of data this element represents:

>>> print(root.tag)
root

By iterating over the child elements of the root, we can read each element's attributes with the attrib property and the text of each leaf element with the text property:

>>> for child in root:
...     print(child.tag, child.attrib)
...     for element in child:
...         print(element.tag, element.text)
...

This is the output of the previous commands, where we can see the values of each book element (the line for each chapters element, whose text is only whitespace, is omitted):

book {'id': 'book1', 'name': 'Learning Python 2'}
title Learning Python 2
publisher Packt Publishing
numberOfChapters 13
pageCount 500
author Author1
book {'id': 'book2', 'name': 'Learning Python 3'}
title Learning Python 3
publisher Packt Publishing
numberOfChapters 10
pageCount 400
author Author2
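
The nested loop above only descends two levels, so the chapter elements are not reached. As a rough sketch of one way to get at them, findall() with a path and iter() with a tag name can both look deeper into the tree. The SAMPLE_XML string below is a hypothetical, shortened version of books.xml:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of books.xml.
SAMPLE_XML = """<root>
  <book id="book1" name="Learning Python 2">
    <title>Learning Python 2</title>
    <chapters>
      <chapter>
        <chapterNumber>1</chapterNumber>
        <chapterTitle>Chapter1</chapterTitle>
      </chapter>
      <chapter>
        <chapterNumber>2</chapterNumber>
        <chapterTitle>Chapter2</chapterTitle>
      </chapter>
    </chapters>
  </book>
</root>"""

root = ET.fromstring(SAMPLE_XML)

# findall() with a path reaches elements below the direct children.
titles = [chapter.findtext("chapterTitle")
          for chapter in root.findall("./book/chapters/chapter")]

# iter() walks the whole subtree and yields every element with the given tag.
numbers = [number.text for number in root.iter("chapterNumber")]

print(titles)
print(numbers)
```

findall() is the better fit when the position of the element in the hierarchy matters, whereas iter() is convenient when any element with that tag is wanted regardless of depth.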

To read the value of a specific attribute, we can use the form child.attrib['attribute_name']:

>>> for child in root:
...     print(child.tag, child.attrib['id'], child.attrib['name'])
...
book book1 Learning Python 2
book book2 Learning Python 3
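
Indexing attrib with a key that is not present raises a KeyError. When an attribute may be missing, the get() method of the element is a gentler alternative. The sketch below assumes a hypothetical document in which the second book has no name attribute:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second book deliberately lacks a name attribute.
SAMPLE_XML = """<root>
  <book id="book1" name="Learning Python 2"/>
  <book id="book2"/>
</root>"""

root = ET.fromstring(SAMPLE_XML)

rows = []
for child in root:
    # get() returns the given default instead of raising KeyError.
    name = child.get("name", "unknown")
    rows.append((child.tag, child.attrib["id"], name))

print(rows)
```

The id attribute is still read through attrib here because every book in the sample has one; get() is used only where absence is expected.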

The following script iterates over the books.xml file incrementally with iterparse. You can find the code in the books_iterate_xml.py file:

from xml.etree.ElementTree import iterparse  # cElementTree was removed in Python 3.9

def books(file):
    # By default, iterparse only reports 'end' events, which fire once an
    # element and all of its children have been read.
    for event, elem in iterparse(file):
        if elem.tag == 'book':
            print('{0},{1},{2},{3},{4}'.format(
                elem.findtext('title'), elem.findtext('publisher'),
                elem.findtext('numberOfChapters'), elem.findtext('pageCount'),
                elem.findtext('author')))
        elif elem.tag == 'chapter':
            print('{0},{1},{2}'.format(
                elem.findtext('chapterNumber'), elem.findtext('chapterTitle'),
                elem.findtext('pageCount')))

if __name__ == '__main__':
    books("books.xml")  # iterparse accepts a file name or a file object

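This is the output of the previous script, where we can see the values of each book element and the chapter elements for each book. Because each line is printed when the closing tag of an element is reached, the chapters of a book appear before the book itself: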

1,Chapter1,30
2,Chapter2,25
Learning Python 2,Packt Publishing,13,500,Author1
1,Chapter1,30
2,Chapter2,25
Learning Python 3,Packt Publishing,10,400,Author2
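
One common reason to prefer iterparse over parse is memory use on large files. The following sketch, which is not part of books_iterate_xml.py, shows one possible pattern: each book is processed on its end event and then emptied with clear(). The SAMPLE_XML bytes and the book_summaries function are hypothetical names used only for this illustration:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of books.xml as bytes.
SAMPLE_XML = b"""<root>
  <book id="book1"><title>Learning Python 2</title><pageCount>500</pageCount></book>
  <book id="book2"><title>Learning Python 3</title><pageCount>400</pageCount></book>
</root>"""

def book_summaries(source):
    """Yield (title, page_count) for each book, emptying the book afterwards."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield elem.findtext("title"), int(elem.findtext("pageCount"))
            # clear() drops the element's children, text and attributes,
            # so already processed books do not keep their subtree in memory.
            elem.clear()

summaries = list(book_summaries(io.BytesIO(SAMPLE_XML)))
print(summaries)
```

The values are read before clear() is called, since the element has no children or text left afterwards.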