Software Development and Design, Lecture 2

Lesson 2: Describe parsing of common data formats (XML, JSON, YAML) to Python data structures

eXtensible Markup Language (XML) is a markup language much like HTML. It consists of a set of rules for encoding documents in a form that is both human-readable and machine-readable. XML is formally defined in a W3C specification.

Using XML, you can define your own tags or elements, their order, and how they are supposed to be processed or displayed on screen. An XML-encoded document can live as a file on a server or exist only transiently while it is being transmitted between two machines.

One of the most distinguishing characteristics of XML is that it allows you to define your own tags or elements, as opposed to HTML, where tags are standardized. XML is similar to HTML but more flexible: it is both a language and a meta-language, meaning you can define other languages using it as the basis, for example RSS or XSLT.

XML documents are made up of sections known as elements, which are defined by opening and closing tags. A tag is simply a markup construct that starts with < and ends with >. The content of an element is whatever goes between its opening and closing tags. Elements can contain other elements (sub-elements); for example, in a document whose root element is <persons>, that root element contains all the others. Last but not least, an element or sub-element can also carry attributes inside the < and > of its opening tag, e.g. <state name="california" capital="sacramento">. Here name="california" and capital="sacramento" are name-value pairs, and these are the attributes.
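
To make these terms concrete, the following minimal sketch holds a small XML document in a Python string and inspects its parts with the standard library. The document itself (the sample person and the text inside <state>) is invented for illustration and is not data from this lesson.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
SAMPLE = """<persons>
    <person>
        <name>Ada</name>
        <state name="california" capital="sacramento">Resident</state>
    </person>
</persons>"""

root = ET.fromstring(SAMPLE)         # the root element, <persons>
state = root.find("person/state")    # a sub-element nested two levels down

print(root.tag)       # tag name of the root element
print(state.tag)      # tag name of the sub-element
print(state.attrib)   # attributes as a dict of name-value pairs
print(state.text)     # content between the opening and closing tags
```

The tag name, the attributes, and the element content each map onto a separate property of the parsed element.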

XML Parsing in Python

Python allows you to parse, modify, and build XML documents. Your XML document can be stored in a file or held in a string. There are two well-known ways to parse XML with the Python standard library: the ElementTree (ET) API (xml.etree.ElementTree) or the minidom module (xml.dom.minidom).

The XML data format is hierarchical in nature, and the most fitting way to represent that data is with a tree. ET has two classes that break the hierarchy down into two levels: ElementTree, which represents the whole XML document as a tree, and Element, which represents a single node in that tree.

Interaction with the entire document, such as reading and writing files, is commonly done at the ElementTree level, whereas interaction with a single XML element (a child) or its sub-elements (sub-children) is carried out at the Element level.
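
The split between the two levels can be sketched as follows. This is an illustrative example rather than the lesson's original snippet: the element names, the "active" attribute, and the persons.xml file name (written to a temporary directory) are all assumptions.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Element level: build and modify individual nodes.
root = ET.Element("persons")
person = ET.SubElement(root, "person", {"id": "1"})
name = ET.SubElement(person, "name")
name.text = "Ada"
person.set("active", "true")    # add an attribute to one element

# ElementTree level: wrap the root to read and write whole documents.
tree = ET.ElementTree(root)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "persons.xml")    # hypothetical file name
    tree.write(path, encoding="utf-8", xml_declaration=True)
    reloaded = ET.parse(path)                  # returns an ElementTree

reloaded_root = reloaded.getroot()             # back at the Element level
print(reloaded_root.tag)
print(reloaded_root.find("person").attrib)
print(reloaded_root.findtext("person/name"))
```

ET.parse() and ElementTree.write() deal with the file as a whole, while Element, SubElement, set(), and find() operate on individual nodes.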

Using ElementTree APIs to parse XML
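
As a minimal sketch of this approach, the example below parses a small <persons> document and converts it into a list of Python dictionaries. The sample data and the helper name persons_to_dicts are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for illustration only.
PERSONS_XML = """<persons>
    <person id="1"><name>Ada</name><city>London</city></person>
    <person id="2"><name>Linus</name><city>Helsinki</city></person>
</persons>"""


def persons_to_dicts(xml_text):
    """Convert each <person> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    people = []
    for person in root.findall("person"):
        record = dict(person.attrib)       # start with the attributes
        for child in person:               # then add each sub-element's text
            record[child.tag] = child.text
        people.append(record)
    return people


people = persons_to_dicts(PERSONS_XML)
for entry in people:
    print(entry)
```

Note that ElementTree returns all attribute values and element text as strings, so any numeric conversion (for example of the id attribute) is left to the caller.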

To summarize, XML can be parsed using the ElementTree library from the Python standard library.

Using minidom Class to parse XML

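You can also use the Minimal Document Object Model (minidom) module, xml.dom.minidom, to parse XML documents. ElementTree is generally the preferred choice because its API is simpler. Note that the Python documentation warns that neither module is secure against maliciously constructed data, so treat untrusted XML with care.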

Using minidom, you can achieve parsing in three simple steps.

  • Import the xml.dom.minidom module
  • Use the parse function to parse the document, e.g. minidom.parse("persons.xml")
  • Get the XML elements using doc.getElementsByTagName("element")
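
The three steps above can be sketched as follows. The contents of persons.xml are an assumption made for illustration; the example writes the file to a temporary directory first so that it can run on its own.

```python
import os
import tempfile
from xml.dom import minidom    # step 1: import the module

# Hypothetical file contents, invented for illustration only.
PERSONS_XML = """<persons>
    <person id="1"><name>Ada</name></person>
    <person id="2"><name>Linus</name></person>
</persons>"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "persons.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(PERSONS_XML)
    doc = minidom.parse(path)    # step 2: parse the document

# Step 3: look up elements by tag name.
names = []
for person in doc.getElementsByTagName("person"):
    name_node = person.getElementsByTagName("name")[0]
    # The text of an element is stored in a child text node.
    names.append((person.getAttribute("id"), name_node.firstChild.data))

print(names)
```

For XML held in a string rather than a file, minidom.parseString() can be used in place of minidom.parse().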

JSON Parsing in Python

JavaScript Object Notation (JSON) is language-agnostic and is documented as its own data encoding standard. It supports primitive types such as strings, numbers, booleans, and null, along with nested arrays (lists) and objects.

Python includes a built-in json module that you can use to both encode and decode data. Use "import json" to import the module and parse JSON data into a Python dictionary or list. json.load() parses JSON from an open file object; a JSON object becomes a Python dictionary organized in key:value pairs, and a JSON array becomes a list. To read and write JSON strings, use json.loads() and json.dumps() respectively; json.dump() writes JSON to a file object.
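
A short sketch of these functions follows. The sample record is invented for illustration, and io.StringIO stands in for a file opened with open() so that the example is self-contained.

```python
import io
import json

# Hypothetical sample record, invented for illustration only.
text = '{"name": "Ada", "languages": ["Python", "C"], "active": true, "manager": null}'

# json.loads(): JSON string -> Python objects.
person = json.loads(text)
print(type(person).__name__)                  # JSON object -> dict
print(person["languages"])                    # JSON array  -> list
print(person["active"], person["manager"])    # true -> True, null -> None

# json.dumps(): Python objects -> JSON string.
encoded = json.dumps(person, indent=2, sort_keys=True)
print(encoded)

# json.load(): same parsing, but reading from a file-like object.
with io.StringIO(text) as handle:
    from_file = json.load(handle)
```

Both json.loads() and json.load() produce the same Python structure; the only difference is whether the input is a string or a file object.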

YAML Parsing in Python

YAML Ain't Markup Language (YAML) is a data encoding, or serialization, standard designed to be human-friendly. Much like JSON, it is a language-agnostic data encoding method. You can use the PyYAML library to read and write YAML data in Python.

PyYAML is a third-party library (installable with "pip install pyyaml"). Import it using "import yaml", then load a YAML file or string into a Python dictionary or other data structure using yaml.safe_load(). Use yaml.dump() to write YAML.
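
The following sketch shows a round trip through yaml.safe_load() and yaml.dump(). The sample document is invented for illustration, and the import guard is an assumption added only so that the example does not fail outright where PyYAML is not installed.

```python
try:
    import yaml    # PyYAML, a third-party package (pip install pyyaml)
except ImportError:
    yaml = None    # assumption: skip the demo when PyYAML is missing

# Hypothetical sample document, invented for illustration only.
PERSON_YAML = """\
name: Ada
active: true
languages:
  - Python
  - C
"""

if yaml is not None:
    # YAML mapping -> dict, YAML sequence -> list, true -> True.
    person = yaml.safe_load(PERSON_YAML)
    print(person["name"])
    print(person["languages"])

    # Python objects -> YAML text, then back again.
    dumped = yaml.dump(person)
    print(dumped)
    round_trip = yaml.safe_load(dumped)
```

yaml.safe_load() also accepts an open file object, so the same call works for a YAML file opened with open().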
