Professional Documents
Culture Documents
Pyxml
Pyxml
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 1 / 18
XML in Python
the two standard approaches for XML processing are supported in the
standard library:
I xml.dom.* – a standard DOM API
I xml.sax.* – a standard SAX API
but there’s a more pythonic API: xml.etree.ElementTree (ET for
short)
I supports both DOM-like (i.e. all-in-memory) and SAX-like (i.e.
event-based, streaming) processing
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 2 / 18
ET: loading an XML doc
import xml.etree.ElementTree as ET
tree = ET.ElementTree(file=’sample.xml’)
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 3 / 18
ET: traversing the tree
root = tree.getroot()
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 4 / 18
ET: simple searching
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 5 / 18
ET: complex searching using XPath
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 6 / 18
ET: creating+storing an XML doc
root = ET.Element(’root)
new elem = ET.SubElement(root, ’data’)
tree.ET.ElementTree(root)
import sys
tree.write(sys.stdout)
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 7 / 18
JSON
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 8 / 18
XML vs. JSON – a first glimpse
<?xml version="1.0"?> {
<book id="123"> "id": 123,
<title>Object Thinking</title> "title": "Object Thinking",
<author>David West</author> "author": "David West",
<published> "published": {
<by>Microsoft Press</by> "by": "Microsoft Press",
<year>2004</year> "year": 2004
</published> }
</book> }
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 9 / 18
JSON – a quick syntax tour
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 10 / 18
JSON – data types
number
string
boolean
array
object
null
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 11 / 18
JSON in Python
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 12 / 18
json: Implicit type conversions
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 13 / 18
json: serializing/deserializing
import json
restored = json.loads(serialized)
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 14 / 18
json: selected serialization options
There’s some space for customizing the serialization (within the limits
given by the JSON spec):
encoding – the character encoding (utf-8 by default)
indent – pretty-printing with the specified indent level for object
members
sort keys – output of dictionaries sorted lexicographically by key
separator – tuple (item sep, key sep)
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 15 / 18
XML vs. JSON – similarities
both XML and JSON are frequently used for data interchange
both formats are human readable (if designed properly)
both are currently supported by many programming languages
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 16 / 18
XML vs. JSON – differences
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 17 / 18
XML vs. JSON – can we estimate future from history?
Zdeněk Žabokrtský, Rudolf Rosa (ÚFAL) XML & JSON in Python Techno4NLP 18 / 18