Working with JSON

JSON stands for JavaScript Object Notation, and it is based on a subset of the JavaScript language. It has been around for almost two decades now, so it is well known and supported by basically all languages, even though it is actually language independent. You can read all about it on its website (https://www.json.org/), but I'm going to give you a quick introduction to it now.

JSON is based on two structures: a collection of name/value pairs (an object), and an ordered list of values (an array). You will immediately realize that these two structures map to the dictionary and list data types in Python, respectively. As data types, it offers strings, numbers, objects, arrays, and the literal values true, false, and null. Let's see a quick example to get us started:

# json_examples/json_basic.py
import sys
import json

data = {
    'big_number': 2 ** 3141,
    'max_float': sys.float_info.max,
    'a_list': [2, 3, 5, 7],
}

json_data = json.dumps(data)
data_out = json.loads(json_data)
assert data == data_out # json and back, data matches

We begin by importing the sys and json modules. Then we create a simple dictionary with some numbers inside and a list. I wanted to test serializing and deserializing using very big numbers, both int and float, so I put in 2 ** 3141 and the biggest floating point number my system can handle (sys.float_info.max).

We serialize with json.dumps, which takes data and converts it into a JSON formatted string. That data is then fed into json.loads, which does the opposite: from a JSON formatted string, it reconstructs the data into Python. On the last line, we make sure that the original data and the result of the serialization/deserialization through JSON match.
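
To make the mapping between JSON and Python types more concrete, here is a minimal sketch that goes in the other direction and starts from a JSON string. The document in json_text is made up for illustration; it is not part of the example files.

```python
import json

# A hypothetical JSON document, written by hand for illustration.
json_text = (
    '{"name": "Ada", "scores": [1, 2.5], '
    '"active": true, "nickname": null}'
)

parsed = json.loads(json_text)

print(type(parsed).__name__)            # dict: a JSON object
print(type(parsed['scores']).__name__)  # list: a JSON array
print(parsed['active'])                 # True: JSON true
print(parsed['nickname'])               # None: JSON null
```

JSON numbers come back as int or float depending on how they are written, which is why 1 and 2.5 end up as different Python types.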

Let's see, in the next example, what JSON data would look like if we printed it:

# json_examples/json_basic.py
import json

info = {
    'full_name': 'Sherlock Holmes',
    'address': {
        'street': '221B Baker St',
        'zip': 'NW1 6XE',
        'city': 'London',
        'country': 'UK',
    }
}

print(json.dumps(info, indent=2, sort_keys=True))

In this example, we create a dictionary with Sherlock Holmes' data in it. If, like me, you're a fan of Sherlock Holmes, and are in London, you'll find his museum at that address (which I recommend visiting, it's small but very nice).

Notice how we call json.dumps, though. We have told it to indent with two spaces, and sort keys alphabetically. The result is this:

$ python json_basic.py
{
  "address": {
    "city": "London",
    "country": "UK",
    "street": "221B Baker St",
    "zip": "NW1 6XE"
  },
  "full_name": "Sherlock Holmes"
}

The similarity with Python is striking. One difference to watch out for is the trailing comma: if you place a comma after the last element in a dictionary, like I've done in Python (as is customary), JSON will complain. Notice also that JSON always wraps strings in double quotes.
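
If you want to see the trailing comma problem for yourself, the following sketch feeds two hand-written JSON strings to json.loads. The strings and variable names are mine, chosen for illustration, and the exact wording of the error message depends on the Python version, so I don't reproduce it here.

```python
import json

valid = '{"city": "London", "country": "UK"}'
invalid = '{"city": "London", "country": "UK",}'  # note the trailing comma

print(json.loads(valid))

trailing_comma_error = None
try:
    json.loads(invalid)
except json.JSONDecodeError as err:
    # The parser rejects the trailing comma; err.msg says why.
    trailing_comma_error = err
    print('Invalid JSON:', err.msg)
```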

Let me show you something interesting:

# json_examples/json_tuple.py
import json

data_in = {
    'a_tuple': (1, 2, 3, 4, 5),
}

json_data = json.dumps(data_in)
print(json_data) # {"a_tuple": [1, 2, 3, 4, 5]}
data_out = json.loads(json_data)
print(data_out) # {'a_tuple': [1, 2, 3, 4, 5]}

In this example, we have used a tuple instead of a list. The interesting bit is that, conceptually, a tuple is also an ordered list of items. It doesn't have the flexibility of a list, but still, it is considered the same from the perspective of JSON. Therefore, as you can see from the first print, in JSON a tuple is transformed into a list. Naturally then, the information that it was a tuple is lost, and when deserialization happens, what we have in data_out['a_tuple'] is actually a list. It is important that you keep this in mind when dealing with data: going through a transformation process that involves a format offering only a subset of the data structures you can use implies there will be information loss. In this case, we lost the information about the type (tuple versus list).
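
One practical consequence is that a round trip through JSON no longer compares equal to the original. The sketch below shows this, along with one possible way to get the tuple back; the restored name is mine, and the approach only works because we already know which key is supposed to hold a tuple.

```python
import json

data_in = {'a_tuple': (1, 2, 3, 4, 5)}
data_out = json.loads(json.dumps(data_in))

# A tuple never compares equal to a list, even with the same items.
print(data_in == data_out)  # False

# We know 'a_tuple' should hold a tuple, so we convert it back by hand.
restored = {'a_tuple': tuple(data_out['a_tuple'])}
print(data_in == restored)  # True
```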

This is actually a common problem. For example, you can't serialize all Python objects to JSON, because it isn't clear whether (or how) the process should be reversed. Think about datetime, for example. An instance of that class is a Python object that the json module won't serialize by default. If we transform it into a string such as 2018-03-04T12:00:30Z, which is the ISO 8601 representation of a date with time and time zone information, what should JSON do when deserializing? Should it decide that the string can be deserialized into a datetime object and do so, or should it simply treat it as a string and leave it as it is? What about data types that can be interpreted in more than one way?
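
Here is a quick sketch of what happens if you try anyway. The moment variable is just an illustrative value matching the timestamp mentioned above.

```python
import json
from datetime import datetime, timezone

moment = datetime(2018, 3, 4, 12, 0, 30, tzinfo=timezone.utc)

error_message = None
try:
    json.dumps({'when': moment})
except TypeError as err:
    # json.dumps raises TypeError for types it doesn't know how to encode.
    error_message = str(err)
    print('Cannot serialize:', error_message)
```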

The answer is that when dealing with data interchange, we often need to transform our objects into a simpler format prior to serializing them with JSON. This way, we will know how to reconstruct them correctly when we deserialize them.
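
As a minimal sketch of that idea, assuming we control both ends of the exchange and agree that the 'when' key always holds an ISO 8601 string, we could convert the datetime by hand on the way out and on the way back in. The event dictionary and its keys are made up for illustration.

```python
import json
from datetime import datetime, timezone

event = {
    'name': 'meeting',
    'when': datetime(2018, 3, 4, 12, 0, 30, tzinfo=timezone.utc),
}

# Before serializing: swap the datetime for its ISO 8601 string.
payload = {'name': event['name'], 'when': event['when'].isoformat()}
json_data = json.dumps(payload)
print(json_data)

# After deserializing: we know 'when' is a date, so we rebuild it.
raw = json.loads(json_data)
restored = {
    'name': raw['name'],
    'when': datetime.fromisoformat(raw['when']),
}
print(restored == event)  # True
```

Note that isoformat writes the UTC offset as +00:00 rather than Z, which datetime.fromisoformat can read back on any Python version that provides it (3.7 and later).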

In some cases, though, and mostly for internal use, it is useful to be able to serialize custom objects, so, just for fun, I'm going to show you how with two examples: complex numbers (because I love math) and datetime objects.
