There are several libraries for manipulating JSON in Scala. We prefer json4s, but if you are a die-hard fan of another JSON library, you should be able to readily adapt the examples in this chapter. Let's create a build.sbt file with a dependency on json4s:
// build.sbt
scalaVersion := "2.11.7"

libraryDependencies += "org.json4s" %% "json4s-native" % "3.2.11"
We can then import json4s into an SBT console session with:
scala> import org.json4s._
import org.json4s._

scala> import org.json4s.native.JsonMethods._
import org.json4s.native.JsonMethods._
Let's use json4s to parse the response to our GitHub API query:
scala> val jsonResponse = parse(response)
jsonResponse: org.json4s.JValue = JObject(List((login,JString(odersky)),(id,JInt(795990)),...
The parse
method takes a string (that contains well-formatted JSON) and converts it to a JValue
, a supertype for all json4s
objects. The runtime type of the response to this particular query is JObject
, which is a json4s
type representing a JSON object.
JObject
is a wrapper around a List[JField]
, and JField
represents an individual key-value pair in the object. We can use extractors to access this list:
scala> val JObject(fields) = jsonResponse
fields: List[JField] = List((login,JString(odersky)),...
What's happened here? By writing val JObject(fields) = ..., we are telling Scala two things: first, that the right-hand side has runtime type JObject; second, to go into the JObject instance and bind the list of fields to the constant fields.
Readers familiar with Python might recognize the similarity with tuple unpacking, though Scala extractors are much more powerful and versatile. Extractors are used extensively to extract Scala types from json4s
types.
Pattern matching using case classes
How exactly does the Scala compiler know what to do with an extractor such as:
val JObject(fields) = ...
JObject
is a case class with the following constructor:
case class JObject(obj:List[JField])
Case classes all come with an extractor that reverses the constructor exactly. Thus, writing val JObject(fields)
will bind fields
to the obj
attribute of the JObject
. For further details on how extractors work, read Appendix, Pattern Matching and Extractors.
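To make the mechanism concrete, here is a minimal sketch that uses only the standard library. The Point and Pair types are hypothetical and exist only for this illustration: Point gets its extractor for free because it is a case class, while Pair shows roughly what such an extractor looks like when written by hand for an ordinary class.

```scala
// Hypothetical types, defined only for this illustration.
case class Point(x: Int, y: Int)

// An ordinary class with a hand-written companion that mimics
// what the compiler generates for a case class.
class Pair(val left: Int, val right: Int)

object Pair {
  def apply(left: Int, right: Int): Pair = new Pair(left, right)

  // The extractor: takes an instance apart and returns its components.
  def unapply(p: Pair): Option[(Int, Int)] = Some((p.left, p.right))
}

object ExtractorDemo {
  // The generated extractor reverses the Point constructor.
  val Point(x0, y0) = Point(1, 2)

  // The hand-written unapply is called in exactly the same way.
  val Pair(l, r) = Pair(3, 4)
}
```

The pattern on the left-hand side of each val mirrors the constructor call on the right, which is the same relationship that val JObject(fields) relies on.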
We have now extracted fields
, a (plain old Scala) list of fields from the JObject
. A JField
is a key-value pair, with the key being a string and value being a subtype of JValue
. Again, we can use extractors to extract the values in the field:
scala> val firstField = fields.head
firstField: JField = (login,JString(odersky))

scala> val JField(key, JString(value)) = firstField
key: String = login
value: String = odersky
We matched the right-hand side against the pattern JField(_, JString(_))
, binding the first element to key
and the second to value
. What happens if the right-hand side does not match the pattern?
scala> val JField(key, JInt(value)) = firstField
scala.MatchError: (login,JString(odersky)) (of class scala.Tuple2)
...
The code throws a MatchError
at runtime. These examples demonstrate the power of nested pattern matching: in a single line, we managed to verify the type of firstField
, that its value has type JString
, and we have bound the key and value to the key
and value
variables, respectively. As another example, if we know that the first field is the login field, we can both verify this and extract the value:
scala> val JField("login", JString(loginName)) = firstField loginName: String = odersky
Notice how this style of programming is declarative rather than imperative: we declare that we want a JField("login", JString(_))
variable on the right-hand side. We then let the language figure out how to check the variable types. Pattern matching is a recurring theme in functional languages.
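When we cannot be sure that a field has the shape we expect, a val pattern is risky because a failed match throws. One possible alternative, sketched here, is a match expression with a fallback case that returns an Option instead. The helper name stringValue is our own and is not part of json4s.

```scala
import org.json4s._

object SafeMatch {
  // Hypothetical helper: returns the string held by a field,
  // or None if the field's value is not a JString.
  def stringValue(field: JField): Option[String] = field match {
    case JField(_, JString(value)) => Some(value)
    case _                         => None // no MatchError on a mismatch
  }
}
```

Calling SafeMatch.stringValue(firstField) would give Some(odersky) for the login field, while a field holding a JInt would give None rather than an exception.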
We can also use pattern matching in a for comprehension when looping over fields. When used in a for comprehension, a pattern match defines a partial function: only elements that match the pattern pass through the loop. This lets us filter the collection for elements that match a pattern and also apply a transformation to these elements. For instance, we can extract every string field in our fields list:
scala> for { JField(key, JString(value)) <- fields } yield (key -> value)
List[(String, String)] = List((login,odersky), (avatar_url,https://avatars.githubusercontent.com/...
We can use this to search for specific fields. For instance, to extract the "followers"
field:
scala> val followersList = for {
  JField("followers", JInt(followers)) <- fields
} yield followers
followersList: List[BigInt] = List(707)

scala> val followers = followersList.headOption
followers: Option[BigInt] = Some(707)
We first extracted all fields that matched the pattern JField("followers", JInt(_))
, returning the integer inside the JInt
. As the source collection, fields
, is a list, this returns a list of integers. We then extract the first value from this list using headOption
, which returns the head of the list if the list has at least one element, or None
if the list is empty.
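The same search can also be written in one step with the standard collectFirst method on List, which applies a partial function to the first element that matches it. The sketch below is one way to do this; the JSON literal is a trimmed, hand-written stand-in for the API response, not the real payload.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object FollowersExample {
  // Assumption: a cut-down version of the response, written inline
  // so that the example does not need network access.
  val json: JValue = parse("""{"login": "odersky", "followers": 707}""")

  val JObject(fields) = json

  // Find the first field named "followers" that holds an integer.
  val followers: Option[BigInt] = fields.collectFirst {
    case JField("followers", JInt(n)) => n
  }

  // A field that is absent simply yields None.
  val missing: Option[BigInt] = fields.collectFirst {
    case JField("no_such_field", JInt(n)) => n
  }
}
```

This avoids building the intermediate list, at the cost of being slightly less uniform with the for comprehensions used elsewhere in this section.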
We are not limited to extracting a single field at a time. For instance, to extract the "id"
and "login"
fields together:
scala> {
  for {
    JField("login", JString(loginName)) <- fields
    JField("id", JInt(id)) <- fields
  } yield (id -> loginName)
}.headOption
Option[(BigInt, String)] = Some((795990,odersky))
Scala's pattern matching and extractors give us an extremely powerful way of traversing the json4s tree and extracting the fields that we need.
We have already discovered parts of json4s
's type hierarchy: strings are wrapped in JString
objects, integers (or big integers) are wrapped in JInt
, and so on. In this section, we will take a step back and formalize the type hierarchy and the Scala type that each json4s type extracts to. These are the json4s runtime types:
val JString(s) // => extracts to a String
val JDouble(d) // => extracts to a Double
val JDecimal(d) // => extracts to a BigDecimal
val JInt(i) // => extracts to a BigInt
val JBool(b) // => extracts to a Boolean
val JObject(l) // => extracts to a List[JField]
val JArray(l) // => extracts to a List[JValue]
JNull // => represents a JSON null
All these types are subclasses of JValue
. The compile-time result of parse
is JValue
, which you normally need to cast to a concrete type using an extractor.
The last type in the hierarchy is JField
, which represents a key-value pair. JField
is just a type alias for the (String, JValue)
tuple. It is thus not a subtype of JValue
. We can extract the key and value using the following extractor:
val JField(key, JInt(value)) = ...
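Because parse only promises a JValue at compile time, code that handles arbitrary JSON typically dispatches on the runtime type with a match expression. The following sketch shows one way to do so for the types listed above; the describe function is a hypothetical helper, and the final catch-all case covers any JValue subtypes not discussed here.

```scala
import org.json4s._

object DescribeJson {
  // Hypothetical helper: summarizes a JValue according to its runtime type.
  def describe(value: JValue): String = value match {
    case JString(s)      => s"string: $s"
    case JInt(i)         => s"integer: $i"
    case JDouble(d)      => s"double: $d"
    case JDecimal(d)     => s"decimal: $d"
    case JBool(b)        => s"boolean: $b"
    case JObject(fields) => s"object with ${fields.size} field(s)"
    case JArray(values)  => s"array with ${values.size} element(s)"
    case JNull           => "null"
    case _               => "other" // any subtype not covered in this section
  }
}
```

Each case uses the extractor from the list above, so the bound variable already has the corresponding Scala type (String, BigInt, and so on) on the right-hand side of the arrow.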
In the previous sections, you learned how to traverse JSON objects using extractors. In this section, we will look at a different way of traversing JSON objects and extracting specific fields: the XPath DSL (domain-specific language). XPath is a query language for traversing tree-like structures. It was originally designed for addressing specific nodes in an XML document, but it works just as well with JSON. We have already seen an example of XPath syntax when we extracted the stock price from the XML document returned by the "Markit on demand" API in Chapter 4, Parallel Collections and Futures. We extracted the node with tag "LastPrice" using r \ "LastPrice". The \ operator was defined by the scala.xml package.
The json4s
package exposes a similar DSL to extract fields from JObject
instances. For instance, we can extract the "login"
field from the JSON object jsonResponse
:
scala> jsonResponse \ "login"
org.json4s.JValue = JString(odersky)
This returns a JValue
that we can transform into a Scala string using an extractor:
scala> val JString(loginName) = jsonResponse \ "login"
loginName: String = odersky
Notice the similarity between the XPath DSL and traversing a filesystem: we can think of JObject
instances as directories. Field names correspond to file names and the field value to the content of the file. This is more evident for nested structures. The users
endpoint of the GitHub API does not have nested documents, so let's try another endpoint. We will query the API for the repository corresponding to this book: "https://api.github.com/repos/pbugnion/s4ds". The response has the following structure:
{
  "id": 42269470,
  "name": "s4ds",
  ...
  "owner": { "login": "pbugnion", "id": 1392879 ... }
  ...
}
Let's fetch this document and use the XPath syntax to extract the repository owner's login name:
scala> val jsonResponse = parse(Source.fromURL(
  "https://api.github.com/repos/pbugnion/s4ds"
).mkString)
jsonResponse: JValue = JObject(List((id,JInt(42269470)), (name,JString(s4ds))...

scala> val JString(ownerLogin) = jsonResponse \ "owner" \ "login"
ownerLogin: String = pbugnion
Again, this is much like traversing a filesystem: jsonResponse \ "owner"
returns a JObject
corresponding to the "owner"
object. This JObject
can, in turn, be queried for the "login"
field, returning the value JString(pbugnion)
associated with this key.
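The nested lookup can be tried without network access by parsing a literal. In the sketch below, the JSON string is a hand-written fragment with the same shape as the response described above, and the match expression is one possible way to guard against a missing or mistyped field. It relies on the fact that \ returns JNothing, rather than throwing, when the path does not exist.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object NestedLookup {
  // Assumption: a trimmed stand-in for the repository document.
  val repo: JValue = parse(
    """{"id": 42269470, "name": "s4ds",
      | "owner": {"login": "pbugnion", "id": 1392879}}""".stripMargin)

  // Chain \ to walk down the tree, then extract the string if it is there.
  val ownerLogin: Option[String] = repo \ "owner" \ "login" match {
    case JString(login) => Some(login)
    case _              => None
  }

  // A path that does not exist yields JNothing instead of an exception.
  val absent: JValue = repo \ "owner" \ "no_such_field"
}
```

Matching on the result, instead of writing val JString(ownerLogin) = ..., trades a little verbosity for not throwing a MatchError when the document differs from what we expect.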
What if the API response is an array? The filesystem analogy breaks down somewhat. Let's query the API endpoint listing Martin Odersky's repositories: https://api.github.com/users/odersky/repos. The response is an array of JSON objects, each of which represents a repository:
[
  { "id": 17335228, "name": "dotty", "size": 14699, ... },
  { "id": 15053153, "name": "frontend", "size": 392 ... },
  { "id": 2890092, "name": "scala", "size": 76133, ... },
  ...
]
Let's fetch this and parse it as JSON:
scala> val jsonResponse = parse(Source.fromURL(
  "https://api.github.com/users/odersky/repos"
).mkString)
jsonResponse: JValue = JArray(List(JObject(List((id,JInt(17335228)), (name,JString(dotty)), ...
This returns a JArray
. The XPath DSL works in the same way on a JArray
as on a JObject
, but now, instead of returning a single JValue
, it returns an array of fields matching the path in every object in the array. Let's get the size of all Martin Odersky's repositories:
scala> jsonResponse \ "size"
JValue = JArray(List(JInt(14699), JInt(392), ...
We now have a JArray
of the values corresponding to the "size"
field in every repository. We can iterate over this array with a for
comprehension and use extractors to convert elements to Scala objects:
scala> for { JInt(size) <- (jsonResponse \ "size") } yield size
List[BigInt] = List(14699, 392, 76133, 32010, 98166, 1358, 144, 273)
Thus, combining extractors with the XPath DSL gives us powerful, complementary tools to extract information from JSON objects.
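As a small illustration of combining the two approaches on an array, the sketch below parses a hand-written two-element array (an assumption standing in for the real response), uses \ to collect every "size" value, and then uses extractors in a for comprehension over the array's elements to pair each repository name with its size.

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object RepoSizes {
  // Assumption: a shortened stand-in for the list of repositories.
  val repos: JValue = parse(
    """[{"name": "dotty", "size": 14699},
      | {"name": "frontend", "size": 392}]""".stripMargin)

  // XPath DSL: gather the "size" field of every object in the array.
  val sizes: List[BigInt] = repos \ "size" match {
    case JArray(values) => values.collect { case JInt(n) => n }
    case _              => Nil
  }

  // Extractors: unpack the array, then match two fields per object.
  val JArray(items) = repos
  val nameAndSize: List[(String, BigInt)] = for {
    JObject(fields) <- items
    JField("name", JString(name)) <- fields
    JField("size", JInt(size)) <- fields
  } yield (name, size)
}
```

The DSL is the shorter option when a single field is needed, while extractors keep fields from the same object together.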
There is much more to the XPath syntax than we have space to cover here, including the ability to extract fields nested at any level of depth below the current root or fields that match a predicate or a certain type. We find that well-designed APIs obviate the need for many of these more powerful functions, but do consult the documentation (json4s.org
) to get an overview of what you can do.
In the next section, we will look at extracting JSON directly into case classes.