Extracting a Nested JSON Value in Python

First, some context: I’ve been working on some Python libraries at work that process sets of JSON data. This is generally pretty easy: Python has a nice built-in library for reading JSON (the json module), so the data can be worked on as a native dictionary object. That’s great for simple JSON objects, but there are some pretty complex JSON data sources out there, whether the data is returned by an API or stored in a file. Sometimes you need to access a specific value from a key buried a dozen layers deep, and some of those layers may be arrays of nested JSON objects.

Arrays and dictionaries are both native to Python, so you can dig that value out by hand, but it’s kind of a pain. Thankfully, many smart people have already tackled this problem, which is why there are now handy libraries implementing a (pseudo-standard) approach for getting a value given a specific JSON path. (There’s even an online evaluator to help you craft a good jsonpath.) The one I’ve been using is called jsonpath-rw. When I looked at the docs for jsonpath-rw, I was a little frustrated, so I wanted to take a minute to write out my eventual understanding, in case other folks are in a similar boat.
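
To make the "kind of a pain" part concrete, here is a minimal sketch of digging a nested value out by hand with plain indexing. The sample data and the dig helper are made up for illustration; neither comes from a library.

```python
# Hypothetical sample data, shaped like a deeply nested API response.
data = {
    "objectArray": [
        {"whatever": "Don't care"},
        {"nestedDict": {"anotherArray": [{"valueIWant": "This is what I care about."}]}},
    ]
}

# Direct indexing works, but raises KeyError or IndexError if any layer is missing.
value = data["objectArray"][1]["nestedDict"]["anotherArray"][0]["valueIWant"]
print(value)


def dig(obj, *steps, default=None):
    """Hypothetical helper: follow dict keys and list indexes, returning default on a miss."""
    for step in steps:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


print(dig(data, "objectArray", 1, "nestedDict", "anotherArray", 0, "valueIWant"))
print(dig(data, "objectArray", 5, "nestedDict"))  # index 5 does not exist, so the default comes back
```

Either way, every layer has to be spelled out step by step, which is the chore a path expression is meant to replace.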

Note: I’m not primarily a programmer (currently I’m a Technical Writer, and before that I was a QA Engineer, mostly exploratory rather than automation), so while my example works, there are probably more robust and Pythonic ways to do the same thing.

One of the primary ways you’re going to use jsonpath-rw is to find the value of a key in a JSON object. It feels backwards to me, but you first set up the jsonpath you’re going to search with, then specify which JSON object to search in. Here’s some example code:

import json
from jsonpath_rw import parse

with open('/path/to/myfile.json') as f:
    json_data = json.load(f)

# parse() compiles the path; find() returns a list of match objects, each with a .value
matches = parse('$.objectArray[1].nestedDict.anotherArray[0].valueIWant').find(json_data)
print(matches[0].value)

If myfile.json looked something like:

{
    "objectArray" : [ {
        "whatever" : "Don't care" 
    },{
        "whatever" : "Also don't care",
        "nestedDict" : {
            "anotherArray" : [ {
                "valueIWant" : "This is what I care about."
            } ]
        }
    } ]
}

Then the code above would print out “This is what I care about.” That path is specific, but it’s also pretty long and can be shortened. Since the key valueIWant is unique in the example, you could instead use the JSON path $..valueIWant (the .. searches every level of nesting) and it would still get you the right value. If you tried the same shortcut to get the value of whatever, though, it would return the values from both places that key is used. To get just one of them, put the array index back into the path: $.objectArray[0].whatever gets the value from only the first object in objectArray.
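
As a rough illustration of why the .. shortcut returns one value for valueIWant but two for whatever, here is a plain-Python sketch of a "search every level" lookup. The find_all helper is hypothetical and is not how jsonpath-rw is implemented; it only mimics the idea on the example data.

```python
def find_all(obj, key):
    """Hypothetical helper: collect every value stored under `key`, at any depth."""
    found = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                found.append(v)
            # Keep descending, since the same key may appear deeper down.
            found.extend(find_all(v, key))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_all(item, key))
    return found


# Same structure as the example myfile.json.
json_data = {
    "objectArray": [
        {"whatever": "Don't care"},
        {
            "whatever": "Also don't care",
            "nestedDict": {
                "anotherArray": [{"valueIWant": "This is what I care about."}]
            },
        },
    ]
}

print(find_all(json_data, "valueIWant"))  # ['This is what I care about.']
print(find_all(json_data, "whatever"))    # ["Don't care", "Also don't care"]
```

A unique key gives back a single result, while a repeated key gives back one result per occurrence, so the index is what narrows it down.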
