JSONPath is an excellent way to simplify access of values deep within complicated JSON structures. It is widely supported in web programming languages as well as many databases.
I was recently given a task to create an API and data structure for issuing arbitrary corrections to a JSON document while keeping the original JSON unmodified. The size of each JSON document made it very inefficient to store a new copy each time it was modified, so I needed to use a delta (change) format. I immediately thought of using JSONPath (path expressions for JSON documents), as we often use it to query our JSON. It is widely implemented and easily understood. JSONPath is not typically used to modify data in a JSON document, but I decided to give it a go.
Ultimately, I wanted to be able to store a list of corrections that could be applied to a target one at a time, in order. Each correction would be a combination of a JSONPath representing the target location and a value to write at that location.
With both the corrections and the original document stored, retrieving the final version would be simple:
document = json.loads(target_json)
for correction in corrections:
    apply_correction(document, correction)
JSONPath expressions look like this:
- '$': the dollar sign represents the entire JSON document, or the root node of the document
- '$.customer': using dot notation, you can access individual values by their key
- "$['customer']": access the same key using bracket notation (dot and bracket notation can even be mixed)
- '$.line_items[0]': using brackets, you can access individual list items by their 0-based index
- '$.line_items[0].product_id': there is no limit on nesting, which makes deep queries very easy
- '$.line_items[*]': this will select all children of line_items, and is not exactly useful for our purpose of small corrections
- '$.line_items[?@.qty > 5]': JSONPath also supports logical queries. This would select all line items with a quantity greater than 5. This type of filtering is also not useful for issuing corrections.
Path expressions that can match multiple values did not make sense in the scope of issuing document corrections, but the rest looked simple enough. In fact, they looked exactly like Python's list-like and dict-like accessors.
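To make the resemblance concrete, here is a minimal sketch that reads a value from decoded JSON using the same brackets a JSONPath would use. The shortened `order` document is an illustrative stand-in for the sample data, not the full purchase order.

```python
import json

# A trimmed-down order, used only for illustration.
order = json.loads('{"line_items": [{"product_id": "0493774426549", "qty": 3}]}')

# JSONPath:  $['line_items'][0]['qty']
# Python:    order['line_items'][0]['qty']
qty = order['line_items'][0]['qty']
print(qty)  # 3
```

Apart from the leading `$`, the bracket-notation path and the Python expression are character-for-character identical.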
Sample Data
Let's set up some sample data, representing a purchase order and two corrections:
target_json = '''{
    "customer": "Wright, Callahan and Hale",
    "order_timestamp": "2024-10-15T14:21:28.200830Z",
    "line_items": [
        {
            "product_id": "0493774426549",
            "qty": 3,
            "unit_price": "586.12"
        }
    ],
    "shipping_destination": {
        "address_1": "804 Martinez Walk Apt. 638",
        "city": "Thomasland",
        "state": "IN",
        "country": "USA",
        "postal_code": "43216"
    }
}'''
correction_1 = {
    'json_path': "$['line_items'][0]['qty']",
    'value': 6,
    'user': 'alyssagreen@example.net',
    'timestamp': '2024-10-15T16:27:25.255840Z',
}
correction_2 = {
    'json_path': '$.shipping_destination.address_1',
    'value': '904 Martinez Walk Apt. 638',
    'user': 'alyssagreen@example.net',
    'timestamp': '2024-10-15T16:28:24.356125Z',
}
Examining the corrections listed above, notice that correction_1 uses a JSONPath with bracket notation, and correction_2 uses a JSONPath with dot notation. We need to ensure that corrections with either type of path expression can be applied.
Initial Implementation
With those examples in place, I took a first stab at implementing apply_correction():
import json


def apply_correction(target, correction):
    """
    Generates and executes a Python assignment statement like:
    `target.line_items[0].qty = value`
    Modifies <target> in-place
    """
    json_path: str = correction['json_path']
    value = correction['value']
    # Swap the leading '$' (the root node) for the name of our local variable
    python_path = json_path.replace('$', 'target', 1)
    # Run the assignment with `target` and `value` visible as local names
    exec(f'{python_path} = value', {}, locals())


document = json.loads(target_json)
apply_correction(document, correction_1)
assert document['line_items'][0]['qty'] == 6
Well, I did mention that JSONPath closely mimics Python's own syntax! With a simple string replacement, we were able to turn our JSONPath into an exec()utable Python assignment. This method immediately brought us halfway to our goal. By default, json.loads() gives us dict and list objects, which are mutable, passed by reference, and support bracket notation just like JSONPath!
Security note: this implementation uses exec(), which should generally be avoided. exec() and eval() can open up large security holes and introduce performance issues. They require safe, known inputs. I believe exec() makes this particular code very succinct, but it requires care to maintain safety.
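One way to keep the inputs "safe and known" is to reject any path that is not made up of a few simple segment types before it ever reaches exec(). The sketch below is an assumption about what such a check could look like; `is_safe_path` and the segment whitelist are hypothetical names and rules, not part of any JSONPath library, and a check like this should be treated as one layer of defence rather than a complete security boundary.

```python
import re

# Hypothetical whitelist of path segments (an assumption for this sketch):
#   .key      dot notation with an identifier-like key
#   ['key']   bracket notation, single-quoted key without quotes, backslashes, or brackets
#   [0]       non-negative integer list index
# Keys starting with "__" are rejected so dunder attributes stay out of reach.
_SEGMENT = (
    r"(?:"
    r"\.(?!__)[A-Za-z_][A-Za-z0-9_]*"
    r"|\['(?!__)[^'\\\[\]]+'\]"
    r"|\[[0-9]+\]"
    r")"
)
_SAFE_PATH = re.compile(r"\$" + _SEGMENT + r"+")


def is_safe_path(json_path: str) -> bool:
    """Return True only if the whole path consists of whitelisted segments."""
    return _SAFE_PATH.fullmatch(json_path) is not None


print(is_safe_path("$['line_items'][0]['qty']"))         # True
print(is_safe_path("$.shipping_destination.address_1"))  # True
print(is_safe_path("$.x; import os"))                    # False
print(is_safe_path("$.line_items[*]"))                   # False
```

A caller would check `is_safe_path(correction['json_path'])` and refuse the correction when it returns False. Note that this whitelist also rejects wildcard and filter expressions, which matches the decision above to support only single-target paths.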
Handling dot notation
Unfortunately, our initial implementation fails when we apply the second correction, which uses dot notation:
>>> apply_correction(document, correction_2)
Traceback (most recent call last):
  File "/opt/.pycharm_helpers/pydev/pydevconsole.py", line 364, in runcode
    coro = func()
           ^^^^^^
  File "<input>", line 1, in <module>
  File "<input>", line 5, in apply_correction
  File "<string>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'shipping_destination'
The executed Python statement target.shipping_destination.address_1 = '904 Martinez Walk Apt. 638' fails because target is a dict, and a dict does not expose its keys as attributes. What we want is for the attribute access .shipping_destination to access a key in that dictionary, but Python does not do that automatically.
We need a class that will give us both key- and attribute-like access. We could start with a dict and add attribute access, or start with one of the many attribute-based data classes available in Python (dataclasses, SimpleNamespace, namedtuple...) and add key-like access. SimpleNamespace is an often-overlooked utility baked right into the Python standard library and turns out to fit this usage quite well. SimpleNamespace takes keyword arguments (think dict) to its constructor and makes them accessible as attributes. We can add __getitem__ and __setitem__ implementations to make it also act like a dict.
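As a quick illustration of the starting point, this sketch shows what a plain SimpleNamespace offers on its own and where it falls short. The `address` dict and the replacement city are illustrative values only.

```python
from types import SimpleNamespace

address = {"city": "Thomasland", "state": "IN"}

# Unpack the dict into keyword arguments; each key becomes an attribute.
ns = SimpleNamespace(**address)
print(ns.city)   # Thomasland

# Attributes are writable, which is what an in-place correction needs.
ns.city = "Springfield"
print(ns.city)   # Springfield

# A plain SimpleNamespace has no key-style access.
try:
    ns["city"]
except TypeError as exc:
    print(type(exc).__name__)   # TypeError
```

Attribute access covers dot notation, and the missing subscript support is the gap that the two added methods close for bracket notation.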
from types import SimpleNamespace


class MutableJsonPath(SimpleNamespace):
    def __getitem__(self, item):
        return getattr(self, item)

    def __setitem__(self, key, value):
        setattr(self, key, value)
Usage
>>> target = MutableJsonPath(foo='bar', inner=MutableJsonPath(hello='world'))
>>> target.foo
'bar'
>>> target['foo']
'bar'
>>> target.inner.hello
'world'
>>> target['inner']['hello']
'world'
>>> target['inner']['hello'] = '🌍'
>>> target.inner.hello
'🌍'
Final Implementation
Now we need to ensure that when we load from JSON, we use our new MutableJsonPath class instead of the default dict type. To customize JSON loading we use the object_hook parameter to json.loads. Unfortunately, object_hook passes each decoded object as a single positional dict, while the default constructor for SimpleNamespace expects keyword arguments (Python 3.13 added an optional positional mapping argument, but earlier versions reject it), so we need a small addition to our class:
class MutableJsonPath(SimpleNamespace):
    @classmethod
    def from_dict(cls, d: dict):
        """Constructor, compatible with the object_hook param to json.loads"""
        return cls(**d)

    def __getitem__(self, item):
        return getattr(self, item)

    def __setitem__(self, key, value):
        setattr(self, key, value)


document = json.loads(target_json, object_hook=MutableJsonPath.from_dict)
The final line shows how to apply the new class when loading a JSON document from a string. The resulting document will be an instance of MutableJsonPath, as will all nested objects within the original JSON. Arrays from the JSON will still be Python lists, and values will still have their normal Python types.
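The type mapping described above can be checked directly. This is a small self-contained sketch that repeats the class and loads a shortened document (`sample` is an illustrative stand-in for the full purchase order).

```python
import json
from types import SimpleNamespace


class MutableJsonPath(SimpleNamespace):
    @classmethod
    def from_dict(cls, d: dict):
        return cls(**d)

    def __getitem__(self, item):
        return getattr(self, item)

    def __setitem__(self, key, value):
        setattr(self, key, value)


sample = '{"line_items": [{"qty": 3}], "shipping_destination": {"state": "IN"}}'
document = json.loads(sample, object_hook=MutableJsonPath.from_dict)

print(type(document).__name__)                    # MutableJsonPath
print(type(document.line_items).__name__)         # list
print(type(document.line_items[0]).__name__)      # MutableJsonPath
print(type(document.line_items[0].qty).__name__)  # int
```

Because object_hook is called for every decoded JSON object, objects nested inside arrays are converted as well, so dot and bracket notation both work at any depth.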
All together now
With our new class in place, we can now load the document and apply our two sample corrections:
import json
from types import SimpleNamespace
target_json = '''{
    "customer": "Wright, Callahan and Hale",
    "order_timestamp": "2024-10-15T14:21:28.200830Z",
    "line_items": [
        {
            "product_id": "0493774426549",
            "qty": 3,
            "unit_price": "586.12"
        }
    ],
    "shipping_destination": {
        "address_1": "804 Martinez Walk Apt. 638",
        "city": "Thomasland",
        "state": "IN",
        "country": "USA",
        "postal_code": "43216"
    }
}'''
correction_1 = {
    'json_path': "$['line_items'][0]['qty']",
    'value': 6,
    'user': 'alyssagreen@example.net',
    'timestamp': '2024-10-15T16:27:25.255840Z',
}
correction_2 = {
    'json_path': '$.shipping_destination.address_1',
    'value': '904 Martinez Walk Apt. 638',
    'user': 'alyssagreen@example.net',
    'timestamp': '2024-10-15T16:28:24.356125Z',
}


class MutableJsonPath(SimpleNamespace):
    @classmethod
    def from_dict(cls, d: dict):
        """Constructor, compatible with the object_hook param to json.loads"""
        return cls(**d)

    def __getitem__(self, item):
        return getattr(self, item)

    def __setitem__(self, key, value):
        setattr(self, key, value)


def apply_correction(target, correction):
    """
    Generates and executes a Python assignment statement like: `target.line_items[0].qty = value`
    Modifies <target> in-place
    """
    json_path: str = correction['json_path']
    value = correction['value']
    python_path = json_path.replace('$', 'target', 1)
    exec(f'{python_path} = value', {}, locals())


document = json.loads(target_json, object_hook=MutableJsonPath.from_dict)
apply_correction(document, correction_1)
assert document['line_items'][0]['qty'] == 6
apply_correction(document, correction_2)
assert document['shipping_destination']['address_1'] == '904 Martinez Walk Apt. 638'
print(document)
Caveats and Improvements
Of course, this implementation is not ready for production; it was made as a quick-and-dirty prototype. It is a special-purpose, limited implementation of JSONPath with plenty of room for improvement:
- Input to this function should be validated to ensure security.
- Converting JSONPath expressions to executable Python helped us reach our goal quickly and generically in this case, but an alternative data structure for expressing the target path might be better for many domains.
- More robust code might include making a deep copy of the decoded JSON document prior to applying corrections.
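As a rough sketch of where those improvements could lead, the version below avoids exec() entirely by parsing the path into a list of keys and indexes, and it works on a deep copy so the decoded original stays untouched. The names `parse_path` and `apply_corrections` and the supported segment syntax are assumptions made for this example; it works on the plain dict and list objects that json.loads() returns by default and handles only the single-target paths used in this article.

```python
import copy
import json
import re

# Hypothetical tokenizer pattern: splits "$.a['b'][0]" into ['a', 'b', 0].
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\['([^']+)'\]|\[([0-9]+)\]")


def parse_path(json_path: str) -> list:
    """Turn a simple JSONPath into a list of dict keys and list indexes."""
    if not json_path.startswith("$"):
        raise ValueError(f"path must start with '$': {json_path!r}")
    tokens, pos = [], 1
    while pos < len(json_path):
        match = _TOKEN.match(json_path, pos)
        if match is None:
            raise ValueError(f"unsupported segment at offset {pos}: {json_path!r}")
        dot_key, bracket_key, index = match.groups()
        tokens.append(int(index) if index is not None else (dot_key or bracket_key))
        pos = match.end()
    if not tokens:
        raise ValueError("path must address a location below the root")
    return tokens


def apply_corrections(original, corrections):
    """Return a corrected deep copy; the decoded original is left untouched."""
    document = copy.deepcopy(original)
    for correction in corrections:
        *parents, last = parse_path(correction["json_path"])
        node = document
        for token in parents:
            node = node[token]  # walk down to the container holding the target
        node[last] = correction["value"]
    return document


original = json.loads(
    '{"line_items": [{"qty": 3}],'
    ' "shipping_destination": {"address_1": "804 Martinez Walk Apt. 638"}}'
)
corrections = [
    {"json_path": "$['line_items'][0]['qty']", "value": 6},
    {"json_path": "$.shipping_destination.address_1", "value": "904 Martinez Walk Apt. 638"},
]
corrected = apply_corrections(original, corrections)
print(corrected["line_items"][0]["qty"])  # 6
print(original["line_items"][0]["qty"])   # 3
```

Since both notations are reduced to the same list of tokens, no wrapper class is needed here, and anything the tokenizer does not recognize is rejected with a ValueError instead of being executed.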