Description
jsonstreamer provides a SAX-like push parser via the JSONStreamer class and an 'object' parser via the ObjectStreamer class, which emits the top-level entities of any JSON object. It is based on the fast C library 'yajl'. It is well suited to parsing JSON streamed over a network as it arrives, or JSON objects that are too large to hold in memory all at once.
README
json-streamer 
Dependencies
git clone git@github.com:lloyd/yajl.git
cd yajl
./configure && make install
Setup
pip3 install jsonstreamer
Also available on PyPI: https://pypi.python.org/pypi/jsonstreamer
Example
Shell
python -m jsonstreamer.jsonstreamer < some_file.json
Code
variables which contain the input we want to parse
json_object = """
{
"fruits":["apple","banana", "cherry"],
"calories":[100,200,50]
}
"""
json_array = """[1,2,true,[4,5],"a"]"""
a catch-all event listener function which prints the events
def _catch_all(event_name, *args):
    print('\t{} : {}'.format(event_name, args))
JSONStreamer Example
Event listeners receive an event's payload through their parameters, so each listener needs a signature that fits the event it handles.
JSONStreamer provides the following events:
- doc_start
- doc_end
- object_start
- object_end
- array_start
- array_end
- key - this also carries the name of the key as a string param
- value - this also carries the value as a string|int|float|boolean|None param
- element - this also carries the value as a string|int|float|boolean|None param
Listener methods must have signatures that match the event they handle.
For the events doc_start, doc_end, object_start, object_end, array_start and array_end, the listener takes no parameters:
def listener():
    pass
If the listener is a method of a class, it takes only the additional 'self' parameter:
def listener(self):
    pass
For the events key, value and element, the listener must also accept the payload as a parameter:
def key_listener(key_string):
    pass
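To see why the signatures matter, here is a minimal sketch of an event dispatcher. `TinyEventSource` is a hypothetical stand-in written for illustration, not jsonstreamer's actual implementation. It assumes events are delivered by calling each listener with the event's payload as positional arguments.

```python
from collections import defaultdict


class TinyEventSource:
    """Hypothetical stand-in for illustration; not jsonstreamer's real code."""

    def __init__(self):
        self._listeners = defaultdict(list)

    def add_listener(self, event_name, listener):
        self._listeners[event_name].append(listener)

    def fire(self, event_name, *payload):
        # Each listener is called with exactly the payload of the event,
        # so its parameter list has to accept that many arguments.
        for listener in self._listeners[event_name]:
            listener(*payload)


seen = []


def on_doc_start():        # no payload, so no parameters
    seen.append('doc_start')


def on_key(key_string):    # one payload item, so one parameter
    seen.append(('key', key_string))


source = TinyEventSource()
source.add_listener('doc_start', on_doc_start)
source.add_listener('key', on_key)
source.fire('doc_start')
source.fire('key', 'fruits')
print(seen)

# A listener with the wrong signature fails as soon as the event fires.
source.add_listener('key', on_doc_start)
try:
    source.fire('key', 'calories')
except TypeError as error:
    print('signature mismatch:', error)
```

In this sketch the mismatch only surfaces when the event fires, which is one reason to check listener signatures against the event list before attaching them.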
import and run jsonstreamer on 'json_object'
from jsonstreamer import JSONStreamer
print("\nParsing the json object:")
streamer = JSONStreamer()
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_object[0:10])  # note that partial input is possible
streamer.consume(json_object[10:])
streamer.close()
output
Parsing the json object:
doc_start : ()
object_start : ()
key : ('fruits',)
array_start : ()
element : ('apple',)
element : ('banana',)
element : ('cherry',)
array_end : ()
key : ('calories',)
array_start : ()
element : (100,)
element : (200,)
element : (50,)
array_end : ()
object_end : ()
doc_end : ()
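The start and end events in the output above are balanced, so a listener can use them to track nesting. The sketch below is an illustration only: `DepthTracker` is a hypothetical helper, and the events are replayed from the printed output rather than produced by a real parser, so it runs without yajl.

```python
class DepthTracker:
    """Hypothetical catch-all listener that tracks the current nesting depth."""

    def __init__(self):
        self.depth = 0
        self.max_depth = 0

    def on_event(self, event_name, *args):
        if event_name in ('object_start', 'array_start'):
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)
        elif event_name in ('object_end', 'array_end'):
            self.depth -= 1


# Events copied from the output printed for 'json_object'.
replayed = [
    ('doc_start',), ('object_start',), ('key', 'fruits'), ('array_start',),
    ('element', 'apple'), ('element', 'banana'), ('element', 'cherry'),
    ('array_end',), ('key', 'calories'), ('array_start',),
    ('element', 100), ('element', 200), ('element', 50),
    ('array_end',), ('object_end',), ('doc_end',),
]

tracker = DepthTracker()
for event_name, *payload in replayed:
    tracker.on_event(event_name, *payload)

print('deepest nesting:', tracker.max_depth)
print('depth at end of document:', tracker.depth)
```

With a real streamer, the method would be attached the same way as `_catch_all`, that is with `streamer.add_catch_all_listener(tracker.on_event)`.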
run jsonstreamer on 'json_array'
print("\nParsing the json array:")
streamer = JSONStreamer()  # can't reuse the old object, make a fresh one
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_array[0:5])
streamer.consume(json_array[5:])
streamer.close()
output
Parsing the json array:
doc_start : ()
array_start : ()
element : (1,)
element : (2,)
element : (True,)
array_start : ()
element : (4,)
element : (5,)
array_end : ()
element : ('a',)
array_end : ()
doc_end : ()
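Because `consume` accepts partial input, a large file can be pushed through the parser piece by piece instead of being read whole. The helper below is a sketch under assumptions: `feed_in_chunks` and `RecordingStreamer` are hypothetical names, and the stand-in class exists only so the example runs without yajl. It relies on nothing beyond the `consume` and `close` calls shown above.

```python
import io


def feed_in_chunks(streamer, fileobj, chunk_size=4096):
    """Push the contents of a text file object into a streamer piece by piece.

    `streamer` is any object with the consume() and close() methods used above.
    """
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:  # an empty string means end of input
            break
        streamer.consume(chunk)
    streamer.close()


class RecordingStreamer:
    """Stand-in that records what it is given; used only for this demo."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def consume(self, data):
        self.chunks.append(data)

    def close(self):
        self.closed = True


recorder = RecordingStreamer()
feed_in_chunks(recorder, io.StringIO('[1,2,true,[4,5],"a"]'), chunk_size=8)
print(recorder.chunks)
```

With the real parser, the same helper would be called with a `JSONStreamer` that already has its listeners attached, plus a file opened in text mode.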
ObjectStreamer Example
ObjectStreamer provides the following events:
- object_stream_start
- object_stream_end
- array_stream_start
- array_stream_end
- pair
- element
import and run ObjectStreamer on 'json_object'
from jsonstreamer import ObjectStreamer
print("\nParsing the json object:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_object[0:9])
object_streamer.consume(json_object[9:])
object_streamer.close()
output
Parsing the json object:
object_stream_start : ()
pair : (('fruits', ['apple', 'banana', 'cherry']),)
pair : (('calories', [100, 200, 50]),)
object_stream_end : ()
run the ObjectStreamer on the 'json_array'
print("\nParsing the json array:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_array[0:4])
object_streamer.consume(json_array[4:])
object_streamer.close()
output - note that the events are different for an array
Parsing the json array:
array_stream_start : ()
element : (1,)
element : (2,)
element : (True,)
element : ([4, 5],)
element : ('a',)
array_stream_end : ()
Example on attaching listeners for various events
ob_streamer = ObjectStreamer()
def pair_listener(pair):
    print('Explicit listener: Key: {} - Value: {}'.format(pair[0], pair[1]))
ob_streamer.add_listener('pair', pair_listener)  # same for JSONStreamer
ob_streamer.consume(json_object)
ob_streamer.remove_listener(pair_listener)  # if you need to remove the listener explicitly
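A 'pair' listener does not have to keep everything it sees. As a sketch, the hypothetical `PairFilter` class below keeps only the keys of interest, which is the kind of selective handling that makes streaming useful for large inputs. The payloads are copied from the ObjectStreamer output shown earlier, so the example runs without yajl.

```python
class PairFilter:
    """Hypothetical 'pair' listener that keeps only the wanted top-level keys."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.pairs = {}

    def on_pair(self, pair):
        key, value = pair  # the payload is a (key, value) tuple
        if key in self.wanted:
            self.pairs[key] = value


collector = PairFilter(wanted=['calories'])
# Payloads copied from the 'pair' events in the output shown earlier.
collector.on_pair(('fruits', ['apple', 'banana', 'cherry']))
collector.on_pair(('calories', [100, 200, 50]))
print(collector.pairs)
```

It would be attached in the same way as `pair_listener`, with `ob_streamer.add_listener('pair', collector.on_pair)`.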
Even easier way of attaching listeners
class MyClass:
    def __init__(self):
        self._obj_streamer = ObjectStreamer()  # same for JSONStreamer
        # auto_listen finds the methods of this class that follow the naming
        # convention '_on_eventname' and attaches them as listeners.
        # Note the method names used below.
        self._obj_streamer.auto_listen(self)

    def _on_object_stream_start(self):
        print('Root Object Started')

    def _on_pair(self, pair):
        print('Key: {} - Value: {}'.format(pair[0], pair[1]))

    def parse(self, data):
        self._obj_streamer.consume(data)

m = MyClass()
m.parse(json_object)
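The naming convention can be pictured with a short sketch of how such a lookup might work. `find_convention_listeners` is a hypothetical helper written for illustration; the library's own `auto_listen` may be implemented differently.

```python
def find_convention_listeners(observer, prefix='_on_'):
    """Return {event_name: bound_method} for methods named prefix + event_name.

    Hypothetical illustration of a naming-convention lookup; not the
    library's actual implementation.
    """
    found = {}
    for attribute_name in dir(observer):
        if attribute_name.startswith(prefix):
            candidate = getattr(observer, attribute_name)
            if callable(candidate):
                found[attribute_name[len(prefix):]] = candidate
    return found


class Observer:
    def _on_object_stream_start(self):
        print('Root Object Started')

    def _on_pair(self, pair):
        print('Key: {} - Value: {}'.format(pair[0], pair[1]))

    def parse(self, data):  # not picked up: the name lacks the prefix
        pass


listeners = find_convention_listeners(Observer())
print(sorted(listeners))
listeners['pair'](('fruits', ['apple', 'banana', 'cherry']))
```

Under this reading, a method is attached only if its name is exactly '_on_' followed by an event name, so a typo in the method name would leave the event silently unhandled.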
Troubleshooting
- If you get an OSError('Yajl cannot be found.'), ensure that libyajl is available in the relevant library directory. For example, on macOS, /usr/local/lib should contain libyajl.dylib. The corresponding file is libyajl.so on Linux and yajl.dll on Windows.
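A quick way to check whether the shared library is visible on the system's default search path is the standard `ctypes.util.find_library` function. This is only a diagnostic sketch: `locate_yajl` is a hypothetical helper, and it is an assumption that jsonstreamer searches the same locations, so a positive result here does not guarantee that the package will load the library.

```python
from ctypes.util import find_library


def locate_yajl():
    """Return what the default library lookup reports for yajl, or None."""
    return find_library('yajl')


location = locate_yajl()
if location is None:
    print('libyajl was not found on the default library search path')
else:
    print('libyajl found:', location)
```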