2. January, 2008
by Michael Neumann

Lately I had an idea about lazily parsing JSON. Parsing JSON normally means building a complete data structure in the host language (e.g. Ruby). For larger JSON documents this becomes expensive, especially if you only use a few of the values in the document. For lazy parsing to be efficient, the document has to specify forward skips, as in the following example:

{/*20*/
 a: "test",
 b: "abc",
}

The 20 here means that the closing “}” is 20 bytes ahead, so the JSON parser can note the current location, skip 20 bytes, and continue parsing after the object. Instead of a regular Hash it would create a special Hash object, which lazily parses the inner JSON upon first access.

Of course this requires that the JSON document is kept available (either in memory or on disk). Even for a large JSON document, the raw text usually takes far less memory than all the Ruby values built from it.
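To make the skip idea a bit more concrete, here is a minimal Ruby sketch. Everything in it is an assumption for illustration: the class name `LazyHash`, the helper `read_lazy_object`, and the exact hint format, where `{/*N*/` means that N bytes lie between the end of the comment and the closing “}”. Unlike the example above it uses quoted keys so the body can be handed to Ruby's standard `json` library, and it does not handle hints nested inside the body.

```ruby
require "json"
require "strscan"

# Hypothetical lazy wrapper: remembers where an object's body sits in the
# source string and parses it only on first access.
class LazyHash
  def initialize(source, body_start, body_length)
    @source = source
    @body_start = body_start
    @body_length = body_length
    @parsed = nil
  end

  def parsed?
    !@parsed.nil?
  end

  def [](key)
    to_h[key]
  end

  def to_h
    @parsed ||= begin
      body = @source.byteslice(@body_start, @body_length)
      JSON.parse("{" + body.force_encoding(Encoding::UTF_8) + "}")
    end
  end
end

# Assumed hint format: "{/*N*/", where N is the number of bytes between the
# end of the comment and the closing "}". Returns the lazy object and the
# byte position right after the closing brace.
def read_lazy_object(source, pos)
  scanner = StringScanner.new(source)
  scanner.pos = pos # StringScanner positions are byte offsets
  scanner.scan(/\{\/\*(\d+)\*\//) or raise ArgumentError, "no skip hint at byte #{pos}"
  body_start = scanner.pos
  body_length = Integer(scanner[1], 10)
  close = body_start + body_length
  raise ArgumentError, "hint does not point at '}'" unless source.getbyte(close) == "}".ord
  [LazyHash.new(source, body_start, body_length), close + 1]
end

body = %( "a": "test", "b": "abc" )
doc = "{/*#{body.bytesize}*/#{body}}"
hash, next_pos = read_lazy_object(doc, 0)
puts hash.parsed?             # => false, nothing has been parsed yet
puts hash["a"]                # => test
puts next_pos == doc.bytesize # => true
```

The wrapper keeps a reference to the source string, which is exactly the "document must stay available" requirement mentioned above.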

Another idea I had was that of a streaming JSON parser, similar to what exists for XML (SAX or expat). This would allow for very space-efficient extraction of values out of a JSON document of any size. Well, maybe I’ll rewrite my C++ JSON parser as a streaming parser one day.
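As a rough illustration of the event-based style, here is a small Ruby sketch. It is not the C++ parser mentioned above; the class name `JsonEvents` and the event names are made up for this example. For brevity it reads from a String rather than from an IO in chunks, and it leaves string escape sequences undecoded. What it shows is the SAX-like part: tokens are reported to a block and no tree is ever built.

```ruby
require "strscan"

# Hypothetical event-based JSON reader: reports each token to a block
# instead of building a tree. String escapes are left undecoded.
class JsonEvents
  STRING   = /"((?:[^"\\]|\\.)*)"/
  NUMBER   = /-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?/
  LITERALS = { "true" => true, "false" => false, "null" => nil }.freeze

  def initialize(text)
    @s = StringScanner.new(text)
  end

  def parse(&emit)
    @emit = emit
    value
    skip_ws
    syntax_error unless @s.eos?
  end

  private

  def skip_ws
    @s.skip(/\s+/)
  end

  def syntax_error
    raise ArgumentError, "unexpected input at byte #{@s.pos}"
  end

  def value
    skip_ws
    if @s.scan(/\{/)
      object
    elsif @s.scan(/\[/)
      array
    elsif @s.scan(STRING)
      @emit.call(:string, @s[1])
    elsif @s.scan(NUMBER)
      text = @s[0]
      @emit.call(:number, text.match?(/[.eE]/) ? Float(text) : Integer(text, 10))
    elsif @s.scan(/true|false|null/)
      @emit.call(:literal, LITERALS[@s[0]])
    else
      syntax_error
    end
  end

  def object
    @emit.call(:start_object, nil)
    skip_ws
    unless @s.scan(/\}/)
      loop do
        skip_ws
        @s.scan(STRING) or syntax_error
        @emit.call(:key, @s[1])
        skip_ws
        @s.scan(/:/) or syntax_error
        value
        skip_ws
        break if @s.scan(/\}/)
        @s.scan(/,/) or syntax_error
      end
    end
    @emit.call(:end_object, nil)
  end

  def array
    @emit.call(:start_array, nil)
    skip_ws
    unless @s.scan(/\]/)
      loop do
        value
        skip_ws
        break if @s.scan(/\]/)
        @s.scan(/,/) or syntax_error
      end
    end
    @emit.call(:end_array, nil)
  end
end

# Extract a single value without building any Hash or Array.
wanted = nil
last_key = nil
JsonEvents.new('{"a": "test", "b": "abc"}').parse do |type, value|
  last_key = value if type == :key
  wanted = value if type == :string && last_key == "b"
end
puts wanted # => abc
```

The caller decides what to keep, so memory use depends on the values extracted rather than on the size of the document.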