Implementing a streaming attachment feature for jcouchdb, I started to wonder whether it would be a good idea for svenson to support JSON parsing from a stream, too, as I don’t really need the complete stream to start constructing the java object graph.

Implementing stream parsing was really nice and easy thanks to the units test present in svenson. After that, I came upon two ways to generally cut down on memory use. All tokens with fixed values could just have a single instance. The recording of tokens to provide token based look ahead was not really needed in all cases. But how much does that save?

As a test case I wrote a small tool class to generate random, nested JSON datasets, generated two test files of 65kb and 4.5mb size and parsed these with svenson 1.2.8 and what now is svenson 1.3.

Measuring the actual memory usage for these two test files proved to be difficult. Somehow none of the programs I tried seemed to give me the data I wanted. Eclipse TPTP just ignored Strings that were no member of any class but just parameters, making stream and string parsing look exactly the same memory-wise. tijmp and others did not provide the data I wanted at all.

So in the end I wrote a little python script that parses a hprof ASCII output to

  • sum up all memory use
  • group allocations by class type, but only if the stack trace of it touches svenson
  • output the top 10 of those classes and the sums

This provided meaningful data and also showed some points for further improvement. There was a huge number of java.lang.reflect.Method allocations which turned out to be caused by svenson inspecting the target classes for annotations and appropriate methods which was done on a per target basis instead of the better per target class basis.

All in all the memory usage went down quite a bit:

memory usage for different svenson versions, with and without streaming

memory usage for different svenson versions, with and without streaming

45% less memory for the small file and 62% for the large file for all allocations. I think that is really good..

Below are some links to the files needed to repeat the benchmarking. The transform hprof script might also prove to be useful for other projects if changed appropriately.

The new jcouchdb release will also use stream parsing.

Links:

edit:
The command to generate the hprof file was something like

java -agentlib:hprof=heap=sites,depth=100,cutoff=0 -cp .. svensonperf.ReadJSONOld big.json