Tag: JSON

Generating List<Something> from JSON with svenson

Often you’ll find yourself wanting to parse JSON into a Java collection whose values are of a specific type. Nothing easier than that.

import org.svenson.JSONParser;
…
// Getting a list containing your own type Something.
// Assume json to be a String containing the JSON dataset.
JSONParser parser = new JSONParser();
parser.addTypeHint("[]", Something.class);
List<Something> someThings = parser.parse(List.class, json);
// someThings will be an ArrayList instance by default. You can change
// that by changing the mappings for interfaces by calling
// org.svenson.JSONParser.setInterfaceMappings(Map<Class, Class>)

Parsing into a map is not much more complicated either:

JSONParser jsonParser = new JSONParser();
jsonParser.addTypeHint(new RegExPathMatcher("\\.(f1|f2)"), Something.class);
Map<String,Object> someThings = jsonParser.parse(Map.class, json);

If we want to use our Something type for more than a single field, we need to set up a matcher. The example above uses a RegExPathMatcher that makes sure that the values under both the keys “f1” and “f2” of the map we receive are converted to Something, while all other fields are not.

If you want to convert all map properties to Something, the RegExPathMatcher would look like this:

    … new RegExPathMatcher("\\..*") …

This would match every JSON path that starts with a property. If you don’t like regular expressions, or are on some kind of diet from them, you can also construct a more complex matcher tree from the composable matchers like this:

JSONParser jsonParser = new JSONParser();
jsonParser.addTypeHint(new OrMatcher(
    new PrefixPathMatcher(".f1"),
    new PrefixPathMatcher(".f2")), Something.class);
Map<String,Object> someThings = jsonParser.parse(Map.class, json);

Update: Due to me fucking up both the Prefix-/Suffix- matchers as well as their tests, the last example will only really work with the current svenson trunk/future svenson 1.3.8


Annotating DOM nodes with JSON, Part 2

It’s been a while since I wrote Annotating DOM nodes with JSON, and in retrospect I can say that I never really used the method described there in a real-life project. Now I’d like to present another method of decorating DOM nodes with JSON, this time based on classes. This one I actually implemented in OpenSAGA to carry arbitrary metadata for some of the OpenSAGA widgets.

I didn’t really like the idea of misusing onclick for the purpose of metadata and thought about a better way of doing it. Browsing the W3C HTML specs, I came upon the fact that a class name can contain any characters except whitespace, with multiple classes separated by spaces. So for use cases where I only needed one metadata value I used classes like

<div class="refId:id-1234">
    DIV content
</div>

A use-case specific prefix marks a class as a metadata container; the value is the string after the prefix. The code to evaluate this in JavaScript is very easy:

/**
 * Returns the class value with the given prefix using the given separator
 * @param {DOMElement} elem DOM element to fetch metadata from
 * @param {String} name prefix (name) of the classval value
 * @param {String} separator to use between name and value. Default is ":"
 */
function classval(elem, name, separator)
{
    var match = new RegExp("\\b" + name + (separator || ":") + "([^ ]*)($| )")
                          .exec(elem.className);
    if (match)
    {
        return match[1];
    }
    return null;
}
…
// assume divElement to be DOM element of the div
var refId = classval(divElement, "refId");

I thought about going for a more elaborate prefix scheme to support nested metadata but in the end decided against it because I already have a nicely supported format for exchanging data between server and client: JSON. So I tried to come up with a scheme of using arbitrary JSON for the metadata decoration.

Only problem: spaces are not valid inside class names, so I needed a method to encode JSON into valid classes and to decode it again. The method should not totally mangle the JSON, so that it stays readable and the encoded variant can be written by hand for simple cases.

Solution:

  • HTML encode the JSON-String
  • Replace spaces with underscores and underscores with \u005f

The replacement of underscores is valid because underscores can only occur inside quoted JSON strings, so they can just be replaced by their escaped Unicode value \u005f.

Here is the Java code to do the escaping. Since it’s basically a combination of string replacement and HTML encoding, this should be easily doable in any server-side language:

    public String escapeDecoration(String s)
    {
        String escaped = StringEscapeUtils.escapeHtml(s);

        StringBuilder sb = new StringBuilder(escaped.length());
        sb.append("deco:");
        for (int i = 0; i < escaped.length() ; i++)
        {
            char c = escaped.charAt(i);
            switch(c)
            {
                case '_':
                    sb.append("\\u005F");
                    break;
                case ' ':
                    sb.append('_');
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }

        return sb.toString();
    }

The escape method uses the escapeHtml method from Apache commons-lang's StringEscapeUtils. Going the other way in JavaScript is not that complicated either:

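/**
 * Decodes the given string containing HTML entities.
 */
function htmlDecode(s)
{
    var helper = document.createElement("SPAN");
    helper.innerHTML = s;
    // Read the text content, not innerHTML: innerHTML would re-encode
    // characters such as & and < on the way out.
    return helper.textContent || helper.innerText || "";
}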

/**
 * Returns the JSON decoration of the given element.
 * @param {DOMElement} elem DOM element
 * @param {String} name decorator classval name, default is "deco".
 */
function decoration(elem, name)
{
    var value, data, result;

    value = classval(elem, name || "deco");
    if (value)
    {
       // get raw data from DOM element
       data = value.replace(/_/g, " ");
       // replace HTML entities with the original characters
       data = htmlDecode(data);
       // evaluate JSON
       result = eval("("+data+")");
    }
    return result || {};
}
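To try the scheme without a server, the escaping step can be ported to JavaScript. This is only a sketch: the helper escapeHtml is made up for the example and covers just the four basic entities, whereas commons-lang's escapeHtml also encodes other characters.

```javascript
// Minimal HTML escaping (assumption: only &, <, > and " need encoding here).
function escapeHtml(s) {
    return s.replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;");
}

// JavaScript port of escapeDecoration: HTML-encode, then swap the characters.
function escapeDecoration(json, name) {
    var escaped = escapeHtml(json);
    // Single pass, so that underscores produced from spaces
    // are not escaped a second time.
    var body = escaped.replace(/[_ ]/g, function (c) {
        return c === "_" ? "\\u005f" : "_";
    });
    return (name || "deco") + ":" + body;
}

var json = JSON.stringify({ foo: "xxx_ yyy", baz: [1, 3, 5] });
var cls = escapeDecoration(json);
```

The resulting cls contains no spaces, so it can be used as a single class on an element and read back with decoration().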

To make the escaped JSON more readable, I also used svenson's ability to deviate from the JSON standard by using single quotes instead of double quotes. Just comparing

<div id="tst2" class="deco:{'foo':'xxx\u005f_yyy','baz':[1,3,5,7,9]}">
JSON annotation
</div>

to

<div id="tst2" class="deco:{&quot;foo&quot;:&quot;xxx\u005f_yyy&quot;,&quot;baz&quot;:[1,3,5,7,9]}">
JSON annotation
</div>

should demonstrate that single quotes are not only much more readable, but also shorter. If you use eval() to evaluate the JSON string, the single quotes are no problem at all. If you want to use json2.js or native JSON parsing, you might have to replace the quote characters before parsing.
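One possible way to do that replacement is sketched below. The function name is made up, and the sketch assumes that every string in the input is single-quoted (no mixing of both quote styles); it is not part of json2.js or svenson.

```javascript
// Converts single-quoted pseudo-JSON into strict JSON text.
function singleToDoubleQuoted(src) {
    var out = "", inString = false, i, c, next;
    for (i = 0; i < src.length; i++) {
        c = src.charAt(i);
        if (!inString) {
            if (c === "'") { inString = true; out += '"'; }
            else { out += c; }
        } else if (c === "\\") {
            // Copy escape sequences as they are; \' becomes a plain apostrophe.
            next = src.charAt(++i);
            out += (next === "'") ? "'" : "\\" + next;
        } else if (c === '"') {
            // A double quote inside the string must now be escaped.
            out += '\\"';
        } else if (c === "'") {
            inString = false;
            out += '"';
        } else {
            out += c;
        }
    }
    return out;
}

var strict = singleToDoubleQuoted("{'foo':'xxx\\u005f yyy','baz':[1,3,5,7,9]}");
var data = JSON.parse(strict);
```

A blind replace of every ' with " would be shorter, but it breaks as soon as a string value contains an apostrophe or a double quote.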

Links:

HTML test page with both metadata strategies

Scripting JSON

Doing a lot of web stuff and fiddling around with CouchDB, I really got to like JSON as a versatile format for things. Installing the JSONView extension for Firefox really helps with working with JSON in the browser, but what I’ve been missing so far is an easy way to deal with JSON from bash scripts. Fiddling around with the very interesting NodeJS, I came up with a small NodeJS script that makes JSON handling much easier, the json command: it reads a JSON object from stdin and feeds it to a JavaScript function body with “v” and NodeJS’ “sys” as parameters. The return value of the function is written to stdout. If it is a string, it is written as-is; any other value is written as pretty-printed JSON.
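The core of such a command might look roughly like the following. This is a sketch, not the actual script: the function name runJsonFilter is made up, Node's util module stands in for the older “sys” module, and the wiring to stdin and stdout is left out.

```javascript
var util = require("util");

// Parses the input, runs the user-supplied function body against it
// and returns the text that would be written to stdout.
function runJsonFilter(input, body) {
    var v = JSON.parse(input);
    // The body becomes: function (v, sys) { <body> }
    var fn = new Function("v", "sys", body || "return v;");
    var result = fn(v, util);
    // Strings are passed through unchanged, everything else is pretty-printed.
    return typeof result === "string" ? result : JSON.stringify(result, null, 1);
}

var size = runJsonFilter('{"db_name":"test","disk_size":79}', "return v.disk_size;");
var names = runJsonFilter('["a","b"]', 'return v.join("\\n");');
```

Passing the string through unchanged is what makes the output usable in shell loops, because a list joined with newlines arrives without quotes or brackets.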

Simple Example

$ curl -s http://localhost:5984/test | json "return v;"
{
 "db_name": "test",
 "doc_count": 0,
 "doc_del_count": 0,
 "update_seq": 0,
 "purge_seq": 0,
 "compact_running": false,
 "disk_size": 79,
 "instance_start_time": "1274021449672284",
 "disk_format_version": 5
}

Use curl to fetch the status of CouchDB database “test” from the local CouchDB node and then just pretty print it by returning the implicit value v.

$ curl -s http://localhost:5984/test | json "return v.disk_size;"
79

Just print the disk_size of CouchDB database “test”. You can use all the modern JavaScript functions v8 offers plus the implicit “sys” object that lets you log stuff to stderr or inspect objects. A little script that I find highly useful:

#!/bin/bash
# Delete all jcouchdb test databases
DBS=$(curl -s http://localhost:5984/_all_dbs | \
json 'return v.filter( function(db) { return db.indexOf("jcouchdb") == 0; }).join("\n");')

for i in $DBS
do
 curl -X DELETE http://localhost:5984/$i
done

Filter the list of databases to only contain those that start with “jcouchdb”, then loop over them to delete each one.

Links:

Update: Added “return v;” as default function. Now also supports “-h” and “--help”.

Memory consumption changes in svenson 1.3

Implementing a streaming attachment feature for jcouchdb, I started to wonder whether it would be a good idea for svenson to support JSON parsing from a stream, too, as I don’t really need the complete stream to start constructing the java object graph.

Implementing stream parsing was really nice and easy thanks to the unit tests present in svenson. After that, I came upon two ways to generally cut down on memory use: all tokens with fixed values could share a single instance, and the recording of tokens to provide token-based look-ahead was not really needed in all cases. But how much does that save?

As a test case I wrote a small tool class to generate random, nested JSON datasets, generated two test files of 65 KB and 4.5 MB, and parsed these with svenson 1.2.8 and what is now svenson 1.3.

Measuring the actual memory usage for these two test files proved to be difficult. Somehow none of the programs I tried seemed to give me the data I wanted. Eclipse TPTP just ignored Strings that were not members of any class but only parameters, making stream and string parsing look exactly the same memory-wise. tijmp and others did not provide the data I wanted at all.

So in the end I wrote a little Python script that parses an hprof ASCII output to

  • sum up all memory use
  • group allocations by class type, but only if their stack trace touches svenson
  • output the top 10 of those classes and the sums
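The aggregation itself can be sketched like this. It is not the original Python script: it skips the parsing of the hprof file and assumes the allocation sites have already been turned into records with the hypothetical fields className, bytes and stack.

```javascript
// records: [{ className: "...", bytes: 123, stack: ["pkg.Class.method", ...] }]
function summarize(records, packagePrefix, limit) {
    var total = 0;
    var byClass = Object.create(null);
    records.forEach(function (r) {
        // Sum up all memory use, regardless of origin.
        total += r.bytes;
        // Group by class only if a stack frame belongs to the package.
        var touches = r.stack.some(function (frame) {
            return frame.indexOf(packagePrefix) === 0;
        });
        if (touches) {
            byClass[r.className] = (byClass[r.className] || 0) + r.bytes;
        }
    });
    var top = Object.keys(byClass)
        .map(function (name) { return { className: name, bytes: byClass[name] }; })
        .sort(function (a, b) { return b.bytes - a.bytes; })
        .slice(0, limit || 10);
    return { totalBytes: total, top: top };
}

var summary = summarize([
    { className: "java.lang.String", bytes: 300, stack: ["org.svenson.JSONParser.parse"] },
    { className: "java.lang.String", bytes: 200, stack: ["java.io.Reader.read"] },
    { className: "java.lang.reflect.Method", bytes: 500, stack: ["java.lang.Class.getMethods", "org.svenson.JSONParser.parse"] }
], "org.svenson");
```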

This provided meaningful data and also showed some points for further improvement. There was a huge number of java.lang.reflect.Method allocations, which turned out to be caused by svenson inspecting the target classes for annotations and appropriate methods once per target object instead of once per target class.
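The difference between inspecting per target and per target class comes down to caching the inspection result under the class. The following sketch illustrates the idea in JavaScript; it is not svenson's code, and inspectClass merely stands in for the reflection work.

```javascript
var inspections = 0;
var cache = new Map();

// Expensive step: find the "setXyz" methods of a class.
function inspectClass(ctor) {
    inspections++;
    return Object.getOwnPropertyNames(ctor.prototype).filter(function (n) {
        return /^set[A-Z]/.test(n) && typeof ctor.prototype[n] === "function";
    });
}

// Looks up the setters for a target object, inspecting each class only once.
function settersFor(target) {
    var ctor = target.constructor;
    if (!cache.has(ctor)) {
        cache.set(ctor, inspectClass(ctor));
    }
    return cache.get(ctor);
}

function Something() {}
Something.prototype.setName = function (n) { this.name = n; };

var first = settersFor(new Something());
var second = settersFor(new Something());
```

Both calls return the same cached array, and the inspection counter stays at one no matter how many Something instances are created.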

All in all the memory usage went down quite a bit:

Figure: memory usage for different svenson versions, with and without streaming

That is 45% less memory for the small file and 62% less for the large file, counting all allocations. I think that is really good.

Below are some links to the files needed to repeat the benchmarking. The transform hprof script might also prove to be useful for other projects if changed appropriately.

The new jcouchdb release will also use stream parsing.

Links:

edit:
The command to generate the hprof file was something like

java -agentlib:hprof=heap=sites,depth=100,cutoff=0 -cp .. svensonperf.ReadJSONOld big.json

Two new projects: svenson and jcouchdb

In my never-ending fight against the windmills, I have produced two new open source projects that are somehow connected to each other.

First there is svenson which is a release of my own personal JSON generator / parser. The name is a result of a joke. When my boss asked me what was the unique characteristic of it, my first answer was: “It’s written by me!”. So the name comes from “Sven’s JSON”. 

The answer was of course not totally serious. I wrote svenson because none of the JSON parsers out there seemed to have the right combination of being not too complicated yet flexible enough to work well in different scenarios. Being able to use a healthy mix of concrete typed java beans and dynamic map / list constructs seemed to be the best way to go, yet none of the JSON libraries out there seemed to go even near that direction. 

See the svenson wiki at google code for an explanation of how svenson works.

The other project is called jcouchdb and is my attempt at writing a Java driver for CouchDB. It is very much connected to svenson, as it was the driving force for the parser part of svenson. It offers an API that lets you use all those nice svenson features with CouchDB documents.

Links:

update:  link to couchdb

Annotating DOM nodes with JSON

One problem when trying to do “The Right Thing ™” and separate your web design into different layers is what to do when you need additional information to transform your nicely id- and class-annotated DOM nodes into fancy, JavaScript-enabled goodness:

Where do you store that information? How do you associate it with the DOM nodes? I will present three approaches to this problem.


Method 1: The squeaky clean

The first method is relatively straight-forward: You add ids or classes to your DOM nodes. Then you include a dynamically generated javascript library in your document.

The problem with this is that the generated JavaScript library is fetched in a separate request, which makes it necessary to provide that request with the context needed to generate it. You also need to write code which finds your DOM nodes, finds the data from the included JavaScript library and associates the two. You also walk over the server-side data twice: once to generate the DOM nodes and once to create the JavaScript library.

Method 2: Compromising

The second approach is to compromise a little on the separation ideal and render the javascript data from the first method right into the page source.

You don’t need to carry the context for the data into a second request for the JavaScript library, but you still need to walk the data twice on the server side, you still need to associate the DOM nodes with the data, and you have a big block of JS data in the head section of your document.

Method 3: Annotating DOM nodes with JSON

I thought about a way to directly associate the DOM with additional data. JSON seemed to be a good format to store the data. At first I thought about using a custom attribute but I did not like the idea of invalid HTML markup. Then I got another idea:

Why not use the event handlers? Something like

<a href="/no_js" onclick="return { foo: 'extra', bar: 1};">Link</a>

works pretty well. It is totally valid HTML, and the onclick contains valid JavaScript code which can also be easily parsed by other tools. (It is basically just a JSON string wrapped with a “return [...] ;” ) When JavaScript is disabled, the link just executes normally and can provide non-JavaScript functionality. The data can be retrieved by executing “var data=linkNode.onclick();”. “But what about the fake event handler?” you might ask. Well, if someone clicks on the unmodified link in a JavaScript-enabled browser, nothing special happens. The script just returns some data and does nothing else. The return value is ignored by the browser because of the age-old, pre-DOM convention of canceling the event only if the handler returns false and carrying on otherwise. Since the data is an object, it counts as true and is simply ignored.
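Collecting the annotations from a set of nodes could then look like the sketch below. The function names are made up for the example, and a plain object stands in for a real link so that the code also runs outside a browser.

```javascript
// Returns the annotation object of a node, or null if it has none.
function annotationOf(node) {
    if (typeof node.onclick !== "function") { return null; }
    var data = node.onclick();
    return (data !== null && typeof data === "object") ? data : null;
}

// Pairs every annotated node with its data.
function collectAnnotations(nodes) {
    var result = [], i, data;
    for (i = 0; i < nodes.length; i++) {
        data = annotationOf(nodes[i]);
        if (data) { result.push({ node: nodes[i], data: data }); }
    }
    return result;
}

// Stand-in for <a onclick="return { foo: 'extra', bar: 1};">Link</a>
var fakeLink = { onclick: function () { return { foo: "extra", bar: 1 }; } };
var found = collectAnnotations([fakeLink, { onclick: null }]);
```

In a page, the node list would come from something like document.getElementsByTagName("a"). Note that this calls every onclick handler it finds, so it should only be applied to nodes that are known to carry data rather than real handlers.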

On the server side things get much easier. Not only does no context need to be carried into another request, but you also only need to walk over the data once. You can output the HTML and the additional JavaScript data into the same part of the document.

I admit that this approach bends the rules of separation a little, but in my opinion that’s ok.

  • It uses onclick="" but does not really put any code in there, just some data
  • The semantics of the original markup are kept as they are. The method only allows annotating this “base data” with additional information.
  • The pros vastly outweigh the cons

© 2014 fforw.de
