Tag: JSON

GraphQL-injection with Automaton/DomainQL

Symbol picture: Injection needle with GraphQL-Logo

In this blog post I will try to explain how we organize data loading in Automaton/DomainQL and walk you through the general idea of it and how to adapt it for your own project.

Conceptually it provides an alternative to asynchronous loading / suspense on application startup.

Background

We are working on the relaunch of a German state-level government application. We have been doing paid Open-Source development for clients for quite some time now and are now doing it again for this project as requested by our client.

On the most general level, we use DomainQL to drive our GraphQL.

Symbolic DomainQL Explanation: Database + JOOQ + DomainQL is combined to GraphQL

The current workflow is that we start with a PostgreSQL database from which JOOQ generates Java classes, so called POJOs (Plain Old Java Objects, basically normal/anemic Java classes with getters and setters according to the old Java Bean standard).

Basic injection in DomainQL

DomainQL already contains a comparatively simple GraphQL injection method based on exports with magic names. At point I’m close to considering it deprecated since the new system works so much better now.

At first I expected DomainQL to be our main thing, but along the way it became clear to me that we needed more and we added Automaton as an additional layer. Where DomainQL is still very general, Automaton has a clear and distinct vision and a very opinionated way of getting there.

GraphQL injection in Automaton

Let’s start with the basic idea of our kind of data injection. In a traditional Javascript application, the user loads the initial HTML document which contains very little, it downloads all the Javascript bundles and CSS bundles and other resources and then the JavaScript is executed and let’s say React starts rendering an application.

For which it might need a lot of data which it loads asynchronously which necessitates loading indicators, which prompted solutions like Suspense etc.

Often, this is preferable to having the user wait longer but it introduces a lot of additional latency, even if we can manage to tame the visual artifacts with Suspense.

But wouldn’t it be awesome if the components didn’t have to request the data they need because it is already there? What if the server could provide all the data the components need by knowing what they will request before they do?

Static Code Analysis

When I started wondering about how to do this, I was already using the solution, a small, maybe odd library I had written earlier called babel-plugin-track-usage.

Code is just data and as we do it, we never actually use JavaScript as-is, but always transpile it, compress it etc.

babel-plugin-track-usage uses the Babel AST to detect and track statically analyzable calls within your code base.

At first we used it for i18n.

    [
        "track-usage",
        {
            "sourceRoot" : "src/main/js/",
            "trackedFunctions": {
                "i18n": {
                    "module": "./service/i18n",
                    "fn": "",
                    "varArgs": true
                }
            },
            "debug": false
        }
    ]
Code language: JSON / JSON with Comments (json)

Here we see the plugin configuration for an i18n service. We define that all our sources are under “src/main/js” (we’re using a Maven project layout) and then we define that we want to track the default export of a module “./service/i18n” which accepts variable arguments.

This lets us write expressions like “i18n("Hello {0}", user)” in our code. The function just looks up the first argument in a translation map that contains all the necessary translations because the babel-plugin-track-usage provided the server with all the invocations of our i18n function as long as all arguments are statically analyzable. With the varArgs setting it only considers the first argument and allows non-static analyzable varargs to follow.

In combination with a mini-Webpack-Plugin, babel-plugin-track-usage then exports JSON data for the server to process.

{
    "usages": {
        "./apps/shipping/processes/crud-test/composites/CRUDList": {
            "requires": [
                "@quinscape/automaton-js", 
                "mobx-react-lite"
            ],
            "calls": {
                "i18n": [
                    ["Foo List"], 
                    ["Action"], 
                    ["owner"]
                ]
            }
        },
        ...
    }
}Code language: JavaScript (javascript)

“usages” contains a map of relative module names as keys. The “calls” array contains a map of symbolic call names (Here our “i18n”) with a nested array containing an array per function invocation found.

“requires” contains an array with all modules required by that module. This can be used to follow all import dependencies and find all calls starting with a module as starting point.

After I extended the library a few times to support template literals, ES6 imports and also identifiers, it was ready to drive our data injection too.

Direct GraphQL Query injection

Schematic diagram illustrating the injection process explained below

Our apps in automaton correspond to Webpack entrypoints. Each of them contains a number of processes that make up the application functionality.

For each process there is a MobX class that defines the process scope containing all the data for the process.

Here is how our first version of GraphQL injection looked and still does, although we rarely use it like this

import { observable } from "mobx";
import { query } from "@quinscape/automaton-js"

export default class ExampleScope {
    @observable
    injected = query(
        // language=GraphQL
            `query iQueryFoo($config: QueryConfigInput!)
        {
            iQueryFoo(config: $config)
            {
                # ...
            }
        }`,
        {
            "config": {
                "pageSize": 5,
                "sortFields": ["name"]
            }
        }
    )
}Code language: JavaScript (javascript)

We important the “query” function from our library which is registered tracked. The function takes a simple template literal string with the query and a static JSON block with the default config as second argument.

The naming conventions for our file tell the server which process the statically analyzable query belongs to. We follow all imports etc.

When the user starts the application with one of its processes, the server can look up the query() calls it found and use that to fetch the data from GraphQL service.

This method works very well, even when we inject many queries. We don’t even need to bother to craft GraphQL documents with multiple queries. All injected queries will be executed together using an internal shortcut without using HTTP. (Hurray for GraphQL’s transport neutrality).

At the end we can then embed all the queried data in our initial HTML object as script tag with fantasy type.

Our system then can provide the process with a new instance of the process scope that magically contains all the injected data blocks and the React components involved can start rendering right away.

The main reason we switched to a different method is that we in fact have a lot of queries to execute in some processes which makes the process scope very lengthy

Indirect GraphQL injection

At first I couldn’t quite figure out an elegant solution until I realized that like so often, the solution to the problem is another layer of indirection.

What we did was to reify the query arguments into its own named thing and add a second tracked function.

We split the injection into multiple files.

First we have the query definition which we put into its own file.

./src/queries/Q_Foo.js
import { query } from "@quinscape/automaton-js"

export default query(
    // language=GraphQL
        `query iQueryFoo($config: QueryConfigInput!)
    {
        iQueryFoo(config: $config)
        {
            # ...
        }
    }`,
    {
        "config": {
            "pageSize": 5,
            "sortFields": ["name"]
        }
    }
)
Code language: JavaScript (javascript)

We have a standard folder where all our named query definitions are located (here shortened to ./src/queries/). Each definition is named after the file it is in, here Q_Foo.

Now to inject the current value of Q_Foo we just need to write

./src/processes/example/example.js
import { observable } from "mobx";
import { query, injection } from "@quinscape/automaton-js"
import Q_Foo from "../../queries/Q_Foo"

export default class ExampleScope {
    @observable
    injected = injection(Q_Foo)
}Code language: JavaScript (javascript)

I just love the way it works out. The Javascript syntax is just like one would normally write it, I can use my IDE to ctrl+click on the the name and lookup the definitions (or do quick lookups etc). Everything is neatly organized and the scope definitions are much clearer.

The plugin just generates a special JSON block for our identifier

{
    "usages": {
        "./src/processes/example/example": {
            "requires": [
                "mobx",
                "@quinscape/automaton-js",
                "./src/queries/Q_Foo"
            ],
            "calls": {
                "injection": [
                    [
                        {
                            "__identifier": "Q_Foo"
                        }
                    ]
                ]
            }
        }
    }
}
Code language: JSON / JSON with Comments (json)

The JSON data just provides the name and the server has to know / define what that means, in our case “Hey there should be a ‘./src/queries/Q_Foo’ which in turn contains a query() invocation that defines what data we want here.”

Links

Generating List<Something> from JSON with svenson

Often you’ll find yourself wanting to parse a JSON into a Java collection but want the values inside the collection to be of a specific type. Nothing easier than that.

import org.svenson.JSONParser;
…
// Getting a list containing your own type Something.
// Assume json to be a String containing the JSON dataset.
JSONParser parser = new JSONParser();
parser.addTypeHint("[]", Something.class);
List<Something> someThings = parser.parse(List.class, json);
// someThings will be a ArrayList instance by default. You can change
// that by changing the mappings for interfaces by calling
// org.svenson.JSONParser.setInterfaceMappings(Map<Class, Class>)

Parsing into a map is not much more complicated either

JSONParser jsonParser = new JSONParser();
jsonParser.addTypeHint(new RegExPathMatcher("\\.(f1|f2)"), Something.class);
Map<String,Object> someThings = jsonParser.parse(Map.class, json);

If we want to have our Something type for more than a single field, we need to setup a matcher. Here you see an example of a RegExPathMatcher that makes sure that both the keys “f1” and “f2” of the map we receive will be converted to Something, while all other fields are not.

If you want to convert all map properties to Something, the RegExPathMatcher would be like this

    … new RegExPathMatcher("\\..*") …

This would match every JSON path that starts with a property. If you don’t like RegularExpressions, or are on some kind of diet on them, you can also construct a more complex matcher tree from the compositable Matchers like this

JSONParser jsonParser = new JSONParser();
jsonParser.addTypeHint(new OrMatcher(
    new PrefixPathMatcher(".f1"),
    new PrefixPathMatcher(".f2")), Something.class);
Map<String,Object> someThings = jsonParser.parse(Map.class, json);

Update: Due to me fucking up both the Prefix-/Suffix- matchers as well as their tests, the last example will only really work with the current svenson trunk/future svenson 1.3.8


Annotating DOM nodes with JSON, Part 2

It’s been a while since I wrote Annotating DOM nodes with JSON and in retrospective I can say that I never really used the method described in a real life project. Now I’d like to present another method of decorating DOM nodes with JSON based on classes. This one I actually implemented in OpenSAGA to have arbitrary metadata from some of the OpenSAGA Widgets.

I didn’t really like the idea of misusing onclick for the purpose of meta-data and thought about a better way of doing it. Browsing the w3 HTML specs I came upon the fact that classes can be any character separated by spaces. So for use-cases where I only needed one meta-data value I used classes like

<div class="refId:id-1234">
    DIV content
</div>

A use-case specific prefix is used to mark a class as meta-data container containing the string after the prefix. The code to evaluate this in javascript is very easy

/**
 * Returns the class value with the given prefix using the giving separator
 * @param {DOMElement} elem DOM element to fetch metadata from
 * @param {String} name of the classval value
 * @param {String} separator to use between name and value. Default is ":"
 */
function classval(elem, name, separator)
{
    var match = new RegExp("\\b" + name + (separator || ":") + "([^ ]*)($| )")
                          .exec(elem.className);
    if (match)
    {
        return match[1];
    }
    return null;
}
…
// assume divElement to be DOM element of the div
var refId = classval(divElement, "refId");

I thought about going for a more elaborate prefix scheme to support nested metadata but in the end decided against it because I already have a nicely supported format for exchanging data between server and client: JSON. So I tried to come up with a scheme of using arbitrary JSON for the metadata decoration.

Only problem: Spaces are not valid inside classes, so I needed a method to encode and decode JSON into valid classes. The method should not totally mangle the JSON to keep readability and maybe write the encoded variant by hand for simple cases.

Solution:

  • HTML encode the JSON-String
  • Replace spaces with underlines and underlines with \u005f

The replacement of underlines is valid because underlines can only occur inside quoted JSON strings so they can just be replaced by their escaped unicode value \u005f.

Here is the java code to do the escaping. Since it’s basically a combination of string replacement and HTML encoding this should be easily doable in any server-side language:

    public String escapeDecoration(String s)
    {
        String escaped = StringEscapeUtils.escapeHtml(s);

        StringBuilder sb = new StringBuilder(escaped.length());
        sb.append("deco:");
        for (int i = 0; i < escaped.length() ; i++)
        {
            char c = escaped.charAt(i);
            switch(c)
            {
                case '_':
                    sb.append("\\u005F");
                    break;
                case ' ':
                    sb.append('_');
                    break;
                default:
                    sb.append(c);
                    break;
            }
        }

        return sb.toString();
    }

The escape method uses the escapeHTML method from Apache commons-lang's StringEscapeUtil. Going the other way in javascript is not that complicated either:

/**
 * Decodes the given string containing HTML entities.
 */
function htmlDecode(s)
{
    var helper = document.createElement("SPAN");
    helper.innerHTML = s;
    return helper.innerHTML;
}

/**
 * Returns the JSON decoration of the given element.
 * @param {DOMElement} DOM element
 * @param {String} decorator classval name, default is "deco".
 */
function decoration(elem, name)
{
    var value, data, result;

    value = classval(elem, name || "deco");
    if (value)
    {
       // get raw data from DOM element
       data = value.replace(/_/g, " ");
       // replace HTML entities with the original characters
       data = htmlDecode(data);
       // evaluate JSON
       result = eval("("+data+")");
    }
    return result || {};
}

In order to achieve a better readability of escaped JSON, I also used svenson's ability to deviate from the JSON standard by using single quotes instead of double quotes. Just comparing

<div id="tst2" class="deco:{'foo':'xxx\u005f_yyy','baz':[1,3,5,7,9]}">
JSON annotation
</div>

to

<div id="tst2" class="deco:{&quot;foo&quot;:&quot;xxx\u005f_yyy&quot;,&quot;baz&quot;:[1,3,5,7,9]}">
JSON annotation
</div>

should demonstrate that single quotes are not only much better readable, but also shorter. If you use eval() evaluate the JSON string, the single quotes are no problem at all. If you want json2.js / native JSON-parsing, you might have to replace the quote chars before parsing.

Links:

HTML test page with both metadata strategies

Scripting JSON

Doing a lot of web stuff and fiddling around with CouchDB, I really got to like JSON as versatile format for things. Installing the JSONView extension for firefox really helps with working with JSON in the browser, but what I’ve been missing so far is an easy way to deal with JSON from bash scripts. Fiddling around with the very interesting NodeJS, I came up with a small node js script that makes JSON handling much easier, the JSON command: It reads a JSON object from stdin and feeds it to a javascript function body with “v” and NodeJS’ “sys” as parameters. The return value of the function is written to stdout. If it was a string, it is written as-is, if it is another object it will be pretty-JSONified.

Simple Example

$ curl -s http://localhost:5984/test | json "return v;"
{
 "db_name": "test",
 "doc_count": 0,
 "doc_del_count": 0,
 "update_seq": 0,
 "purge_seq": 0,
 "compact_running": false,
 "disk_size": 79,
 "instance_start_time": "1274021449672284",
 "disk_format_version": 5
}

Use curl to fetch the status of CouchDB database “test” from the local CouchDB node and then just pretty print it by returning the implicit value v.

$ curl -s http://localhost:5984/test | json "return v.disk_size;"
79

Just print the disk_size of CouchDB database “test”. You can use all the modern JavaScript functions v8 offers plus the implicit “sys” object that lets you log stuff to stderr or inspect objects. A little script that I find highly useful:

#!/bin/bash
# Delete all jcouchdb test databases
DBS=$(curl -s http://localhost:5984/_all_dbs | \
json 'return v.filter( function(db) { return db.indexOf("jcouchdb") == 0; }).join("\n");')

for i in $DBS
do
 curl -X DELETE http://localhost:5984/$i
done

Filter the list of database to only contain those that start with “jcouchdb”, then loop over them to delete.

Links:

Update: Added “return v;” as default function. now also supports “-h” and “–help”.

Memory consumption changes in svenson 1.3

Implementing a streaming attachment feature for jcouchdb, I started to wonder whether it would be a good idea for svenson to support JSON parsing from a stream, too, as I don’t really need the complete stream to start constructing the java object graph.

Implementing stream parsing was really nice and easy thanks to the units test present in svenson. After that, I came upon two ways to generally cut down on memory use. All tokens with fixed values could just have a single instance. The recording of tokens to provide token based look ahead was not really needed in all cases. But how much does that save?

As a test case I wrote a small tool class to generate random, nested JSON datasets, generated two test files of 65kb and 4.5mb size and parsed these with svenson 1.2.8 and what now is svenson 1.3.

Measuring the actual memory usage for these two test files proved to be difficult. Somehow none of the programs I tried seemed to give me the data I wanted. Eclipse TPTP just ignored Strings that were no member of any class but just parameters, making stream and string parsing look exactly the same memory-wise. tijmp and others did not provide the data I wanted at all.

So in the end I wrote a little python script that parses a hprof ASCII output to

  • sum up all memory use
  • group allocations by class type, but only if the stack trace of it touches svenson
  • output the top 10 of those classes and the sums

This provided meaningful data and also showed some points for further improvement. There was a huge number of java.lang.reflect.Method allocations which turned out to be caused by svenson inspecting the target classes for annotations and appropriate methods which was done on a per target basis instead of the better per target class basis.

All in all the memory usage went down quite a bit:

memory usage for different svenson versions, with and without streaming

memory usage for different svenson versions, with and without streaming

45% less memory for the small file and 62% for the large file for all allocations. I think that is really good..

Below are some links to the files needed to repeat the benchmarking. The transform hprof script might also prove to be useful for other projects if changed appropriately.

The new jcouchdb release will also use stream parsing.

Links:

edit:
The command to generate the hprof file was something like

java -agentlib:hprof=heap=sites,depth=100,cutoff=0 -cp .. svensonperf.ReadJSONOld big.json

Two new projects: svenson and jcouchdb

In my never ending fight against teh wind-mills, I have produced two new open source projects that are somehow connected to each other.

First there is svenson which is a release of my own personal JSON generator / parser. The name is a result of a joke. When my boss asked me what was the unique characteristic of it, my first answer was: “It’s written by me!”. So the name comes from “Sven’s JSON”. 

The answer was of course not totally serious. I wrote svenson because none of the JSON parsers out there seemed to have the right combination of being not too complicated yet flexible enough to work well in different scenarios. Being able to use a healthy mix of concrete typed java beans and dynamic map / list constructs seemed to be the best way to go, yet none of the JSON libraries out there seemed to go even near that direction. 

See the svenson wiki at google code for an explanation of how svenson works.

The other project is called jcouchdb and is my attempt at writing a Java driver for couchdb. It is very much connected to svenson as it was the driving force for the parser part of svenson. it offers an API that lets you use all those nice svenson features with couchdb documents. 

Links:

update:  link to couchdb

Annotating DOM nodes with JSON

One problem when trying to do “The Right Thing ™” and separate your webdesign into different layers is what to do when you need additional information to transform your nicely id and class annotated DOM nodes into fancy, javascript-enabled goodness:

Where do you store that information? How do you associate it to the DOM nodes? I will present three approaches to this problem..


Method 1: The squeaky clean

The first method is relatively straight-forward: You add ids or classes to your DOM nodes. Then you include a dynamically generated javascript library in your document.

The problem with this is that the generated javascript library is requested in a new request which makes it nescessary to provide this request with the nescessary context to generate it. You also need to write code which finds your DOM nodes and finds the data from the included javascript library and associates both. You also walk over the server side data twice: Once to generate the DOM nodes and once to create the javascript library.

Method 2: Compromising

The second approach is to compromise a little on the separation ideal and render the javascript data from the first method right into the page source.

You don’t need to carry the context for the data into a second request for the javascript library, but you still need to wak the data twice on the server side, you still need to associate the DOM nodes to the data, and you have a big block of js data in the head section of your document.

Method 3: Annotating DOM nodes with JSON

I thought about a way to directly associate the DOM with additional data. JSON seemed to be a good format to store the data. At first I thought about using a custom attribute but I did not like the idea of invalid HTML markup. Then I got another idea:

Why not use the event handlers? Something like

<a href="/no_js" onclick="return { foo: 'extra', bar: 1};">Link</a>

works pretty well. It is totally valid HTML, the onclick contains valid javascript code which can also be easily parsed by other tools. (It is basically just a JSON string wrapped with a “return […] ;” ) When javascript is disabled, the link just executes normally and can provide non-javascript functionality. The data can be retrieved by executing “var data=linkNode.onclick();”. “But what about the fake event handler?” you might ask. Well.. if someone clicks on the unmodified link in a javascript enabled browser: nothing happens. The script just returns some data and does nothing else. The return code will be ignored by the browser environment due to this age-old, pre-DOM standard of canceling the event if the handler returns false and just going on if it returns true — since the data is something it evaluates to true so it’s just ignored.

On the server side things get much easier. Not only does no context need to be carried into another request but you also only need to walk over the data once. You can ouput the HTML and the additional javascript data into the same part of the document.

I admit that this approach bends the rules of separation a little, but in my opinion that’s ok.

  • It uses onclick=”” but does not really put any code in there, just some data
  • The semantics of the original markup are kept as they are. The method only allows to annotate this “base data” with additional information.
  • The pros vastly outweigh the cons

© 2024 fforw.de

Theme by Anders NorénUp ↑