Home > Uncategorized > Storing hierarchical data in a database

Storing hierarchical data in a database

When looking for a good way to store nested comments in a SQL database I stumbled upon the explanation of an algorithm called “modified preorder tree traversal” algorithm [2]. Instead of storing a parent-child relationship into the database and having to do one SQL query for every node, it allows querying the whole comment tree (and subtrees) with a single query.


The basic idea is to walk along the outside of the tree hierarchy and to number the left and right side of every node along the way:

While this is rather unintuitive it allows to:

  • Get the sub tree of an node by selecting all nodes whose left value is between the left and the right value of the top node of the subtree
  • Get the path of a node by selecting all nodes whose left value is smaller than the left value of the node in question and whose right value is larger than its right value
  • Determine the number of children of a node by calculating:
    ( right value – left value -1 ) / 2

The disadvantage of this approach is that the addition of a single node leads to changes in all nodes to the “right” of it. After pondering the problem for a while I came up with a variant of the algorithm that gets rid of that. Instead of using simple consecutive numbers for the left and right values I wrote an algorithm which spreads all the left/right values over a fixed range of numbers. The top node starts with a fixed left and right value and children are inserted by splitting that interval into smaller sub-intervals. This makes it possible to not change the key values on inserting and do that insert with a single writing operation.

Links:

  1. Postgresql stored procedure implementing the keyspace variant
  2. Modified preorder tree traversal algorithm
    Another explanation of the modified tree traversal algorithm with code samples in PHP
  1. No comments yet.
  1. No trackbacks yet.