When looking for a good way to store nested comments in a SQL database I stumbled upon the explanation of an algorithm called “modified preorder tree traversal” algorithm [2]. Instead of storing a parent-child relationship into the database and having to do one SQL query for every node, it allows querying the whole comment tree (and subtrees) with a single query.
The basic idea is to walk along the outside of the tree hierarchy and to number the left and right side of every node along the way:
While this is rather unintuitive it allows to:
- Get the sub tree of an node by selecting all nodes whose left value is between the left and the right value of the top node of the subtree
- Get the path of a node by selecting all nodes whose left value is smaller than the left value of the node in question and whose right value is larger than its right value
- Determine the number of children of a node by calculating:
( right value – left value -1 ) / 2
The disadvantage of this approach is that the addition of a single node leads to changes in all nodes to the “right” of it. After pondering the problem for a while I came up with a variant of the algorithm that gets rid of that. Instead of using simple consecutive numbers for the left and right values I wrote an algorithm which spreads all the left/right values over a fixed range of numbers. The top node starts with a fixed left and right value and children are inserted by splitting that interval into smaller sub-intervals. This makes it possible to not change the key values on inserting and do that insert with a single writing operation.
Links:
- Postgresql stored procedure implementing the keyspace variant
- Modified preorder tree traversal algorithm
Another explanation of the modified tree traversal algorithm with code samples in PHP
Leave a Reply