Kristo Vaher

inventas vitam iuvat excoluisse per artes

Skip to: Content | Sidebar | Footer

Implementing tree traversal architecture for OriginNode

9 May, 2010 (13:35) | OriginNode | By: Kristo Vaher

In my various old projects for one reason or another a very recursive algorithm was used to fetch data that was structured in a tree. In some cases multiple trees were stored within the same table, which caused problems when trying to implement faster and better algorithms when fetching data from the tree, adding new nodes to the tree or removing nodes from the tree in general.

In OridinNode the entire system assumes that every tree has a single parent node. But in OridinNode there are also situations where one tree can be a child of another trees node, which causes conflict for the traversal algorithms. To make traversal work every tree needs to have at least one element that is considered ‘parent’. But if a parent happens to be a node from another tree, then it became important to implement a ‘ghost node’ to every module that has a parent in another tree. The user never sees such a ghost node, but that ghost node is used to traversing within the tree when fetching data to the actual parent element. To give an example, a website has a tree of pages. One of those pages however displays categories for products. There is no ‘main category’ as all of the categories are children of the page node from pages tree, but the categories table does need a ghost parent for quick traversal purposes.

The tree structure model used in OriginNode is a Nested Set Model. Basically every node within the tree is given two values, left and right, which carries their position information within the tree as well as basic information on their children and parents. This allows to completely throw out the recursive set of algorithms that are commonly used and makes it possible to build a whole array of the entire structure with just a few queries. For more detailed description please read through the link I posted above, I will be covering how this will be implemented within the structure of OriginNode below.

In OriginNode, the values often associated with recursive methods will still remain, this means that the nodes in OriginNode will also have information stored on direct ID values of their parents as well as their module identifiers. While those values will not be used to fetch trees, they are still used for other purposes, especially when connecting a number of nodes with one another. Instead of storing this information within the same tables, all this information is stored in a separate table for fetching data.

I have posted the basic table setup of OriginNode before, however the new and (yet again rewritten!) method has changed the structure quite a bit and it requires a revision here. This can be helpful for some of you readers out there who might face a similar challenge in setting up your infosystem architecture for your next project, so I will be giving a revised yet brief overview of the updated structure below.

Node Table
This table stores all nodes that share similar parameters. Every Node Table has a single parent node, but whether this parent is displayed for the user within the UI is up to the developer. This parent node within the table is automatically made invisible if the tree is assigned a parent from another tree. Fields within this node table are as follows:

  • id = Primary key of the field.
  • identifier = URL friendly string that is optional for the tree. When traversing the OriginNode architecture then both id’s and identifiers can be used to solving the path. For example, when one fetches data by using a string of ‘products/cars/112′, then OriginNode can find identifiers for products and cars, but not one for 112, in which case it will look for id of 112 instead from among the children of ‘cars’ node.
  • traversal_left = This is the left integer value for the tree traversal.
  • traversal_right = This is the right integer value for the tree traversal.
  • active = True or false flag whether the element is returned when data is being fetched from OriginNode.
  • period_start = Timestamp from what time the data is considered ‘active’ and thus returned when data is being fetched. This can be used for various purposes, such as pages that are ‘released’ at specific dates and periods. If this value is not set on the node, then the start time is ignored and data is always returned (unless restricted by the period_end field).
  • period_end = Timestamp until what time the data is considered ‘active’ and thus returned when data is being fetched. This can be used for various purposes, such as pages that are ‘closed’ after specific dates and periods. If this value is not set on the node, then the end time is ignored and data is always returned (unless restricted by the period_start field).
  • rights_read = This is a list of user group identifiers which have the right to see and read that node. Every group is a URL-friendly string and stored within square brackets, like [developer][admin], etc.
  • rights_edit = This is a list of user group identifiers which have the right to edit the data within that node. Every group is a URL-friendly string and stored within square brackets, like [developer][admin], etc.
  • rights_delete = This is a list of user group identifiers which have the right to remove that node. Every group is a URL-friendly string and stored within square brackets, like [developer][admin], etc.
  • f_* = Every custom field from Node Parameters Table (below) has its value stored in f_ prefix fields.

Node Parameters Table
This table stores all the custom fields that are stored within the Node Table. Every custom field is stored in Node Table with an f_ index.

  • identifier = String identifier that is also the field name, with a f_ prefix, stored in Node Table. This value is enforced to be unique.
  • type = this is a type keyword that determines what type of custom field it is. This ranges from simple fields such as checkboxes, radio buttons and dropdowns to WYSIWYG editors, color pickers, and connections with other nodes.
  • configuration = This is a serialized array, stored as a string. This carries whatever custom information the field type or its extension require, such as field length, allowed letters and other restrictions.
  • tab_identifier = This is the identifier string of the tab this field is displayed under when viewing the node of the Node Table. Fields with same tab identifiers are stored under the same tab. This value is always required or it will never be shown to the user.
  • fieldset_identifier = This is the identifier string of the fieldset this field is displayed under when viewing the node and the tab of the Node Table. Fields with same fieldset identifiers are stored under the same fieldset. This value can be null, which means that the field is not displayed within a fieldset.
  • weight = This is the weight of the field. Every node view will be built by taking into account the weights of the fields. First tab in node view will always be the one that has a field with the lowest weight. First fieldset under a tab will be the fieldset that has a field with the lowest weight. And fields with the same tab and fieldset identifiers will also be ordered based on the lowest weight first, and highest last. Basically the heavier a custom field is, the further back it is displayed.
  • table_display = Whether this custom field is also displayed in the Module view, in a tree or a list.
  • table_filter = Whether this custom field can be filtered in the Module view, in a tree or a list.
  • table_sortable = Whether this custom field can be sorted in a Module view, in a tree or a list.
  • rights_read = Whether these user groups have the rights to see and read the values of this field. Every group is a URL-friendly string and stored within square brackets, like [developer][admin], etc.
  • rights_edit = Whether these user groups have the rights to edit the values of this field. Every group is a URL-friendly string and stored within square brackets, like [developer][admin], etc.
  • value_required = Whether value within this field is required and cannot be null. This is not applicable to connections custom field types.
  • value_unique = Whether value within this field is required to be unique. This is not applicable to connections custom field types.
  • value_default = Default value stored within this field. This is not applicable to connections custom field types.
  • languages_dependant = Whether this field is languages dependant and has a new field for each of the languages. This means that in the case of – for example a page title – there will be three fields instead of one when viewing the node, one for each language. If this is set and the custom field is connections, then the connections will be only made with objects that are not language dependant or are of only the same language as the custom field.

Node Connections Table
This table has all the connections stored. In most cases this table is not used when fetching structure from the Node tables. All the connections however are still stored within this table and the traversal left and right values can be restored with the help of this table should something happen with the original tables. All subconnections (of the connections custom field type) are also stored in this table.

  • from_module = module identifier that the connection is from.
  • from_id = id of the node from the from_module that the connection is from.
  • to_module = module that this connection is created to.
  • to_id = id of the node that this connection is created to.
  • type = the type of this connection. This can be either [primary] (when the connection is part of the whole tree and deals with linked module nodes for traversal) or [secondary=field_name] (when the connection is not part of main structure and is only used to link another object, the field_name value identifies the custom field that this connection is used for).

In general this whole system is a little bit more complicated than the earlier versions of OriginNode concept, but the speed benefits here are rather impressive. Cutting down recursive searches will mean that even larger structures can be built with OriginNode with less problems when fetching large amounts of data. When this is paired with MySQL caching – which will be supported by OriginNode in ‘demand’ mode – then OriginNode architecture will be able to support even large scale infosystems of hundreds of thousands node objects.

Write a comment