Modeling Trees with SQLAlchemy ORM and Postgres Ltree

Background

When writing software, you’ll often encounter situations where a tree is the most appropriate data structure for working with hierarchical data. Although Python lacks a built-in native implementation of trees, it’s relatively straightforward to implement one yourself, especially with help from third-party libraries. In this post, I’ll walk through one approach to representing trees in Python using SQLAlchemy and the PostgreSQL Ltree data type.

Recall that a tree is made up of nodes that are connected by edges, with each node having one or zero (the root nodes) parent nodes, and zero (the leaf nodes) or more child nodes. As an example, here’s a tree showing the relationships between different categories of cats:

Unfortunately, trees can be an awkward fit for most traditional SQL databases. While relational databases are good at expressing the connections between different types of objects through foreign keys onto other tables, representing nested hierarchies of similar entities usually requires doing extra work and accepting some tradeoffs.

There are a variety of well-known approaches for storing trees in a relational database. Perhaps the most straightforward is the adjacency listpattern, where each row records one edge, represented by references to the parent and child nodes.The SQLAlchemy documentation contains an example of how to implement this pattern using its object-relational model (ORM). This method is simple and accommodates both inserting new nodes and updates that rearrange nodes and their subtrees. The tradeoff is that retrieving an entire subtree can be inefficient, involving expensive recursive queries.

Another common technique is using the materialized path pattern, in which each node keeps a record of the path to reach it from the root of the tree. This approach allows for fast inserts and fast queries, but moving an existing node to a different tree can be slow and expensive, as you have to rewrite the paths on any descendants of that node. Fortunately, there are many application workflows where moving nodes is rare or impossible, while adding new nodes and fetching entire subtrees are common operations. Imagine forum software that keeps track of nested trees of comments. Users can add new comments and delete old ones, but the application would never need to move or rearrange comments.

If you happen to be using Postgres as your database – you’re in luck! Postgres actually offers a custom data type called LTree specifically designed to record materialized paths for representing trees. Ltree is a powerful, flexible utility that allows your database to efficiently answer questions like, “What are all the descendants of this node?”, “What are all the siblings?”, “What’s the root of the tree containing this node?” and many more.

Setup

For this tutorial, you’ll need to install the following Python libraries: SQLAlchemySQLAlchemy-Utils, and the psycopg2 Postgres bindings. Your individual Python situation will vary, but I would suggest creating a virtualenv and installing the libraries there.

Back to Top