Django ORM querying nested many to many table efficiently?

SO lets say I am designing a db for cooking session models as below

from django.db import models

class Recipe(models.Model):
    name = models.CharField(max_length=255)
   
    def __str__(self):
        return self.name

class Step(models.Model):
    name = models.CharField(max_length=255)
    recipes = models.ManyToManyField(Recipe, related_name='steps')

    def __str__(self):
        return self.name

class CookingSession(models.Model):
    name = models.CharField(max_length=255)
    steps = models.ManyToManyField(Step, related_name='cooking_sessions')

    def __str__(self):
        return self.name

How can I use minimal number of queries (preferably one) to get all steps for a certain cooking session where each step should have the corresponding recipes if any.

cooking_sessions = (
            CookingSession.objects.annotate(
             
                step_list=ArrayAgg(
                    models.F(
                        "steps__name",
                    ),
                    distinct=True,
                ),
                recipe_list=ArrayAgg(models.F("steps__recipes__name")),
                
            )
           
        )

This is how the data looks like

[
    {
        'id': 1,
        'name': 'Italian Night',
        'step_list': ['Preparation', 'Cooking', 'Serving'],
        'recipe_list': ['Tomato Sauce', 'Pasta Dough', 'Spaghetti', 'Tomato Sauce', 'Garlic Bread']
    },
    ...
]

I would like the data to be like

{
    'id': 1,
    'name': 'Italian Night',
    'steps': [
        {
            'step_name': 'Preparation',
            'recipes': ['Tomato Sauce', 'Pasta Dough']
        },
        {
            'step_name': 'Cooking',
            'recipes': ['Spaghetti', 'Tomato Sauce']
        },
        {
            'step_name': 'Serving',
            'recipes': ['Garlic Bread']
        }
    ]
}

You can transform the result with the ArrayAggs [Django-doc] to:

from itertools import groupby
from operator import itemgetter

from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import F

cooking_sessions = CookingSession.objects.annotate(
    step_list=ArrayAgg('steps__name'),
    recipe_list=ArrayAgg('steps__recipes__name'),
)

for cooking_session in cooking_sessions:
    cooking_session.steps = [
        {'step_name': name, 'recipes': [r for __, r in items]}
        for name, items in groupby(
            zip(cooking_session.step_list, cooking_session.recipe_list),
            itemgetter(0),
        )
    ]

But it is quite complicated, and prone to errors. For example we here assume PostgreSQL will return the steps__name and steps__recipes__name in the same order, which might eventually change.

I would advise to just prefetch the items with .prefetch_related(…) [Django-doc], which will do this in two extra queries, but not per CookingSession. So regardless of the number of CookingSessions, Steps and Recipes, we fetch in three queries with:

CookingSession.objects.prefetch_related('steps__recipes')
Back to Top