Django ORM: querying nested many-to-many tables efficiently?
So let's say I am designing a database for cooking sessions, with the models below:
from django.db import models


class Recipe(models.Model):
    name = models.CharField(max_length=255)

    def __str__(self):
        return self.name


class Step(models.Model):
    name = models.CharField(max_length=255)
    recipes = models.ManyToManyField(Recipe, related_name='steps')

    def __str__(self):
        return self.name


class CookingSession(models.Model):
    name = models.CharField(max_length=255)
    steps = models.ManyToManyField(Step, related_name='cooking_sessions')

    def __str__(self):
        return self.name
How can I use a minimal number of queries (preferably one) to get all steps for a certain cooking session, where each step has its corresponding recipes, if any?
from django.contrib.postgres.aggregates import ArrayAgg
from django.db import models

cooking_sessions = (
    CookingSession.objects.annotate(
        step_list=ArrayAgg(
            models.F(
                "steps__name",
            ),
            distinct=True,
        ),
        recipe_list=ArrayAgg(models.F("steps__recipes__name")),
    )
)
This is what the data looks like:
[
    {
        'id': 1,
        'name': 'Italian Night',
        'step_list': ['Preparation', 'Cooking', 'Serving'],
        'recipe_list': ['Tomato Sauce', 'Pasta Dough', 'Spaghetti', 'Tomato Sauce', 'Garlic Bread']
    },
    ...
]
I would like the data to look like this:
{
    'id': 1,
    'name': 'Italian Night',
    'steps': [
        {
            'step_name': 'Preparation',
            'recipes': ['Tomato Sauce', 'Pasta Dough']
        },
        {
            'step_name': 'Cooking',
            'recipes': ['Spaghetti', 'Tomato Sauce']
        },
        {
            'step_name': 'Serving',
            'recipes': ['Garlic Bread']
        }
    ]
}
You can transform the result of the ArrayAggs [Django-doc] with:
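from itertools import groupby
from operator import itemgetter

from django.contrib.postgres.aggregates import ArrayAgg

# No distinct=True here: both arrays need one entry per joined row,
# otherwise they can no longer be zipped together.
cooking_sessions = CookingSession.objects.annotate(
    step_list=ArrayAgg('steps__name'),
    recipe_list=ArrayAgg('steps__recipes__name'),
)
for cooking_session in cooking_sessions:
    # Stored under a new attribute name: assigning directly to the
    # "steps" many-to-many field would raise a TypeError.
    cooking_session.steps_with_recipes = [
        {
            'step_name': name,
            # skip the NULL that the LEFT JOIN yields for a step without recipes
            'recipes': [r for __, r in items if r is not None],
        }
        for name, items in groupby(
            zip(cooking_session.step_list, cooking_session.recipe_list),
            itemgetter(0),
        )
    ]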
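But it is quite complicated, and prone to errors. For example, we here assume that PostgreSQL will return the steps__name and steps__recipes__name in the same order, which might eventually change. We also assume that the rows of one step are adjacent, since groupby only merges consecutive items with the same key.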
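The adjacency assumption can be illustrated without a database. The sketch below uses made-up arrays, and group_adjacent and group_by_key are hypothetical helper names, not part of Django. The second helper collects recipes per step name in a dictionary, so it tolerates non-adjacent rows, but it still assumes that both arrays are aligned index by index and that step names are unique within a session.

```python
from itertools import groupby
from operator import itemgetter


def group_adjacent(step_list, recipe_list):
    """Pair the arrays with groupby; only consecutive equal names are merged."""
    return [
        {'step_name': name, 'recipes': [r for _, r in items if r is not None]}
        for name, items in groupby(zip(step_list, recipe_list), itemgetter(0))
    ]


def group_by_key(step_list, recipe_list):
    """Collect recipes per step name in a dict, so adjacency does not matter."""
    grouped = {}
    for name, recipe in zip(step_list, recipe_list):
        recipes = grouped.setdefault(name, [])
        if recipe is not None:
            recipes.append(recipe)
    return [{'step_name': n, 'recipes': r} for n, r in grouped.items()]


# Made-up aggregate output in which the rows of 'Preparation' are not adjacent.
steps = ['Preparation', 'Cooking', 'Preparation']
recipes = ['Tomato Sauce', 'Spaghetti', 'Pasta Dough']

print(len(group_adjacent(steps, recipes)))  # 3: 'Preparation' appears twice
print(len(group_by_key(steps, recipes)))    # 2: one entry per step name
```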
I would advise to just prefetch the items with .prefetch_related(…) [Django-doc], which will do this in two extra queries in total, not two per CookingSession. So regardless of the number of CookingSessions, Steps and Recipes, we fetch everything in three queries with:
CookingSession.objects.prefetch_related('steps__recipes')
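To turn the prefetched objects into the nested dictionaries from the question, a minimal sketch could look like this. The name session_to_dict is a hypothetical helper, not something Django provides, and it assumes the queryset was built with the prefetch_related('steps__recipes') call above, so that the .all() calls read from the prefetch cache instead of hitting the database again.

```python
def session_to_dict(session):
    """Build the nested structure for one CookingSession-like object.

    Assumes 'steps__recipes' was prefetched; without the prefetch this
    still works, but runs extra queries per session and per step.
    """
    return {
        'id': session.id,
        'name': session.name,
        'steps': [
            {
                'step_name': step.name,
                # an empty list for a step without recipes
                'recipes': [recipe.name for recipe in step.recipes.all()],
            }
            for step in session.steps.all()
        ],
    }


# Intended usage inside a Django project (not executed here):
# data = [
#     session_to_dict(session)
#     for session in CookingSession.objects.prefetch_related('steps__recipes')
# ]
```

Filtering or re-ordering inside the helper, for example step.recipes.filter(…), would bypass the prefetch cache and run a new query per step, so the sketch sticks to .all().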