Filtering Django QuerySet after using a Window function
For a simple model:
from django.db.models import CharField, Model, PositiveIntegerField
class Example(Model):
category = CharField(max_length=20, null=False, blank=False)
version = PositiveIntegerField(null=False, blank=False)
class Meta:
unique_together = ["category", "version"]
And some sample data:
Example.objects.update_or_create(id=1, category="Thing 1", version=1)
Example.objects.update_or_create(id=2, category="Thing 1", version=2)
Example.objects.update_or_create(id=3, category="Thing 1", version=4)
Example.objects.update_or_create(id=4, category="Thing 2", version=1)
Example.objects.update_or_create(id=5, category="Thing 2", version=2)
Example.objects.update_or_create(id=6, category="Thing 3", version=3)
I would like to use window functions to get:
- The examples that, only from the sub-set with the latest version in each category, have ids either
1
,4
,5
or6
.
I am trying to do this using the RowNumber
window function:
from django.db.models import F, Window
from django.db.models.functions.window import RowNumber
results = Example.objects.alias(
rn=Window(
expression=RowNumber(),
partition_by=[F("category")],
order_by="-version"
),
).filter(rn=1).filter(id__in=[1,4,5,6])
print(results.query)
print(results)
This generates the query:
SELECT "col1",
"col2",
"col3"
FROM (
SELECT *
FROM (
SELECT "greatest_n_example"."id" AS "col1",
"greatest_n_example"."category" AS "col2",
"greatest_n_example"."version" AS "col3",
ROW_NUMBER() OVER (PARTITION BY "greatest_n_example"."category" ORDER BY "greatest_n_example"."version" DESC) AS "qual0"
FROM "greatest_n_example"
WHERE "greatest_n_example"."id" IN (1, 4, 5, 6)
) "qualify"
WHERE "qual0" = 1
) "qualify_mask"
And returns examples 1
, 5
and 6
; which is not what is required.
The latest for category="Thing 1"
is version=4
and the filter on rn
should be applied before the filter on id
(it isn't, it is currently the other way round) so it is expected that all the Thing 1
category items should be excluded from the result set.
The expected query is equivalent to:
SELECT "col1",
"col2",
"col3"
FROM (
SELECT *
FROM (
SELECT "greatest_n_example"."id" AS "col1",
"greatest_n_example"."category" AS "col2",
"greatest_n_example"."version" AS "col3",
ROW_NUMBER() OVER (PARTITION BY "greatest_n_example"."category" ORDER BY "greatest_n_example"."version" DESC) AS "qual0"
FROM "greatest_n_example"
) "qualify"
WHERE "qual0" = 1
AND "col1" IN (1, 4, 5, 6)
) "qualify_mask"
Which would output just 5
and 6
.
How, using window functions, can I enforce that the row-number filter needs to be happen, at least, within the same inline view of the SQL as the id
filter is applied, if not before?
Note: I know I can solve this problem using Subquery
, rather using than Window
, to perform the filtering using correlated sub-queries; this question is specifically about how to use window functions and to get the WHERE
filters in the correct inline-view of the SQL query.
You can solve it using Case
and When
:
from django.db.models import Case, F, When, Window
from django.db.models.functions.window import RowNumber
from greatest_n.models import Example
query = Example.objects.alias(
rn=Window(
expression=RowNumber(),
partition_by=[F("category")],
order_by="-version"
),
first_id=Case(When(rn=1, then=F("id")), default=None),
).filter(first_id__in=[1,4,5,6])
print(query.query)
print(query)
Which generates the query:
SELECT "col1",
"col2",
"col3"
FROM (
SELECT *
FROM (
SELECT "greatest_n_example"."id" AS "col1",
"greatest_n_example"."category" AS "col2",
"greatest_n_example"."version" AS "col3",
CASE
WHEN (ROW_NUMBER() OVER (
PARTITION BY "greatest_n_example"."category"
ORDER BY "greatest_n_example"."version" DESC
) = 1)
THEN "greatest_n_example"."id"
ELSE NULL
END AS "qual0"
FROM "greatest_n_example"
) "qualify"
WHERE "qual0" IN (1, 4, 5, 6)
) "qualify_mask"
And only outputs Examples 5
and 6
.