Filtering Django QuerySet after using a Window function

For a simple model:

from django.db.models import CharField, Model, PositiveIntegerField

class Example(Model):
    category = CharField(max_length=20, null=False, blank=False)
    version = PositiveIntegerField(null=False, blank=False)

    class Meta:
        unique_together = ["category", "version"]

And some sample data:

Example.objects.update_or_create(id=1, category="Thing 1", version=1)
Example.objects.update_or_create(id=2, category="Thing 1", version=2)
Example.objects.update_or_create(id=3, category="Thing 1", version=4)
Example.objects.update_or_create(id=4, category="Thing 2", version=1)
Example.objects.update_or_create(id=5, category="Thing 2", version=2)
Example.objects.update_or_create(id=6, category="Thing 3", version=3)

I would like to use window functions to get:

  • The examples that, only from the sub-set with the latest version in each category, have ids either 1, 4, 5 or 6.

I am trying to do this using the RowNumber window function:

from django.db.models import F, Window
from django.db.models.functions.window import RowNumber

results = Example.objects.alias(
    rn=Window(
        expression=RowNumber(),
        partition_by=[F("category")],
        order_by="-version"
    ),
).filter(rn=1).filter(id__in=[1,4,5,6])

print(results.query)
print(results)

This generates the query:

SELECT "col1",
       "col2",
       "col3"
FROM   (
  SELECT *
  FROM   (
    SELECT "greatest_n_example"."id" AS "col1",
           "greatest_n_example"."category" AS "col2",
           "greatest_n_example"."version" AS "col3",
           ROW_NUMBER() OVER (PARTITION BY "greatest_n_example"."category" ORDER BY "greatest_n_example"."version" DESC) AS "qual0"
    FROM   "greatest_n_example"
    WHERE  "greatest_n_example"."id" IN (1, 4, 5, 6)
  ) "qualify"
  WHERE  "qual0" = 1
) "qualify_mask"

And returns examples 1, 5 and 6; which is not what is required.

The latest for category="Thing 1" is version=4 and the filter on rn should be applied before the filter on id (it isn't, it is currently the other way round) so it is expected that all the Thing 1 category items should be excluded from the result set.

The expected query is equivalent to:

SELECT "col1",
       "col2",
       "col3"
FROM   (
  SELECT *
  FROM   (
    SELECT "greatest_n_example"."id" AS "col1",
           "greatest_n_example"."category" AS "col2",
           "greatest_n_example"."version" AS "col3",
           ROW_NUMBER() OVER (PARTITION BY "greatest_n_example"."category" ORDER BY "greatest_n_example"."version" DESC) AS "qual0"
    FROM   "greatest_n_example"
  ) "qualify"
  WHERE  "qual0" = 1
  AND    "col1" IN (1, 4, 5, 6)
) "qualify_mask"

Which would output just 5 and 6.

How, using window functions, can I enforce that the row-number filter needs to be happen, at least, within the same inline view of the SQL as the id filter is applied, if not before?

Note: I know I can solve this problem using Subquery, rather using than Window, to perform the filtering using correlated sub-queries; this question is specifically about how to use window functions and to get the WHERE filters in the correct inline-view of the SQL query.

You can solve it using Case and When:

from django.db.models import Case, F, When, Window
from django.db.models.functions.window import RowNumber
from greatest_n.models import Example

query = Example.objects.alias(
    rn=Window(
        expression=RowNumber(),
        partition_by=[F("category")],
        order_by="-version"
    ),
    first_id=Case(When(rn=1, then=F("id")), default=None),
).filter(first_id__in=[1,4,5,6])

print(query.query)
print(query)

Which generates the query:

SELECT "col1",
       "col2",
       "col3"
FROM   (
  SELECT *
  FROM   (
    SELECT "greatest_n_example"."id" AS "col1",
           "greatest_n_example"."category" AS "col2",
           "greatest_n_example"."version" AS "col3",
           CASE
           WHEN (ROW_NUMBER() OVER (
                   PARTITION BY "greatest_n_example"."category"
                   ORDER BY "greatest_n_example"."version" DESC
                 ) = 1)
           THEN "greatest_n_example"."id"
           ELSE NULL
           END AS "qual0"
    FROM   "greatest_n_example"
  ) "qualify"
  WHERE "qual0" IN (1, 4, 5, 6)
) "qualify_mask"

And only outputs Examples 5 and 6.

Вернуться на верх