Django ORM efficient way for annotating True on first occurrences of field
I have a situation where we have some logic that sorts a patient_journey queryset (table) by some heuristics. A patient_journey has a FK to a patient.
I now need an efficient way of setting True for the first occurrence of a patient_journey
for a given patient
, false otherwise.
The first heuristinc is patient_id
so the queryset will already by grouped by patient.
The algorithm is v straight forward & should be fast, yet I'm stuck at trying to get something <1s.
I've tried using distinct
and checking for existence, however that's adding 1-2s.
I've tried using a subquery with [:1] & test on id, but that's even worse around 3-5s extra.
def annotate_primary(
*, qs: 'PatientJourneyQuerySet' # noqa
) -> 'PatientJourneyQuerySet': # noqa
"""
Constraints:
--------------
Exactly One non-global primary journey per patient
Annotations:
------------
is_primary_:Bool, is the primary_journey for a patient
"""
from patient_journey.models import PatientJourney
qs = get_sorted_for_primary_journey_qs(qs=qs)
# cost until now is around 0.17s
# using distinct on patient_id & checking if id is in there adds around 1.5-2s
# looking for something faster, i.e. this shoiuld be a straight forward scan.
qs = qs.annotate(
# True for first occurrence of a `patient_id` false otherwise
primary=
)
return qs
Assuming the models are like this
from django.db import models
class Patient(models.Model):
first_name = models.CharField(max_length=50)
last_name = models.CharField(max_length=50)
class PatientJourney(models.Model):
patient = models.ForeignKey(Patient, on_delete=models.CASCADE, related_name='journeys')
visit_date = models.DateTimeField()
diagnosis = models.TextField(blank=True, null=True)
treatment = models.TextField(blank=True, null=True)
Using Window funstions you can get only the first journeys
from django.db.models import Window, F
from django.db.models.functions import RowNumber
from .models import PatientJourney
# Annotate each journey with its row number partitioned by patient and ordered by visit_date in ascendin order
patient_journeys_with_row_number = PatientJourney.objects.annotate(
row_number=Window(
expression=RowNumber(),
partition_by=[F('patient')],
order_by=F('visit_date').asc()
)
).filter(row_number=1)
for journey in patient_journeys_with_row_number:
print(f"{journey.patient.first_name} {journey.patient.last_name} - First Journey on {journey.visit_date}")