Django ORM gives duplicates in filtered queryset

I have a django app. I use the ORM to run some queries. It appears I have some duplicates in my result.

While I can simply add a distinct() I would like to understand what is going on.

Here are my models:

class Person(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    active_stuffs = models.ManyToManyField(Stuff, related_name="persons")
    waiting_stuffs = models.ManyToManyField(Stuff, related_name="persons_waiting")
    cancelled_stuffs = models.ManyToManyField(Stuff, related_name="persons_cancelled")
    # ... other fields

class Stuff(models.Model):
    name = models.CharField(null=False, blank=False, max_length=150,)
    # ... other fields

Here is the query:

queryset = Person.objects.filter(
    Q(active_stuffs__id=some_id)
    | Q(cancelled_stuffs__id=some_id)
    | Q(waiting_stuffs__id=some_id)
)

What I don't understand, is the following results:

  • queryset.count() -> 23
  • Person.objects.filter(Q(active_stuffs__id=some_id)).count() -> 16
  • Person.objects.filter(Q(cancelled_stuffs__id=some_id)).count() -> 0
  • Person.objects.filter(Q(waiting_stuffs__id=some_id)).count() -> 6

An instance of Stuff can only be in either active_stuffs, cancelled_stuffs or waiting_stuffs. I checked the Person instance that is duplicated, and the Stuff instance I'm looking for is only in the waiting_stuffs field... So, where could this duplicate come from?

The query makes LEFT OUTER JOINs, so:

SELECT person.*
FROM person
LEFT OUTER JOIN person_active_stuffs ON person_active_stuffs.person_id = person.id
LEFT OUTER JOIN person_waiting_stuffs ON person_waiting_stuffs.person_id = person.id
LEFT OUTER JOIN person_cancelled_stuffs ON person_cancelled_stuffs.person_id = person.id
WHERE person_active_stuffs.stuff_id IN some_ids
  OR person_waiting_stuffs.stuff_id IN some_ids
  OR person_cancelled_stuffs.stuff_id IN some_ids

so it makes three LEFT OUTER JOINs, and these can act as "multipliers" of each other.

If the same id occurs in active_stuffs and waiting_stuffs, it can be counted twice.

Вернуться на верх