Duplicate instances not removed from queryset

I'm coming across a problem with the queryset returned in the get_tag_posts method within the Profile model. With the query as it is written, it won't remove duplicates even though .distinct() is being invoked.

To provide a brief illustration of the problem:

  • Question 1 - Tag A
  • Question 2 - Tag A
  • Question 3 - Tag A, Tag B

As get_tag_posts is invoked it includes the following queryset: <QuerySet [<Tag: A>, <Tag: A>, <Tag: A>, <Tag: B>]>

Yet I'm expecting the queryset to be returned as: <QuerySet [<Tag: A>, <Tag: B>]>

From there each instance is annotated to calculate the number of times a given tag has been posted by a user. Why are duplicates of a single instance not being removed in this case?

class Profile(Model):
    user = OneToOneField(settings.AUTH_USER_MODEL, on_delete=CASCADE)

    def get_tag_posts(self, order_by=None):
        if not order_by:
            order_by = "-question__date"
        elif order_by == "name":
            pass
        else:
            order_by = "-score"

        questions_with_tag = Subquery(self.questions.filter(
            tags__name=OuterRef("name")).only('id'))
        tags = Tag.objects.filter(
            question__profile=self
        ).distinct().order_by(order_by)
        return {
            'records': tags.annotate(times_posted=Count(questions_with_tag)),
            'title': f"{tags.count()} Tags"
        }
class Tag(Model):    

    name = CharField(unique=True, max_length=25)



class Post(Model):

    body = TextField()
    date = DateTimeField(default=timezone.now)
    comment = ForeignKey('Comment', on_delete=CASCADE, null=True)
    profile = ForeignKey(
        'authors.Profile', on_delete=SET_NULL, null=True,
        related_name='%(class)ss',
        related_query_name="%(class)s"
    )
    vote = GenericRelation(
        'Vote', related_query_name="%(class)s"
    )
    score = IntegerField(default=0)


class Question(Post):

    title = CharField(
        max_length=80, unique_for_date="date",
        help_text="Concisely state the problem you're having",
        error_messages={
            "max_length": "The title of your question is too long"
        }
    )
    tags = ManyToManyField(
        'Tag', related_name="questions", related_query_name="question"
    )
    views = IntegerField(default=0)
    visible = BooleanField(default=True)
    objects = Manager()
    postings = QuestionSearchManager()
    searches = QueryStringSearchManager()

Many to Many Relations actually includes 3 tables. In your Example: Questions, Tags, Question_Tag_Relations

Question_Tag_Relations

PK Question_PK Tag_PK
1 1 1
2 2 1
3 3 1
4 3 2

When you grab distinct() you are grabbing Distinct PKs in this relation table, not the tags themselves.

So you need to get the tag_pk (idk the specific column name)
Some DBs support distinct by column, like: ~.distinct('tag_pk')
But if your DB doesn't support that you might have to go the: ~.values_list('tags').distinct() route.. I believe this is an extra query tho

This would be my guess: Tags.objects.filter(pk__in=self.questions.values_list('tags').distinct())

Back to Top