Duplicate instances not removed from queryset
I'm coming across a problem with the queryset returned in the get_tag_posts
method within the Profile
model. With the query as it is written, it won't remove duplicates even though .distinct()
is being invoked.
To provide a brief illustration of the problem:
- Question 1 - Tag A
- Question 2 - Tag A
- Question 3 - Tag A, Tag B
As get_tag_posts
is invoked it includes the following queryset:
<QuerySet [<Tag: A>, <Tag: A>, <Tag: A>, <Tag: B>]>
Yet I'm expecting the queryset to be returned as: <QuerySet [<Tag: A>, <Tag: B>]>
From there each instance is annotated to calculate the number of times a given tag has been posted by a user. Why are duplicates of a single instance not being removed in this case?
class Profile(Model):
user = OneToOneField(settings.AUTH_USER_MODEL, on_delete=CASCADE)
def get_tag_posts(self, order_by=None):
if not order_by:
order_by = "-question__date"
elif order_by == "name":
pass
else:
order_by = "-score"
questions_with_tag = Subquery(self.questions.filter(
tags__name=OuterRef("name")).only('id'))
tags = Tag.objects.filter(
question__profile=self
).distinct().order_by(order_by)
return {
'records': tags.annotate(times_posted=Count(questions_with_tag)),
'title': f"{tags.count()} Tags"
}
class Tag(Model):
name = CharField(unique=True, max_length=25)
class Post(Model):
body = TextField()
date = DateTimeField(default=timezone.now)
comment = ForeignKey('Comment', on_delete=CASCADE, null=True)
profile = ForeignKey(
'authors.Profile', on_delete=SET_NULL, null=True,
related_name='%(class)ss',
related_query_name="%(class)s"
)
vote = GenericRelation(
'Vote', related_query_name="%(class)s"
)
score = IntegerField(default=0)
class Question(Post):
title = CharField(
max_length=80, unique_for_date="date",
help_text="Concisely state the problem you're having",
error_messages={
"max_length": "The title of your question is too long"
}
)
tags = ManyToManyField(
'Tag', related_name="questions", related_query_name="question"
)
views = IntegerField(default=0)
visible = BooleanField(default=True)
objects = Manager()
postings = QuestionSearchManager()
searches = QueryStringSearchManager()
Many to Many Relations actually includes 3 tables. In your Example: Questions, Tags, Question_Tag_Relations
Question_Tag_Relations
PK | Question_PK | Tag_PK |
---|---|---|
1 | 1 | 1 |
2 | 2 | 1 |
3 | 3 | 1 |
4 | 3 | 2 |
When you grab distinct()
you are grabbing Distinct PKs in this relation table, not the tags themselves.
So you need to get the tag_pk (idk the specific column name)
Some DBs support distinct by column, like: ~.distinct('tag_pk')
But if your DB doesn't support that you might have to go the: ~.values_list('tags').distinct()
route.. I believe this is an extra query tho
This would be my guess: Tags.objects.filter(pk__in=self.questions.values_list('tags').distinct())