How does one correctly define an index for efficiently querying the reverse relation for ForeignKey fields? (Django)

So I've spent days searching on and off for an answer to this question, and much to my surprise, I can't seem to find anbyone asking this specific question, nor any mentions of the confusion I have around it. My confusion lies, I suppose, in not being sure just how it is that Django builds queries under the hood, around querying a model_set reverse relation, or its related_name is specified as.

It would be ideal if beginning an ORM query with parent_instance.related_name worked exactly like RelatedModel.objects does, so that, for example, the query parent_instance.related_name.filter(token__in=somelist) was able to utilize a simple field index on the "token" field created with db_index=True, just like how such an index would be able to be utilized with RelatedModel.objects.filter(token__in=somelist). However, these two queries are not equivalent; to be equivalent, the second query would have to be changed to RelatedModel.objects.filter(parent=parent_instance, token__in=somelist) (or the reverse order, RelatedModel.objects.filter(token__in=somelist, parent=parent_instance)).

This leads me to believe that, in order to take advantage of indexing when querying Reverse foreignKey relations, the parent model likely n[eeds to be part of whatever indexes you plan to utilize. If one understands databases under the hood (which I do to an extent, though it is not my strong suit, by far), this may very well be obvious, but many programmers do not in this day and age, and it's difficult for me to believe that people are all just realizing that the parent model needs to be included in indexes for related_set queries, considering that one does not explicitly indicate such to be true when querying using the form instance.related_set(...).

That's why I'm a bit shocked 'that it's been so hard fpr me to find an answer to this... not only that, but that I can't seem to even find discussion on the topic or the question simply being asked. I would try for longer to find the answer myself, but my company insists on me being a development team of just one, anfd they're breathing down my neck to get this thing updated with efficiency enhancements. And by the way, before anyone scolds me, I recently installed the django debug toolbar, but I'vemade rather sweeping changes and am trying to get it to the pouint where I am able to run the thing again locally (did I learn lessons about how to structure optimization changes into small, discrete branches and commits? Why yes, I did! Pain is a very effective teacher).

Anyway, it seems to me that Djago may have found some way to handle this under the hood so that querying the related_set reverse relation allows an index to be utilized in the same way it could be used when starying from RelatedModel.objects, just because, well, are people really needing to add additional compound indexes for every field they plan to query through the reverse relation, when their app contains querying both in total and throguh the reverse relation? Like, if you need to query a range and/or list of values on the primary key, and you need to query a field "token",and you sometimes need to do this without limiting the query to a particular "parent" model instance, and sometimes oly for a particular parent instance, is this how you would accomplish definition of effective indexes:

class ParentModel(models.Model):
    ...

    class Meta:
        indexes = [
            models.Index(fields=['related_set', 'id']),
            models.Index(fields=['related_set', 'token']),
        ]
    

class RelatedModel(models.Model):
    ... (allowing primary key to default to id, which is automatically indexed)
    token = models.CharField(..., db_index=True)
    parent = models.ForeignKey(ParentModel, related_name='related_set') # Using 'related_set' here for clarity of the example, I know it's redundant

    class Meta:
        indexes = [
            # Clearly these compound indexes would be needed for querying RelatedModel.objects, 
            # when it is required to narrow the search to one or more parent instances, so no confusion here
            models.Index(fields=['parent', 'token']),
            models.Index(fields=['parent', 'id']),
        ]

My question is around the indexes proposed above for the ParentModel. Would these be nevcessary to query in the form parent_instance.related_set.filter(...)? Or would the compound indexes defined in RelatedModel work for both this form as well as the form RelatedModel.objects.filter(parent=parent_instance, id=xxx/token=xxx)?

Or am I suffering from lak of sleep and overworking combined with wearing an iappropriate number of hats, and I'm failing to think about this the right way and missing something obvious? It can't be that simply having db_index=True on the 'token' field, and nothing else, would allow parent_instance.related_set.filter(token=value) to be able to utilize the index, can it?

EDIT: As I think about it, it would seem like it would be necessary for the indexes to be created in the RelatedModel modelclass definition, since it is, after all,those instances which are being queries/retrieved. But you are staryiong from a parent model instance when using related_set, so maybe not. IF indeed it will use indexes defined in the ParentModel, then it seems clear they'd need to have the leftmost field be 'related_set'.

BUT, if the indexes need to be defined in RelatedModel, then it seems to me that you would need to create compound indexes for ['parent', 'field'], with parent as the leftmost field. But then, what if you've queried ParentModel to find the instance to query? Like, if I have thw query ParentModel.filter(somefield__startswith='node_'), and I want to find all instances of RelatedModels associated with those instance(s), through the related_set reverse relation, that have token=somevalue, how would I build that index, and where?

Thanks for any insight!

Вернуться на верх