Performance issues when using Django's Min('id') in big table
I have a model Table
which gets a new entry every few seconds, so it's a quite big table.
class Table(models.Model):
id: int
pk: int
timestamp = models.PositiveBigIntegerField(db_index=True)
Now what i am trying to do is get the first entry for the last 24h. And for some reason it takes way to much time. Like we are talking about seconds. But if i try to get last i get it instantly. I've tried following queries:
from_ts = int(time.time()) - 24 * 60 * 60
first_entry = Table.objects.filter(timestamp__gte=from_ts).first()
first_entry = Table.objects.filter(timestamp__gte=from_ts).aggregate(Min('id'))
They both take a few seconds to complete but if I try .aggregate(Min('timestamp'))
it returns almost instantly. I tried .filter(timestamp__gte=from_ts)[0]
which is quick and does work but i know its undefined behaviour because the entries aren't sorted and in some edge cases it might not work. Then i tried to just get the whole query in python and find min
in python and it was almost instant.
first_entry = min(list(Table.objects.filter(timestamp__gte=from_ts)))
Can anyone explain what exactly is happening here and what is the cleanest/best solution?
Your querysets are filtering on the timestamp and also ordering on the ID. Although the first query might not seems like it does that but .first()
implicitly performs and order_by
on the ID if there is no order already specified.
Although you have individual indexes on the ID and the timestamp, most likely the problem is that the database doesn't consider any suitable for your query resulting in it being inefficient (I'm not much of a DB person so not the best person to advice on indexes).
To make the query efficient using the index that you do have currently you can order on the timestamp and get the first object according to that order (which should ideally also be the first ID if your table stores data that way):
first_entry = Table.objects.filter(timestamp__gte=from_ts).order_by("timestamp").first()
Just Add an index to timestamp and id:
class Table(models.Model):
timestamp = models.PositiveBigIntegerField(db_index=True)
class Meta:
indexes = [
models.Index(fields=['timestamp', 'id']),
]
Then use this query to get the first entry from the last 24 hours.
first_entry = Table.objects.filter(timestamp__gte=from_ts).order_by('id').first()
This will be much faster because the database can use the combined index to both filter and sort without scanning the entire table.