Execute raw SQL before each select for specific model
I have a complicated report in Django, which is written as raw SQL, and than stored in the database as a database view.
I have a Django model (managed=False), tied to that database view, so that I can utilize the django ORM and django rest framework, on top of the database view. This practice is something that I have used on several projects, and works fine.
For one specific report, I found that I need to meddle with the postgres query planner, to get it running faster. Before each SELECT statement on that model (database view), to get the SQL to run faster, I need to do:
SET enable_nestloop TO off;
The query is quite complex, with aggregate functions, and joins on subqueries, so I gave up on trying to optimize it. Turning off the nestloop solves all my performance issues, and I am fine with that, as long as I turn it on back after the query. Here is an explanation of the problem and the solution. (What are the pitfalls of setting enable_nestloop to OFF)
At the moment, what I do, is wrap the Report query in a database trasaction, and set the enable_nestloop
only within that transaction, with SET LOCAL
.
with transaction.atomic(), connection.cursor() as cursor:
cursor.execute("SET LOCAL enable_nestloop = off;")
queryset = self.__build_queryset(session, request.query_params)
serializer = MySerializer(queryset, many=True)
response_data = serializer.data
return Response(response_data)
This works good, but I need to remember to do this logic everytime before I execute a SELECT statement on this specific model, otherwise it is painfully slow.
I am looking for a method that will allow me to execute the SET parameter logic before every SELECT query on this model, and afterwards SET the parameter back to the original value. Can I do this on the Model or Manager level?
Kind of.
You probably don't want to insert it in a manager's get_queryset()
directly, because as you probably know querysets are lazy and that would end up disabling nestloop some arbitrary amount of time before you are actually going to hit the database with the cursor. Which increases the chances that any concurrent or simultaneous queries are going suffer the consequences of nestloop being turned off despite you not actually accessing the query that would benefit from nestloop being turned off.
You could override certain manager methods, but that would be tedious to work with and you wouldn't be able to effectively use .filter()
(as that returns a lazy queryset, leading to the same problem as discussed above). If you only need to fetch single items, overloading the manager's .get()
would probably be sufficient though.
A more general (and IMO, more maintainable, more django-esque) approach is to override the queryset itself and defer the setting update until you are actually hitting the database. This also has the added bonus of being able to use this same solution for any given model you need to work on, just by modifying its manager similarly to below.
class NestloopQuerySet(QuerySet):
_nestloop_sql_enable_string = "SET enable_nestloop TO on;"
_nestloop_sql_disable_string = "SET enable_nestloop TO off;"
def _fetch_all(self):
with connection.cursor() as cursor:
try:
cursor.execute(self._nestloop_sql_disable_string)
super()._fetch_all()
finally:
cursor.execute(self._nestloop_sql_enable_string)
class NestloopModelManager(models.Manager):
def get_queryset(self):
return NestloopQuerySet(self.model, using=self._db)
QuerySet._fetch_all()
is at the heart of all database retrieval avenues, so overloading this method should be sufficient for probably all possible use-cases, including iteration.