Faster table of content generation in Django
I am trying to make a table of content for my queryset
in Django like this:
def get_toc(self):
toc = {}
qs = self.get_queryset()
idx = set()
for q in qs:
idx.add(q.title[0])
idx = list(idx)
idx.sort()
for i in idx:
toc[i] = []
for q in qs:
if q.title[0] == i:
toc[i].append(q)
return toc
But it has time complexity O(n^2). Is there a better way to do it?
I believe this is equivalent to the above but should be much faster as it uses the database engine to do the sorting and de-duplication:
def get_toc(self):
return {
q.title[0]: q
for q in self.get_queryset().distinct("title").order_by('title')
}
Instead of iterating over all queries for every item in idx
, iterate through qs
as the outermost and only for
loop, adding each query to the appropriate bucket. Something like this:
def get_toc(self):
toc = {}
qs = self.get_queryset()
for q in qs:
t = q.title[0]
toc[t].setdefault(t, []).append(value)
return toc
The original solution wasn't quite as bad as O(n^2)
– it was O(nm)
where n
is the length of the queryset
and m
is the number of unique titles. In the worst case (where every entry is unique in title), this would devolve to O(n^2)
. The new solution is O(n)
.
This doesn't look like a table of contents, but a glossary, where you map the first character of a term to a list of terms.
We can work with .groupby(…)
[python-doc] here:
from itertools import groupby
result = {
k: list(vs)
for k, vs in groupby(
self.get_queryset().order_by('title'), lambda x: x.title[0]
)
}