Faster table of content generation in Django

I am trying to make a table of content for my queryset in Django like this:

    def get_toc(self):
        toc = {}
        qs = self.get_queryset()
        idx = set()
        for q in qs:
            idx.add(q.title[0])
        idx = list(idx)
        idx.sort()
        for i in idx:
            toc[i] = []
            for q in qs:
                if q.title[0] == i:
                    toc[i].append(q)
        return toc

But it has time complexity O(n^2). Is there a better way to do it?

I believe this is equivalent to the above but should be much faster as it uses the database engine to do the sorting and de-duplication:

def get_toc(self):
    return {
        q.title[0]: q
        for q in self.get_queryset().distinct("title").order_by('title')
    }

Instead of iterating over all queries for every item in idx, iterate through qs as the outermost and only for loop, adding each query to the appropriate bucket. Something like this:

def get_toc(self):
    toc = {}
    qs = self.get_queryset()

    for q in qs:
        t = q.title[0]
        toc[t].setdefault(t, []).append(value)

    return toc

The original solution wasn't quite as bad as O(n^2) – it was O(nm) where n is the length of the queryset and m is the number of unique titles. In the worst case (where every entry is unique in title), this would devolve to O(n^2). The new solution is O(n).

This doesn't look like a table of contents, but a glossary, where you map the first character of a term to a list of terms.

We can work with .groupby(…) [python-doc] here:

from itertools import groupby

result = {
    k: list(vs)
    for k, vs in groupby(
        self.get_queryset().order_by('title'), lambda x: x.title[0]
    )
}
Back to Top