Django - Optimize grouping

I have a model:

from django.db import models

class Product(models.Model):
    sku = models.IntegerField()
    plu = models.CharField()
    pack_type = models.ForeignKey(PackTypes, on_delete=models.CASCADE)

I need to group them into data structure:

{ 
    < plu_1 >: { 
        < sku_1 >: [ 
            < pack_type_id_1 >,
            < pack_type_id_2 >,
            ...
        ],
        < sku_2 >: [],
        ...
    },
    <plu_2>: {
        ...
    }
}

The code that does it:

def dict_with_list():
    return defaultdict(list)

result = defaultdict(dict_with_list)
products = Product.objects.values_list('sku', 'plu', 'pack_type_id')

for (plu, sku, pack_type_id) in products:
    result[plu][sku].append(pack_type_id)

The problem with it is because there are a lot of records in model Product the code is slow (> 5 seconds).

How could I optimize the code to be faster?

You can let the database do the grouping:

from operator import itemgetter
from itertools import groupby

products = Product.objects.values_list('sku', 'plu', 'pack_type_id').order_by(
    'sku', 'plu'
)

items = {
    k1: {k2: list(map(itemgetter(2), v)) for k2, v in groupby(itemgetter(1), vs)}
    for k1, vs in groupby(itemgetter(0), products)
}

That being said, if the amount of data is really huge, you should use some form of pagination to retrieve the data.

Back to Top