Django - Optimize grouping
I have a model:
from django.db import models
class Product(models.Model):
sku = models.IntegerField()
plu = models.CharField()
pack_type = models.ForeignKey(PackTypes, on_delete=models.CASCADE)
I need to group them into data structure:
{
< plu_1 >: {
< sku_1 >: [
< pack_type_id_1 >,
< pack_type_id_2 >,
...
],
< sku_2 >: [],
...
},
<plu_2>: {
...
}
}
The code that does it:
def dict_with_list():
return defaultdict(list)
result = defaultdict(dict_with_list)
products = Product.objects.values_list('sku', 'plu', 'pack_type_id')
for (plu, sku, pack_type_id) in products:
result[plu][sku].append(pack_type_id)
The problem with it is because there are a lot of records in model Product the code is slow (> 5 seconds).
How could I optimize the code to be faster?
You can let the database do the grouping:
from operator import itemgetter
from itertools import groupby
products = Product.objects.values_list('sku', 'plu', 'pack_type_id').order_by(
'sku', 'plu'
)
items = {
k1: {k2: list(map(itemgetter(2), v)) for k2, v in groupby(itemgetter(1), vs)}
for k1, vs in groupby(itemgetter(0), products)
}
That being said, if the amount of data is really huge, you should use some form of pagination to retrieve the data.