Django WeasyPrint high memory usage with large datasets
I am using WeasyPrint in Django to generate a PDF. However, when processing around 11,000 records, it consumes all available resources allocated to the Kubernetes pod. As a result, the pod restarts, and I never receive the generated PDF via email.
Are there:
- Any lightweight PDF libraries that can handle generating PDFs for thousands of records more efficiently?
- Any optimization techniques in WeasyPrint (or in general) to reduce resource usage and generate the PDF successfully?
I faced the same issue with WeasyPrint consuming 8GB+ RAM for large PDFs. Here's what worked for me:
Solution 1: Switched to ReportLab (This solved it)
I replaced WeasyPrint with ReportLab and memory usage dropped from 8GB to under 500MB:
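```python
import io

from reportlab.platypus import SimpleDocTemplate, Table


def generate_pdf(queryset, batch_size=1000):
    buffer = io.BytesIO()
    doc = SimpleDocTemplate(buffer)
    elements = []
    rows = []
    # iterator() streams records one at a time instead of caching the whole queryset
    for record in queryset.iterator(chunk_size=batch_size):
        rows.append([record.field1, record.field2])
        if len(rows) == batch_size:
            # One Table per batch, so no single table holds every row
            elements.append(Table(rows))
            rows = []
    if rows:
        # Remaining records that did not fill a whole batch
        elements.append(Table(rows))
    doc.build(elements)
    return buffer.getvalue()
```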
This generated an 11,000-record PDF in 45 seconds using only 400MB RAM. The styling is more basic, but it actually completes.
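One detail worth knowing: `queryset.iterator(chunk_size=...)` yields individual records, not lists of records (the chunk size only controls how many rows are fetched from the database per round trip). If you want explicit batches, a small helper can do the grouping. This is a minimal sketch using only the standard library; `batched` and `rows_for_tables` are names I made up for illustration, and `field1`/`field2` are the same placeholder attributes as above.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable, one batch at a time."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def rows_for_tables(records, size=1000):
    """Turn records into lists of table rows, one list per batch.

    `field1` and `field2` are placeholder attribute names.
    """
    for batch in batched(records, size):
        yield [[record.field1, record.field2] for record in batch]
```

With this in place, each list yielded by `rows_for_tables(queryset.iterator(chunk_size=1000))` could be passed to `Table(...)` directly.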
Solution 2: Chunk Processing (Works but slower)
When I had to stick with WeasyPrint for complex layouts, I processed in chunks:
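```python
import gc
import io

from django.template.loader import render_to_string
from PyPDF2 import PdfMerger
from weasyprint import HTML


def generate_large_pdf(queryset, chunk_size=500):
    merger = PdfMerger()
    # Slicing needs a stable order, otherwise chunks can overlap or skip rows
    if not queryset.ordered:
        queryset = queryset.order_by('pk')
    total = queryset.count()
    # Process chunk_size records at a time
    for start in range(0, total, chunk_size):
        chunk = queryset[start:start + chunk_size]
        html = render_to_string('template.html', {'records': chunk})
        pdf = HTML(string=html).write_pdf()
        merger.append(io.BytesIO(pdf))
        # This is crucial - drop the intermediates and force garbage collection
        del html, pdf
        gc.collect()
    output = io.BytesIO()
    merger.write(output)
    merger.close()
    return output.getvalue()
```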
This kept memory under 1GB but took 3-4 minutes for 11,000 records.
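If you want to compare chunk sizes yourself, here is a rough sketch of how peak memory could be measured with the standard library's `tracemalloc`. The helper name `peak_python_memory_mb` is my own. Keep in mind that `tracemalloc` only sees allocations made through Python's allocators, so memory used inside native libraries is not counted and the pod's actual RSS may be higher than what this reports.

```python
import tracemalloc


def peak_python_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak MiB of Python-level allocations)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)
```

For example, `pdf_bytes, peak_mb = peak_python_memory_mb(generate_large_pdf, queryset)` would return the PDF along with the peak figure for that run.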
What didn't work:
- Increasing pod memory to 16GB - just delayed the crash
- Using the `--optimize-size` flag in WeasyPrint - minimal improvement
- Trying wkhtmltopdf - same memory issues
My recommendation: Use ReportLab for data-heavy PDFs, keep WeasyPrint only for complex design requirements with smaller datasets (<1000 records).