Django WeasyPrint high memory usage with large datasets

I am using WeasyPrint in Django to generate a PDF. However, when processing around 11,000 records, it consumes all available resources allocated to the Kubernetes pod. As a result, the pod restarts, and I never receive the generated PDF via email.

Are there:

  1. Any lightweight PDF libraries that can handle generating PDFs for thousands of records more efficiently?
  2. Any optimization techniques in WeasyPrint (or in general) to reduce resource usage and generate the PDF successfully?

I faced the same issue, with WeasyPrint consuming 8GB+ RAM for large PDFs. Here's what worked for me.

Solution 1: Switched to ReportLab (This solved it)

I replaced WeasyPrint with ReportLab and memory usage dropped from 8GB to under 500MB:

python

from reportlab.platypus import SimpleDocTemplate, Table
import io

def generate_pdf(queryset):
    buffer = io.BytesIO()
    doc = SimpleDocTemplate(buffer)

    elements = []
    rows = []
    # iterator() streams records from the database one at a time
    # instead of caching the whole queryset in memory; chunk_size
    # only controls how many rows each database fetch returns
    for record in queryset.iterator(chunk_size=1000):
        rows.append([record.field1, record.field2])
        if len(rows) == 1000:
            # one Table per 1,000 rows keeps each flowable small
            elements.append(Table(rows))
            rows = []
    if rows:
        elements.append(Table(rows))

    doc.build(elements)
    return buffer.getvalue()

Generated an 11,000-record PDF in 45 seconds using only 400MB RAM. The styling is more basic, but it actually completes.
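If you want to verify numbers like these in your own pod rather than take my word for it, the stdlib `tracemalloc` module can report peak Python-level allocations around a call. A minimal sketch (the `peak_memory_mb` helper name is mine, and it only sees Python heap allocations, not C-level memory used by rendering libraries):

```python
import tracemalloc

def peak_memory_mb(func, *args, **kwargs):
    """Run func and return (result, peak Python heap usage in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)

# Example with a stand-in workload; swap in your PDF generator
result, peak = peak_memory_mb(lambda: bytearray(10 * 1024 * 1024))
```

Wrapping `generate_pdf(queryset)` in this is a cheap way to compare approaches before committing to one.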

Solution 2: Chunk Processing (Works but slower)

When I had to stick with WeasyPrint for complex layouts, I processed in chunks:

python

import gc
import io

from django.template.loader import render_to_string
from PyPDF2 import PdfMerger
from weasyprint import HTML

def generate_large_pdf(queryset):
    merger = PdfMerger()

    # An explicit ordering keeps the slices stable between queries
    queryset = queryset.order_by('pk')

    # Process 500 records at a time
    for i in range(0, queryset.count(), 500):
        chunk = queryset[i:i + 500]
        html = render_to_string('template.html', {'records': chunk})
        pdf = HTML(string=html).write_pdf()

        merger.append(io.BytesIO(pdf))

        # This is crucial - drop the references and force garbage
        # collection so each chunk is reclaimed before the next one
        del html, pdf
        gc.collect()

    output = io.BytesIO()
    merger.write(output)
    merger.close()
    return output.getvalue()

This kept memory under 1GB but took 3-4 minutes for 11,000 records.
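The chunking idea itself is independent of WeasyPrint and works for any iterable, not just sliceable querysets. A minimal stdlib sketch (the `batched` name is mine; Python 3.12 ships an equivalent `itertools.batched` that yields tuples):

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Example: 11 items in batches of 5 -> sizes 5, 5, 1
sizes = [len(b) for b in batched(range(11), 5)]
```

Feeding `queryset.iterator(chunk_size=500)` through a helper like this avoids the repeated `count()`/slice queries in the version above, at the cost of rendering chunk boundaries yourself.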

What didn't work:

  • Increasing pod memory to 16GB - just delayed the crash

  • Using --optimize-size flag in WeasyPrint - minimal improvement

  • Trying wkhtmltopdf - same memory issues

My recommendation: Use ReportLab for data-heavy PDFs, keep WeasyPrint only for complex design requirements with smaller datasets (<1000 records).
