How to Properly Decode Quoted-Printable .eml Files in Django to Avoid = Artifacts?

I'm working on a Django project where I need to handle .eml files. The email content often has Quoted-Printable encoding, which causes some characters to be incorrectly displayed. For example, certain characters appear as = signs or have = signs inserted within words.

Here’s a sample of what the decoded content looks like:

We request all our =ustomers - For any Remittance please make sure to stay in touch with =ur office Tel no : +97150 267 4240 Mr .Kalpesh. =o:p>
All the Payment =ill be accepted only to East VISON CONTAINER LINE a/c and NOT any of =he personal account.
Also please make =ure to reply the mails only to domain =vcline.com.
We wouldn't be =esponsible for any financial fraud in case bank account is not verified =ith us on whatsapp / WeChat.

As you can see, characters such as "w" and "R" are replaced with = or the equals sign appears in unexpected places.

What I've Tried:

  1. Standard Decoding of Quoted-Printable: I tried decoding the email content using Python's quopri library, but the problem persists.

  2. Email Package in Python: I used Python's email package to parse the .eml file and manually decoded the content based on the Content-Transfer-Encoding header, but the result still contains these artifacts.

import os
import quopri
from email import message_from_bytes
from django.conf import settings

def save_eml_file(self, eml_content):
    file_name = f"{self.date_received.strftime('%Y%m%d_%H%M%S')}_{self.id}.eml"
    file_path = os.path.join(settings.MEDIA_ROOT, "emails", file_name)
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    email_message = message_from_bytes(eml_content)
    decoded_content = ""

    if email_message.is_multipart():
        for part in email_message.walk():
            if part.get_content_type() in ["text/plain", "text/html"]:
                charset = part.get_content_charset() or "utf-8"
                content_transfer_encoding = part.get("Content-Transfer-Encoding", "").lower()
                payload = part.get_payload(decode=True) or b""
                if content_transfer_encoding == "quoted-printable":
                    payload = quopri.decodestring(payload)
                try:
                    decoded_content += payload.decode(charset, errors="replace")
                except Exception:
                    decoded_content += str(payload)
    else:
        charset = email_message.get_content_charset() or "utf-8"
        content_transfer_encoding = email_message.get("Content-Transfer-Encoding", "").lower()
        payload = email_message.get_payload(decode=True) or b""
        if content_transfer_encoding == "quoted-printable":
            payload = quopri.decodestring(payload)
        try:
            decoded_content = payload.decode(charset, errors="replace")
        except Exception:
            decoded_content = str(payload)

    with open(file_path, "w", encoding="utf-8") as file:
        file.write(decoded_content)

    self.eml_file_path = f"emails/{file_name}"
    self.save()

What I Need:

I want to properly decode the email content so that these = artifacts are removed, and the content displays correctly in both English and non-English (Persian) text. How can I adjust the decoding process to achieve this?

Additional Information:

  • Django Version: 4.2

  • Python Version: 3.12

  • Environment: Windows

I appreciate any suggestions or insights that could help resolve this issue.

From what I can see, the problem has nothing to do with Django. Rather, your example is not valid quoted-printable encoding.
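One way to confirm that Django is not involved is to decode a known-good message with the standard library alone. The sketch below is only an illustration: `RAW_EML` is a made-up message and `extract_text` is a hypothetical helper, not code from the question. It relies on the fact that `get_payload(decode=True)` already undoes the Content-Transfer-Encoding, so no additional `quopri.decodestring` pass is applied afterwards.

```python
from email import message_from_bytes

# Hypothetical sample: valid quoted-printable with a soft line break
# ("=" at the end of a line) and UTF-8 Persian text written as =XX escapes.
RAW_EML = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"We request all our cus=\r\n"
    b"tomers: =D8=B3=D9=84=D8=A7=D9=85\r\n"
)


def extract_text(raw: bytes) -> str:
    """Collect the decoded text of every text/plain and text/html part."""
    msg = message_from_bytes(raw)
    chunks = []
    # walk() also yields the message itself when it is not multipart.
    for part in msg.walk():
        if part.get_content_type() in ("text/plain", "text/html"):
            charset = part.get_content_charset() or "utf-8"
            # decode=True reverses quoted-printable (or base64) already.
            payload = part.get_payload(decode=True) or b""
            chunks.append(payload.decode(charset, errors="replace"))
    return "".join(chunks)


text = extract_text(RAW_EML)
print(text)
```

If a well-formed message like this one decodes cleanly while the real .eml file does not, that points at the file's content rather than at the framework.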

[...], may be represented by an "=" followed by a two digit hexadecimal representation of the octet's value. The digits of the hexadecimal alphabet, for this purpose, are "0123456789ABCDEF". [...]
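The rule quoted above can be turned into a rough check for stray equals signs. This is a simplified sketch, not a full validator: `find_invalid_escapes` is a hypothetical helper, it follows the strict uppercase alphabet from the quote (many decoders also tolerate lowercase hex digits), and it treats an "=" at the end of a line as a soft line break without allowing for trailing whitespace.

```python
import re

# An "=" is acceptable here only if it is followed by two uppercase hex
# digits, or if it ends the line (a soft line break).
INVALID_ESCAPE = re.compile(r"=(?![0-9A-F]{2}|\r?\n|\Z)")


def find_invalid_escapes(text: str) -> list[str]:
    """Return each offending '=' together with the two characters after it."""
    return [text[m.start():m.start() + 3] for m in INVALID_ESCAPE.finditer(text)]


# Fragment taken from the sample in the question.
sample = "We request all our =ustomers - please stay in touch with =ur office"
print(find_invalid_escapes(sample))  # ['=us', '=ur']

# A well-formed fragment: two =XX escapes and one soft line break.
valid = "caf=C3=A9 and a soft break=\nhere"
print(find_invalid_escapes(valid))  # []
```

Running a check like this on the raw body of the .eml file, before any decoding, would show whether the equals signs are already malformed in the source.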
