Django + Celery + PySpark inside Docker raises SystemExit: 1 and NoSuchFileException when creating SparkSession

I'm running a Django application that uses Celery tasks and PySpark inside a Docker container. One of my Celery tasks calls a function that initializes a SparkSession using getOrCreate(). However, when this happens, the worker exits unexpectedly with a SystemExit: 1 and a NoSuchFileException.
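
The task looks roughly like this (simplified; the module, task, and path names here are placeholders, not my real code):

# tasks.py (simplified sketch of the setup)
from celery import shared_task
from pyspark.sql import SparkSession

def get_spark_session():
    # Builds or reuses a SparkSession inside the Celery worker process
    return (
        SparkSession.builder
        .appName("celery-spark-job")
        .master("local[*]")
        .getOrCreate()
    )

@shared_task
def process_data(path):
    spark = get_spark_session()
    df = spark.read.parquet(path)  # the real task does more than this
    return df.count()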

Here is the relevant part of the stack trace:

SystemExit: 1
[INFO] Worker exiting (pid: 66009)
...
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform...
WARN DependencyUtils: Local jar /...antlr4-4.9.3.jar does not exist, skipping.
...
Exception in thread "main" java.nio.file.NoSuchFileException: /tmp/tmpagg4d47k/connection8081375827469483762.info
...
[ERROR] Worker (pid:66009) was sent SIGKILL! Perhaps out of memory?

How can I solve this problem?

Two potential issues come to mind:

  1. Insufficient memory in the Docker container — PySpark can be quite memory-intensive. If the container runs out of memory, the process may be killed (as suggested by the SIGKILL message in your logs).

  2. Improperly mounted /tmp volume — PySpark creates temporary files during execution. If /tmp is not properly mounted or writable, you'll get NoSuchFileException.

To address both, you can configure your docker-compose.yml like this:

services:
  celery:
    mem_limit: 2g     # give the worker enough memory for the Spark driver JVM
    volumes:
      - /tmp:/tmp     # make sure Spark's temp files land on a writable mount
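
On the Spark side, you can also cap the driver memory and point Spark's scratch directory at a path you know is mounted and writable. This is just a sketch assuming you build the SparkSession yourself in the task; spark.driver.memory and spark.local.dir are standard Spark configuration options, and the values and paths below are examples:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("celery-spark-job")
    .master("local[*]")
    # Keep the driver JVM below the container's mem_limit so the worker isn't OOM-killed
    .config("spark.driver.memory", "1g")
    # Put Spark's temporary files on a directory that is definitely mounted and writable
    .config("spark.local.dir", "/tmp/spark-scratch")
    .getOrCreate()
)

Leaving some headroom between spark.driver.memory and mem_limit matters because the Python worker and the driver JVM both count against the container's memory limit.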

Let me know if you've already checked these!
