Error Running Spark Job from Django API using subprocess.Popen

I have created a Django project executable, and I need to run a Spark job from an API endpoint within this executable. I am using subprocess.Popen to execute the spark-submit command, but I am encountering an error when the command is executed.

Here’s the command I am trying to run:

/opt/spark-3.5.5-bin-hadoop3/bin/spark-submit --master local --deploy-mode client --conf "spark.ui.enabled=false" --conf "spark.ui.showConsoleProgress=false" --conf "spark.dynamicAllocation.enabled=false" --conf "spark.rdd.compress=false" --conf "spark.driver.memory=4g" --conf "spark.executor.memory=8g" --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" /Users/user1/Project/process-engine/route.py > /app/server/EAP/rasLite/engine/raslight/spark_submit_log/20250424/772_1.0_174702842893.log 2>&1 "{'processNo': '772', 'versionNo': '1.0', 'jsonData': '', 'executionDate': '', 'skipError': 'N', 'generated_executionid': '149897', 'isExecutionIdGenerated': 'True', 'executionId': '149897', 'isPreProcess': 'False'}" &

However, I am getting the following error in the logs:

Unknown command: 'C:/Users/user1/Project/process-engine/route.py'
Type 'ras.exe help' for usage.

Context:

  • I am running this command from a Django API endpoint within a Django project executable.
  • The path to route.py seems to be correct, but the error message indicates a Windows-style path (C:/Users/...) instead of the Unix-style path I am using (/Users/...).
  • I am using the following code to execute the command:

command = f'/Users/user1/Project/process-engine/spark-3.5.5-bin-hadoop3/bin/spark-submit --master local --deploy-mode client --conf "spark.ui.enabled=false" --conf "spark.ui.showConsoleProgress=false" --conf "spark.dynamicAllocation.enabled=false" --conf "spark.rdd.compress=false" --conf "spark.driver.memory=4g" --conf "spark.executor.memory=8g" --conf "spark.serializer=org.apache.spark.serializer.KryoSerializer" /Users/user1/Project/process-engine/route.py > {finalFilePath} 2>&1'

final_command = command + " " + '\"' + raw_body_decoded1 + '\"' + " " + "&"

# Set the environment variables for the subprocess
env = os.environ.copy()
env['DJANGO_SETTINGS_MODULE'] = 'rasLightEngine.settings'
env['SPARK_HOME'] = 'C:/Users/user1/Project/process-engine/spark-3.5.5-bin-hadoop3'

subprocess.Popen(f'{final_command}', shell=True, env=env)

I have tried setting the SPARK_HOME environment variable, but I still get the same error.

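The error text ("Unknown command: ..." followed by "Type 'ras.exe help' for usage.") is what Django's management utility prints for an unrecognized subcommand. So it looks like route.py is being handed to your ras.exe executable as a management command instead of being run as a Spark driver script. Try the following: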

import os
import subprocess

# 1. Make sure paths are consistent with your OS
# For Windows:
spark_home = 'C:/Users/user1/Project/process-engine/spark-3.5.5-bin-hadoop3'
route_path = 'C:/Users/user1/Project/process-engine/route.py'

# For Unix/Linux/Mac:
# spark_home = '/Users/user1/Project/process-engine/spark-3.5.5-bin-hadoop3'
# route_path = '/Users/user1/Project/process-engine/route.py'

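# 2. Use os.path.join for paths to ensure compatibility
# On Windows the launcher is spark-submit.cmd; bin/spark-submit is a bash script
script_name = 'spark-submit.cmd' if os.name == 'nt' else 'spark-submit'
spark_submit = os.path.join(spark_home, 'bin', script_name)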

# 3. Build the command with proper quoting
command = [
    spark_submit,
    "--master", "local",
    "--deploy-mode", "client",
    "--conf", "spark.ui.enabled=false",
    "--conf", "spark.ui.showConsoleProgress=false",
    "--conf", "spark.dynamicAllocation.enabled=false",
    "--conf", "spark.rdd.compress=false",
    "--conf", "spark.driver.memory=4g",
    "--conf", "spark.executor.memory=8g",
    "--conf", "spark.serializer=org.apache.spark.serializer.KryoSerializer",
    route_path
]

# 4. Set up environment
env = os.environ.copy()
env['DJANGO_SETTINGS_MODULE'] = 'rasLightEngine.settings'
env['SPARK_HOME'] = spark_home

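# 5. Execute the command - using args as a list avoids shell parsing issues
# The child gets its own copy of the log file handle, so the parent can close it right away
with open(finalFilePath, 'w') as log_file:
    process = subprocess.Popen(
        command + [raw_body_decoded1],
        stdout=log_file,
        stderr=subprocess.STDOUT,
        env=env
    )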
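
If it helps, here is a small self-contained sketch of the two ideas above. It is not a confirmed fix. The first helper builds the child environment and, optionally, pins PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON (real Spark environment variables) to a real interpreter. This rests on the assumption, which your logs do not prove, that spark-submit is starting the driver with the frozen ras.exe. The second helper shows that a payload passed as a list element reaches the child unchanged. The names build_spark_env and echo_last_argument and the example paths are hypothetical, and sys.executable only stands in for spark-submit so the snippet runs without Spark.

```python
import json
import os
import subprocess
import sys


def build_spark_env(spark_home, python_exe=None, base_env=None):
    """Return a copy of the environment for the spark-submit child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HOME"] = spark_home
    if python_exe is not None:
        # Assumption: Spark is picking up the frozen executable as its Python.
        # These two variables tell Spark which interpreter to use instead.
        env["PYSPARK_PYTHON"] = python_exe
        env["PYSPARK_DRIVER_PYTHON"] = python_exe
    return env


def echo_last_argument(payload):
    """Start a child with a list of arguments and return what it received."""
    # sys.executable stands in for spark-submit so the sketch runs anywhere.
    child = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write(sys.argv[1])",
        payload,
    ]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    payload = json.dumps({"processNo": "772", "versionNo": "1.0", "skipError": "N"})
    # No shell is involved, so quotes inside the payload need no escaping.
    print(echo_last_argument(payload) == payload)
    env = build_spark_env("/path/to/spark", python_exe="/path/to/python")
    print(env["PYSPARK_PYTHON"])
```

In your endpoint you would pass the result of build_spark_env as env= to subprocess.Popen and use the path of an actual Python installation for python_exe. Inside a frozen executable sys.executable points at the executable itself, so it should not be used for that value.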