Celery - What pool should I use on Windows for CPU-heavy tasks, with a Redis backend for status tracking?

I am running Python 3.9, Windows 10, Celery 4.3, Redis as the result backend, and AWS SQS as the broker. (I wasn't intending to use the backend, but it became more and more apparent that, because of the library's restrictions on Windows, I'd be better off using it if I could get it to work. Otherwise I would've just used Redis as both the broker and the backend.)
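For reference, here is a minimal sketch of the broker/backend combination described above, written as a plain settings dict that could be passed to `app.conf.update(**settings)` on a Celery app. The helper name `build_celery_settings`, the Redis host/port/db, and the AWS region are assumptions for illustration. `sqs://` with no credentials relies on credentials from the environment or boto configuration. `task_track_started` makes the worker record a `STARTED` state instead of leaving running tasks as `PENDING`.

```python
def build_celery_settings(redis_host="localhost", redis_port=6379, redis_db=0,
                          aws_region="us-east-1"):
    """Hypothetical helper returning lowercase Celery settings (Celery 4+)."""
    return {
        # SQS broker; credentials are taken from the environment/boto config
        "broker_url": "sqs://",
        "broker_transport_options": {"region": aws_region},
        # Redis result backend so task states can be queried later
        "result_backend": f"redis://{redis_host}:{redis_port}/{redis_db}",
        # Record STARTED while a task runs, rather than leaving it PENDING
        "task_track_started": True,
    }


settings = build_celery_settings()
print(settings["result_backend"])  # redis://localhost:6379/0
```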

To give you some context, I have a webpage that lets a user start a resource-intensive task. If the user has a task running and decides to resend it, I need to kill the running task and create a new one from the newly submitted information.
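A minimal sketch of the kill-and-replace logic, assuming one active task per user. `UserTaskRegistry`, `heavy_task`, and `app` are hypothetical names; the real Celery calls it would wrap are `heavy_task.delay(...)` and `app.control.revoke(task_id, terminate=True)`, injected here as callables so the logic can run without a broker. The registry lives in memory, so with several Django processes the task id would need to be stored somewhere shared (a session or Redis). Whether `terminate=True` actually stops a running task depends on the worker pool, which is exactly the problem the question runs into.

```python
from itertools import count
from typing import Callable, Dict, Optional


class UserTaskRegistry:
    """Hypothetical helper: keeps at most one active task id per user.

    submit(payload) -> task id, e.g. lambda p: heavy_task.delay(p).id
    revoke(task_id),            e.g. lambda tid: app.control.revoke(tid, terminate=True)
    """

    def __init__(self, submit: Callable[[dict], str], revoke: Callable[[str], None]):
        self._submit = submit
        self._revoke = revoke
        self._active: Dict[str, str] = {}

    def resubmit(self, user_id: str, payload: dict) -> str:
        # Cancel the user's previous task (if any) before queuing the new one.
        old_id: Optional[str] = self._active.get(user_id)
        if old_id is not None:
            self._revoke(old_id)
        new_id = self._submit(payload)
        self._active[user_id] = new_id
        return new_id


# Usage with stand-ins for Celery: ids come from a counter, revokes are recorded.
ids = count()
revoked = []
registry = UserTaskRegistry(submit=lambda p: f"task-{next(ids)}", revoke=revoked.append)
first = registry.resubmit("alice", {"n": 1})
second = registry.resubmit("alice", {"n": 2})
print(first, second, revoked)  # task-0 task-1 ['task-0']
```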

The problem for me arrives after this line of thinking:

Me: "Hmmm, the prefork pool is used for heavy CPU background tasks... I want to use that..."
Me: Goes and configures,
  updates the celery library,
  sets the environment variable that allows Windows to run the prefork pool:
    os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1'),
  sets a few other configuration settings, etc,
  runs the worker and it works.
Me: "Hey, hey. It works... Oh, I still can't revoke a task DESPITE RUNNING THE PREFORK POOL!?!?!
Oh, that's okay... I can just set a session variable to let me know if the user already started a task,
and if they have, just have celery tell me if the task that they started is finished
before I allow the user to request to run a task again."
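The environment-variable workaround mentioned above, as it is commonly placed at the top of the module that creates the Celery app so it is set before the worker starts its pool. The project name `proj` in the comment is hypothetical. As the question reports, this gets prefork running on Windows but does not make revoking a running task work.

```python
import os

# Workaround for running the prefork pool on Windows; set it early,
# e.g. at the top of the module that defines the Celery app.
os.environ.setdefault("FORKED_BY_MULTIPROCESSING", "1")

# The worker would then be started with something like (hypothetical "proj"):
#   celery -A proj worker --pool=prefork --loglevel=INFO
```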

Me: Goes and configures Django sessions,
configures Redis,
updates the views to include the session variable, etc.
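A minimal sketch of the session guard described above, assuming the session stores the id of the user's last task. `SESSION_KEY`, `can_start_task`, and `remember_task` are hypothetical names; in Django, `session` would be `request.session` and `get_state` could be `lambda tid: AsyncResult(tid).state` (with `from celery.result import AsyncResult`). `READY_STATES` mirrors `celery.states.READY_STATES`. Note that if the backend never records the task, `get_state` keeps returning `PENDING` and this guard refuses new tasks indefinitely.

```python
from typing import Callable, MutableMapping, Optional

# Mirrors celery.states.READY_STATES: states in which a task has finished.
READY_STATES = frozenset({"SUCCESS", "FAILURE", "REVOKED"})
SESSION_KEY = "active_task_id"  # hypothetical session key


def can_start_task(session: MutableMapping,
                   get_state: Callable[[str], Optional[str]]) -> bool:
    """Allow a new task only if there is no stored task or it has finished."""
    task_id = session.get(SESSION_KEY)
    if task_id is None:
        return True
    return get_state(task_id) in READY_STATES


def remember_task(session: MutableMapping, task_id: str) -> None:
    session[SESSION_KEY] = task_id


# Usage with a plain dict standing in for request.session
states = {"t1": "STARTED"}
session = {}
print(can_start_task(session, states.get))  # True (no task yet)
remember_task(session, "t1")
print(can_start_task(session, states.get))  # False (still running)
states["t1"] = "SUCCESS"
print(can_start_task(session, states.get))  # True
```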

Me: "Great! Everything is working, so far..."

Me: Runs a test to see if the Redis server returns the status...
Celery: "PENDING"
Me: "Yo! Is my task done, yet!?"
Celery: "No - PENDING"
Celery: "PENDING"
Celery: "PENDING"
Celery: "PENDING"
Celery: "PENDING"
Celery: "PENDING"
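Why the answer is always `PENDING`: Celery documents `PENDING` as "waiting for execution or unknown", meaning any task id the backend has no record of is reported as `PENDING`. The simplified model below (not Celery's actual code; `lookup_state` is a hypothetical name) shows that a queued task and a task the worker never wrote anything about look identical. A state stuck on `PENDING` therefore usually suggests the worker never executed the task or never wrote to the same result backend.

```python
# Simplified model of a result-backend lookup, not Celery's real implementation.
def lookup_state(backend: dict, task_id: str) -> str:
    # Unknown ids fall back to PENDING, just like queued ones.
    return backend.get(task_id, "PENDING")


backend = {}  # nothing has been written by a worker yet
print(lookup_state(backend, "queued-but-not-started"))  # PENDING
print(lookup_state(backend, "never-sent"))              # PENDING
backend["abc"] = "STARTED"  # what a worker with task_track_started=True would record
print(lookup_state(backend, "abc"))                     # STARTED
```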

Me: Searches Stack Overflow for why it's only pending...
Me: Finds out that you must use --pool=solo for the worker...
Me: Dies on the inside.

Ideally, I'd like to be able to use the prefork pool for intense processing and to kill the task if need be. Everything I read tells me prefork is what I want, but solo is the only way I can think of to get it to work.


How bad is it to compromise and just go with solo, given that the tasks will be CPU-heavy and there will be many users? Assume hundreds, if not thousands.
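A rough way to frame the cost of solo: a solo worker runs one task at a time, so parallelism comes only from how many worker processes are running. Starting several solo workers as separate processes is one possible way to get that parallelism; that is an assumption here, not something the thread tests. The numbers below (500 simultaneous submissions, 30 seconds per task) and the helper `worst_case_wait` are hypothetical, and the model assumes every task takes the same time.

```python
import math


def worst_case_wait(queued_tasks: int, workers: int, task_seconds: float) -> float:
    """Seconds until the last of `queued_tasks` finishes, assuming each worker
    process runs one task at a time and every task takes `task_seconds`."""
    rounds = math.ceil(queued_tasks / workers)
    return rounds * task_seconds


# Hypothetical load: 500 users submit at once, 30 s of CPU per task.
print(worst_case_wait(500, 1, 30))  # 15000 (one solo worker)
print(worst_case_wait(500, 8, 30))  # 1890 (eight worker processes)
```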

What other solutions should I consider?

Answers: 1

Answered by Reed Jones, Oct. 13, 2021, 8:24 a.m.

In my experience, on Windows I cannot use anything other than --pool=solo.

What other solutions should I consider?

The way I do it: I use a single pool for development on Windows and more in production (Linux). At least in my case, using the solo pool for development is fine.
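A minimal sketch of that split, building the `celery worker` command line per platform: the solo pool on Windows for development, and prefork with several processes elsewhere. The helper name `worker_command`, the app name `proj`, and the concurrency value are hypothetical; the resulting list could be passed to `subprocess.run(...)` or used as a template for a service definition.

```python
import sys
from typing import List, Optional


def worker_command(app_name: str = "proj", platform: Optional[str] = None,
                   concurrency: int = 4) -> List[str]:
    """Hypothetical helper: solo pool on Windows, prefork elsewhere."""
    platform = platform or sys.platform
    cmd = ["celery", "-A", app_name, "worker", "--loglevel=INFO"]
    if platform.startswith("win"):
        # Development on Windows: one task at a time in the worker process.
        cmd.append("--pool=solo")
    else:
        # Production on Linux: several child processes for CPU-bound work.
        cmd += ["--pool=prefork", f"--concurrency={concurrency}"]
    return cmd


print(worker_command(platform="win32"))
print(worker_command(platform="linux", concurrency=8))
```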