Why do we need sync_to_async in Django? [duplicate]
The documentation says:
The reason this is needed in Django is that many libraries, specifically database adapters, require that they are accessed in the same thread that they were created in. Also a lot of existing Django code assumes it all runs in the same thread, e.g. middleware adding things to a request for later use in views.
But another question, "Is it safe that when Two asyncio tasks access the same awaitable object?", says that Python's asyncio is thread safe.
And as far as I know, since the GIL still exists, accessing one object from multiple threads should be thread safe.
Can anyone give a minimal example of why we have to use await sync_to_async(foo)() instead of calling foo() directly in Django or other async apps?
After the back and forth in the comments I feel I have a good idea of what kind of explanation you are after.
If you keep reading the documentation you quoted, the paragraph that immediately follows says:
Rather than introduce potential compatibility issues with this code, we instead opted to add this mode so that all existing Django sync code runs in the same thread and thus is fully compatible with async mode. Note that sync code will always be in a different thread to any async code that is calling it, so you should avoid passing raw database handles or other thread-sensitive references around.
(Emphasis mine in the above, and likewise in the quotes below.)
On top of ensuring that synchronous (blocking) function calls don't block the main worker thread(s), which would prevent everything else from being done (including other asynchronous IO that could be happening concurrently), the above explains the real reason why the Django documentation insists that sync_to_async must be used.
A point from your question needs addressing:
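To make the blocking part concrete, here is a minimal sketch using only the standard library. It is not Django code: asyncio.to_thread stands in for sync_to_async (in Django you would write await sync_to_async(blocking_io)(0.3) instead), and blocking_io, heartbeat, run and max_gap are hypothetical names made up for this illustration.

```python
import asyncio
import time


def blocking_io(seconds):
    # Hypothetical stand-in for a synchronous call such as an ORM query.
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks, interval=0.05, count=4):
    # Records a timestamp each time the event loop lets this task run.
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def run(call_directly):
    ticks = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if call_directly:
        blocking_io(0.3)  # runs on the event loop thread and starves the heartbeat
    else:
        await asyncio.to_thread(blocking_io, 0.3)  # runs in a worker thread
    await task
    # Largest gap between two consecutive heartbeat ticks, in seconds.
    return max(b - a for a, b in zip(ticks, ticks[1:]))


def max_gap(call_directly):
    return asyncio.run(run(call_directly))
```

Calling max_gap(True) should report a gap of roughly the full blocking time, because nothing else can run while the loop thread is stuck in blocking_io. Calling max_gap(False) should report a gap close to the heartbeat interval, because the loop keeps running while the worker thread sleeps.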
... but why we need to run sync function in a new thread in async context?
Your assumption here is wrong, at least with how Django is set up when it's running within an async context. In addition to the emphasis I added above, here is a quote from a couple of paragraphs before the one you quoted:
If you use asyncio.run() or similar, it will fall back to running thread-sensitive functions in a single, shared thread, but this will not be the main thread.
Again, note the keywords here: "single, shared thread", not a new thread. As for "why", this is to accommodate legacy synchronous code that may not be thread-safe, as well as things that can be used in a multithreaded environment but have invariants that must not be violated, such as SQLite, which "can be safely used by multiple threads provided that no single database connection nor any object derived from database connection, such as a prepared statement, is used in two or more threads at the same time."
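Python's own sqlite3 module enforces this invariant by default (check_same_thread=True), which makes for a small demonstration. The sketch below uses a one-worker ThreadPoolExecutor as a stand-in for a "single, shared thread"; it is an illustration of the idea rather than how Django or asgiref actually implement it, and the names shared, open_connection, count_rows and demo are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# One worker means every submitted call runs in the same, single thread.
shared = ThreadPoolExecutor(max_workers=1)


def open_connection():
    # The connection is created inside the shared worker thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    return conn


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


def demo():
    conn = shared.submit(open_connection).result()
    # Same thread that created the connection: works.
    same_thread = shared.submit(count_rows, conn).result()
    try:
        # Calling thread differs from the creating thread: sqlite3 refuses.
        count_rows(conn)
        other_thread = "no error"
    except sqlite3.ProgrammingError as exc:
        other_thread = type(exc).__name__
    return same_thread, other_thread
```

Here demo() returns (1, "ProgrammingError"): the query succeeds when routed through the shared thread and fails when the handle is used directly from the calling thread. This is also why the documentation warns against passing raw database handles around.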
Another point from your question needs addressing:
And as far as I know, since the GIL still exists, accessing one object from multiple threads should be thread safe.
Again, the GIL might only protect pure Python code. It does absolutely nothing for the C libraries that Python uses, and the invariants specified by SQLite (no single connection used in two or more threads at the same time) cannot be upheld by the GIL.
To ensure safer usage of an SQLite connection, it's typical to use thread-locals to hold the connection, so that the connection object can't easily be used from a different thread. As for Django's ORM, it simply doesn't support async yet. To quote the documentation:
We’re still working on async support for the ORM and other parts of Django. You can expect to see this in future releases. For now, you can use the sync_to_async() adapter to interact with the sync parts of Django. There is also a whole range of async-native Python libraries that you can integrate with.
I will presume Django uses thread-locals within the dedicated sync thread (I am not expending time to look this up; you can go figure this out) to ensure that the connection set up inside that one dedicated thread doesn't get copied or cloned out of there, which would violate SQLite's thread-safety invariant. And so, to have access to the database connection or to use the ORM in Django, the use of sync_to_async() is a must.
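For reference, the general thread-local pattern looks roughly like this. It is a sketch of the technique only, under the same presumption as above: it is not taken from Django's source, and _local, get_connection and demo are hypothetical names.

```python
import sqlite3
import threading

_local = threading.local()  # attributes set here are visible only to the setting thread


def get_connection():
    # Each thread lazily creates, then reuses, its own connection.
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(":memory:")
        _local.conn = conn
    return conn


def demo():
    main_conn = get_connection()
    seen = {}

    def worker():
        conn = get_connection()
        seen["reused"] = conn is get_connection()  # stable within one thread
        seen["distinct"] = conn is not main_conn   # identity check only, never used

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return main_conn is get_connection(), seen["reused"], seen["distinct"]
```

demo() returns (True, True, True): every thread keeps getting its own connection back, and no thread ever receives a connection that another thread created. Code that always goes through get_connection() therefore cannot break the one-connection-per-thread rule by accident.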
Finally, the way Django has set this up has nothing to do with performance, and everything to do with interoperability with the non-async and non-thread-safe code that is peppered throughout Django itself and elsewhere in Python's ecosystem.
So, if you want something "healthy" according to definitions given in your comment, Django shouldn't be your choice.