How Does Connection Pooling Work In Django?
If I'm not mistaken, there are currently two ways to get connection pooling in Django:
- Native connection pooling (Django 5.1+)
- Using PgBouncer
I want to know how connection pooling works behind the scenes in Django.
In FastAPI, there is one "permanent" process that handles all requests. We can have a connection pool using, for example, the asyncpg driver. In its simplest form, it opens, say, 10 connections to the PostgreSQL database (10 Unix socket connections) and then, whenever a coroutine requests a connection, hands one out from the pool.
+-------------------------+
| +------------+ | ---------------> +------------+
| | | | | |
| FastAPI | Connection | | ---------------> | Database |
| | Pool | | | |
| | (asyncpg) | | ---------------> +------------+
| +------------+ |
+-------------------------+
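In code that would be roughly the following (a rough sketch; the DSN, credentials, and pool size are just placeholders I made up):

import asyncio

import asyncpg


async def main():
    # One pool per process: asyncpg opens min_size connections up front
    # and lends them out to coroutines on demand.
    pool = await asyncpg.create_pool(
        dsn="postgresql://app:secret@localhost/appdb",  # placeholder DSN
        min_size=10,
        max_size=10,
    )

    async with pool.acquire() as conn:  # borrow a connection from the pool
        row = await conn.fetchrow("SELECT 1")
        print(row)

    await pool.close()


asyncio.run(main())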
But in Django, there is no single permanent process. If we use Gunicorn with sync workers, each worker instantiates its own Django application and handles one request at a time, and there is no shared memory between the workers.
How can the psycopg 3 driver create a connection pool in one worker and have all the other workers use it?
I guess PgBouncer is a separate process that keeps the connection pool in its own memory, and all the Django workers then talk to it.
Django worker ---\
-----\ +------------+ ---------------> +------------+
-----\ | | | |
Django worker ------------------> | PgBouncer | ---------------> | Database |
-----/ | | | |
-----/ +------------+ ---------------> +------------+
Django worker ---/
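If that's the case, the Django side would simply point its database settings at PgBouncer instead of at Postgres itself, something like the sketch below (host, port, and credentials are placeholders; 6432 is just PgBouncer's usual default port):

# settings.py (sketch): Django connects to PgBouncer, which holds the real pool
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "app",
        "PASSWORD": "secret",
        "HOST": "127.0.0.1",  # PgBouncer, not Postgres itself
        "PORT": "6432",
        # With PgBouncer in transaction pooling mode, server-side cursors
        # are not safe to use, so Django provides a switch to disable them:
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}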
Am I right to say that both ways of connection pooling in Django are indirect, and that there is extra latency because of the intermediate process?
Historically, Django did not support DB pooling natively (that only arrived with the native pooling you mention), so people reached for tools like https://github.com/jneight/django-db-geventpool. What Django has long supported out of the box is persistent connections, via the CONN_MAX_AGE setting.
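A minimal settings sketch for that (the connection details are placeholders and the CONN_MAX_AGE value is arbitrary):

# settings.py: persistent connections, not a pool. Each worker keeps its own
# connection open and reuses it across requests instead of reconnecting.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "app",
        "PASSWORD": "secret",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        "CONN_MAX_AGE": 600,         # reuse a connection for up to 10 minutes
        "CONN_HEALTH_CHECKS": True,  # verify the connection is alive before reuse
    }
}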
I feel this question arises from a lack of clarity about the various ways a Django application can be run. I'll try to list the relevant ones here:
- Synchronous worker processes
- Synchronous worker processes with threading
- Asynchronous worker processes
In each of the above cases, the workers are managed by a server like Gunicorn.
In the first case, each worker processes one request at a time. Here you might as well not use the connection pool and use persistent connections instead, since only one connection from the pool could ever be in use.
In the second case, one process contains multiple threads acting as workers, and each thread can serve one request at a time.
In the third case, a process can serve multiple requests at a time by using the event loop and switching context while I/O is in progress.
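For concreteness, the three cases map roughly onto Gunicorn settings like these (a gunicorn.conf.py sketch; the worker and thread counts are arbitrary, and the async variant assumes uvicorn is installed):

# gunicorn.conf.py (sketch) -- pick one of the three variants

# 1) Synchronous workers: one request per process at a time
# workers = 4
# worker_class = "sync"

# 2) Threaded workers: each process serves several requests on threads
# workers = 2
# threads = 8
# worker_class = "gthread"

# 3) Async workers: each process multiplexes requests on an event loop
#    (Django served through ASGI, here via uvicorn's Gunicorn worker)
workers = 2
worker_class = "uvicorn.workers.UvicornWorker"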
The connection pool feature is suitable for the second and third cases. As for the internals: Django uses the connection pools from psycopg_pool, where a background worker thread manages the connections, so it works much like your diagram of FastAPI with asyncpg, except that each worker process has its own pool (processes don't share memory).
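For the native pooling, the configuration looks roughly like this (it requires psycopg 3 installed with the pool extra, i.e. psycopg[pool]; the connection details are placeholders and the sizes are just illustrative):

# settings.py: Django's native pooling (Django 5.1+). The "pool" dict is passed
# on to psycopg_pool.ConnectionPool, which opens the connections inside each
# worker process and manages them with its own background thread.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "app",
        "PASSWORD": "secret",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        "OPTIONS": {
            # "pool": True uses psycopg_pool's defaults; a dict is forwarded
            # as keyword arguments to ConnectionPool.
            "pool": {
                "min_size": 2,
                "max_size": 10,
                "timeout": 10,
            },
        },
    }
}

So each worker process gets its own in-process pool; there is no extra hop through an intermediate process the way there is with PgBouncer.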