Connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL: sorry, too many clients already

I'm using Celery with Django to run a task that spawns multiple threads querying the database. The example below is a simplified version of the original code in a larger project, but the logic and the result are the same: the error occurs either way.

models.py

from django.db import models


class Example(models.Model):
    name = models.CharField(max_length=255)

urls.py

from django.urls import path

from core.views import ExampleView

urlpatterns = [path('example/', ExampleView.as_view())]

views.py

from core.tasks import run_task
from rest_framework.response import Response
from rest_framework.views import APIView


class ExampleView(APIView):
    def get(self, request, *args, **kwargs):
        run_task.delay()
        return Response(status=200)

tasks.py

from concurrent.futures import ThreadPoolExecutor, as_completed

from celery import shared_task

from core.models import Example


@shared_task
def run_task():
    with ThreadPoolExecutor(max_workers=100) as executor:
        futures = [
            executor.submit(Example.objects.get_or_create, name='example')
            for _ in range(200)
        ]
        for future in as_completed(futures):
            future.result()

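For context on what the task does to the connection count: Django keeps one database connection per thread, so every executor thread that touches the ORM opens its own Postgres connection, and nothing closes those connections when the pool finishes. A minimal pure-Python sketch of that per-thread accumulation (the threading.local here is only a stand-in for Django's per-thread connection handle):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for Django's per-thread connection storage.
_local = threading.local()
opened = []
lock = threading.Lock()

def fake_orm_query(_):
    # First "ORM" touch in a thread opens a connection; it is never closed.
    if not hasattr(_local, "conn"):
        _local.conn = object()
        with lock:
            opened.append(_local.conn)
    return _local.conn

with ThreadPoolExecutor(max_workers=100) as executor:
    list(executor.map(fake_orm_query, range(200)))

# One leaked "connection" per worker thread that ran at least one job.
print(len(opened))  # up to 100
```

With max_workers=100 and several Celery fork-pool workers running the task concurrently, this plausibly exceeds Postgres's default max_connections of 100 on the very first request.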
To reproduce, just run Celery:

celery -A example worker -l INFO

and Django, then run:

curl 'http://localhost:8000/api/example/'

You should see the error below:

[2026-04-12 15:08:49,151: ERROR/ForkPoolWorker-15] Task core.tasks.run_task[1b1682e1-3b9f-43e2-b02f-ba878449d2a5] raised unexpected: OperationalError('connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  sorry, too many clients already\n')
Traceback (most recent call last):
  File "/Users/users/.local/lib/python3.14t/site-packages/django/db/backends/base/base.py", line 279, in ensure_connection
    self.connect()
    ~~~~~~~~~~~~^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/base/base.py", line 256, in connect
    self.connection = self.get_new_connection(conn_params)
                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/postgresql/base.py", line 333, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/Users/user/.local/lib/python3.14t/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  sorry, too many clients already


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/user/.local/lib/python3.14t/site-packages/celery/app/trace.py", line 585, in trace_task
    R = retval = fun(*args, **kwargs)
                 ~~~^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/lib/python3.14t/site-packages/celery/app/trace.py", line 858, in __protected_call__
    return self.run(*args, **kwargs)
           ~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/user/Desktop/postgres-issue/backend/core/tasks.py", line 15, in run_task
    future.result()
    ~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.14t/concurrent/futures/_base.py", line 443, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/usr/local/lib/python3.14t/concurrent/futures/_base.py", line 395, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.14t/concurrent/futures/thread.py", line 86, in run
    result = ctx.run(self.task)
  File "/usr/local/lib/python3.14t/concurrent/futures/thread.py", line 73, in run
    return fn(*args, **kwargs)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/models/query.py", line 987, in get_or_create
    return self.get(**kwargs), False
           ~~~~~~~~^^^^^^^^^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/models/query.py", line 635, in get
    num = len(clone)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/models/query.py", line 372, in __len__
    self._fetch_all()
    ~~~~~~~~~~~~~~~^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/models/query.py", line 2000, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
                         ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/models/query.py", line 95, in __iter__
    results = compiler.execute_sql(
        chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size
    )
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/models/sql/compiler.py", line 1622, in execute_sql
    cursor = self.connection.cursor()
  File "/Users/user/.local/lib/python3.14t/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/base/base.py", line 320, in cursor
    return self._cursor()
           ~~~~~~~~~~~~^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/base/base.py", line 296, in _cursor
    self.ensure_connection()
    ~~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/base/base.py", line 278, in ensure_connection
    with self.wrap_database_errors:
         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/utils.py", line 94, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/base/base.py", line 279, in ensure_connection
    self.connect()
    ~~~~~~~~~~~~^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/base/base.py", line 256, in connect
    self.connection = self.get_new_connection(conn_params)
                      ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/Users/user/.local/lib/python3.14t/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/Users/user/.local/lib/python3.14t/site-packages/django/db/backends/postgresql/base.py", line 333, in get_new_connection
    connection = self.Database.connect(**conn_params)
  File "/Users/user/.local/lib/python3.14t/site-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: connection to server on socket "/tmp/.s.PGSQL.5432" failed: FATAL:  sorry, too many clients already

I tried different Python versions (3.14t, 3.14), Postgres versions (17, 18), Django 5+, 6+, ... and the results are the same. Besides, lowering the amount of concurrency only prevents the error for the first run or the first few runs; on the nth run the error is hit again, so even if the version below works, that doesn't mean the problem is solved.


import os


@shared_task
def run_task():
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
        futures = [
            executor.submit(Example.objects.get_or_create, name='example')
            for _ in range(200)
        ]
        for future in as_completed(futures):
            future.result()

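That pattern of failing only on the nth run is consistent with leaked per-thread connections accumulating across task invocations until the server's limit is hit. Back-of-the-envelope, with hypothetical numbers:

```python
# Hypothetical numbers for illustration only.
max_connections = 100      # Postgres default
leaked_per_run = 10        # e.g. max_workers=os.cpu_count() on a 10-core machine
runs_before_error = max_connections // leaked_per_run
print(runs_before_error)   # the error would return after about 10 runs
```

Fewer workers just means more runs before "too many clients already".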
I'm not sure why these connections aren't being closed / cleaned up automatically, which is probably the main issue here. I even tried manually cleaning connections to no avail.

import os

from django.db import close_old_connections, connections


@shared_task
def run_task():
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as executor:
        futures = [
            executor.submit(Example.objects.get_or_create, name='example')
            for _ in range(200)
        ]
        for future in as_completed(futures):
            future.result()
    close_old_connections()
    for connection in connections.all():
        connection.close()

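One possible reason the manual cleanup has no effect: close_old_connections() and connections.all() only see the calling thread's connections, so cleanup after the with block cannot reach the connections the executor threads opened. A sketch (not a verified fix) of the alternative pattern, closing the connection inside each worker thread via a wrapper around the submitted callable; close_conn is injected so the shape is testable without a database, and with Django it would be a lambda calling django.db.connection.close():

```python
def run_then_close(work, close_conn):
    # Runs entirely in the worker thread, so close_conn() closes *that*
    # thread's connection (with Django: lambda: connection.close()).
    try:
        return work()
    finally:
        close_conn()
```

Each submit would then become something like executor.submit(run_then_close, lambda: Example.objects.get_or_create(name='example'), lambda: connection.close()); the lambda matters so the thread-local connection proxy is resolved in the worker thread, not in the submitting thread.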
I'm running the above on my M4 Max MacBook Pro (Tahoe 26.4.1) + Postgres 18 + Python 3.14t and the pip versions below:

celery                        5.6.3
channels                      4.3.2
channels_redis                4.3.0
daphne                        4.2.1
Django                        6.0.4
django-cors-headers           4.9.0
djangorestframework           3.17.1
djangorestframework_simplejwt 5.5.1
pip                           26.0.1
psycopg2                      2.9.11
redis                         7.4.0

Sorry, @user31749517. I have not yet managed to repro.

% echo $(jot 3)
1 2 3
%
%
% time bash -c 'for i in $(jot 1000); do curl -s "http://localhost:8000/api/example/"; done'
bash -c   1.94s user 2.41s system 33% cpu 12.966 total

In two Terminal tabs I have that "pound Django a thousand times" loop running concurrently, several times in sequence, and each run takes about thirteen seconds. With no concurrency, doing it from a single tab's shell, fetching a zero-byte document a thousand times takes about nine seconds.

See, no env vars, nothing up my sleeve:

% env | grep DJANGO | wc -l
       0

The Daphne log is very happy: strictly 200 Success documents being served. Here is a tiny excerpt:

::1:50906 - - [12/Apr/2026:17:26:42] "GET /api/example/" 200 -
::1:50908 - - [12/Apr/2026:17:26:42] "GET /api/example/" 200 -
::1:50910 - - [12/Apr/2026:17:26:43] "GET /api/example/" 200 -

In addition to interpreter 3.14.2, here's a pair of components which might differ between us:

% brew info postgres
==> postgresql@18 ✔: stable 18.3 (bottled) [keg-only]
Object-relational database system
% brew info valkey                               
==> valkey ✔: stable 9.0.3 (bottled), HEAD
High-performance data structure server that primarily serves key/value workloads
https://valkey.io

Aha! Reproduced it.

I restart Celery, with no Daphne/Django running, and I get exactly fifty success lines like this:

[2026-04-12 17:42:04,698: INFO/MainProcess] Task core.tasks.run_task[fe69d7fa-6684-48e4-8b3a-6aeb71fdb80d] received

and then it falls apart with the "no postgres" symptom you reported:

    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: connection to server on socket "/tmp/.s.PGSQL.5432" failed: Connection refused
    Is the server running locally and accepting connections on that socket?

Furthermore, it doesn't shut down gracefully, so I hit CTRL/C a few times:

    Is the server running locally and accepting connections on that socket?

^C
worker: Hitting Ctrl+C again will terminate all running tasks!
^C
Waiting gracefully for cold shutdown to complete...

worker: Cold shutdown (MainProcess)
^C[2026-04-12 17:43:59,655: WARNING/MainProcess] Restoring 40 unacknowledged message(s)
%
%
% jobs
%

Sometimes it reports having killed a worker process:

    Is the server running locally and accepting connections on that socket?

^C
worker: Hitting Ctrl+C again will terminate all running tasks!
^C
Waiting gracefully for cold shutdown to complete...

worker: Cold shutdown (MainProcess)
^C[2026-04-12 18:04:20,666: ERROR/MainProcess] Process 'ForkPoolWorker-7' pid:14143 exited with 'signal 15 (SIGTERM)'
[2026-04-12 18:04:21,710: WARNING/MainProcess] Restoring 38 unacknowledged message(s)
%
%

I am debugging with this task code:

@shared_task
def run_task():
    with ThreadPoolExecutor(max_workers=100) as executor:
        futures = [
            executor.submit(Example.objects.get_or_create, name="example")
            for _ in range(200)
        ]
        print(f"{len(futures)=}")
        for future in as_completed(futures):
            print(f"Asking for .result() from {future}")
            print(f"{future.result()=}")
        ...

We never see that final .result() print any output:

[2026-04-12 18:09:02,302: INFO/MainProcess] Task core.tasks.run_task[4f6d3268-0dc9-4b08-b66a-a8d15bbf4e83] received
[2026-04-12 18:09:02,302: INFO/MainProcess] Task core.tasks.run_task[191fed8e-89dc-43f4-aec5-53a68b6ec4df] received
[2026-04-12 18:09:02,303: INFO/MainProcess] Task core.tasks.run_task[3840e8ea-4ded-4d78-b820-069c4e3ce9cc] received
[2026-04-12 18:09:02,304: INFO/MainProcess] Task core.tasks.run_task[364206f7-710b-45ba-995f-2fa15bffd7f0] received
[2026-04-12 18:09:02,304: INFO/MainProcess] Task core.tasks.run_task[06866ed5-defb-4c1a-965d-5ddaeb184fa3] received
[2026-04-12 18:09:02,304: WARNING/ForkPoolWorker-8] len(futures)=200
[2026-04-12 18:09:02,305: WARNING/ForkPoolWorker-9] len(futures)=200
[2026-04-12 18:09:02,305: WARNING/ForkPoolWorker-8] Asking for .result() from <Future at 0x10c9a56d0 state=finished raised OperationalError>
[2026-04-12 18:09:02,305: INFO/MainProcess] Task core.tasks.run_task[5c2ef3ce-b47e-46bb-a35f-1fbad0135ead] received
[2026-04-12 18:09:02,306: INFO/MainProcess] Task core.tasks.run_task[a2ce76ef-5f75-40cf-8f27-c69c88111c20] received
[2026-04-12 18:09:02,306: WARNING/ForkPoolWorker-2] len(futures)=200
[2026-04-12 18:09:02,306: WARNING/ForkPoolWorker-7] len(futures)=200
[2026-04-12 18:09:02,306: WARNING/ForkPoolWorker-2] Asking for .result() from <Future at 0x10ca0cb50 state=finished raised OperationalError>
[2026-04-12 18:09:02,306: WARNING/ForkPoolWorker-7] Asking for .result() from <Future at 0x10c9a6ad0 state=finished raised OperationalError>
[2026-04-12 18:09:02,307: INFO/MainProcess] Task core.tasks.run_task[4c0adcdc-2844-4c5d-aaf4-9978242a5330] received

If, despite Celery being a mess, I forge ahead and start Daphne, then the Django web server is happy and a web client obtains a zero-byte 200 Success document.

% curl -v http://localhost:8000/api/example/
* Host localhost:8000 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:8000...
* Connected to localhost (::1) port 8000
> GET /api/example/ HTTP/1.1
> Host: localhost:8000
> User-Agent: curl/8.7.1
> Accept: */*
> 
* Request completely sent off
< HTTP/1.1 200 OK
< Vary: Accept, Cookie
< Allow: GET, HEAD, OPTIONS
< X-Frame-Options: DENY
< Content-Length: 0
< X-Content-Type-Options: nosniff
< Referrer-Policy: same-origin
< Cross-Origin-Opener-Policy: same-origin
< Server: daphne
< 
* Connection #0 to host localhost left intact

Still looking into it.

(I will soon delete this post, as it is not an Answer to the OP question, merely an interim progress report which Comments would not accommodate.)

This code doesn't leak.

It also doesn't exercise DB concurrency, as there is no ThreadPoolExecutor.

from celery import shared_task
from django.db import connections

from core.models import Example


@shared_task
def run_task(reps: int = 200):
    results = [
        Example.objects.get_or_create(name='example', id=1)[1] for _ in range(reps)
    ]
    assert len(results) == reps
    assert not any(results)

    for connection in connections.all():
        connection.close()

Verify that we don't add 100 "stuck" SELECT statements on each call with this psql query:

SELECT COUNT(*) FROM pg_stat_activity WHERE wait_event = 'ClientRead' AND state = 'idle';

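The same check can be scripted against any DB-API cursor (a sketch: idle_client_count and the cursor argument are illustrative names, the SQL is the query above):

```python
IDLE_QUERY = (
    "SELECT COUNT(*) FROM pg_stat_activity "
    "WHERE wait_event = 'ClientRead' AND state = 'idle'"
)

def idle_client_count(cursor):
    # `cursor` is any DB-API cursor on the target database,
    # e.g. psycopg2_conn.cursor() or django.db.connection.cursor().
    cursor.execute(IDLE_QUERY)
    return cursor.fetchone()[0]
```

Running it before and after each task invocation makes the leak (or its absence) visible as a number.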
BTW, close_old_connections() only closes connections that are unusable or have outlived CONN_MAX_AGE (more than about two seconds old in this setup), and like connections.all() it only sees the current thread's connections.
