Intermittent 502/504 errors in Kubernetes (Traefik + FastAPI/Uvicorn), worse under load spikes

I have a FastAPI service running on Kubernetes behind a Traefik ingress. I previously used Nginx as the ingress and saw the same errors.

Setup

FastAPI (Uvicorn, ~8 workers)
Kubernetes limits: 500m CPU / 1Gi memory
Traefik ingress controller (shared cluster-wide)
PostgreSQL backend (Django ORM usage in background workloads)
Readiness probe: /healthz (1s timeout)
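
To put these limits in perspective, here is a rough back-of-the-envelope calculation of the CPU budget per worker. It is only a sketch: it assumes the 500m limit applies to a single container running all 8 workers, that the workers share it evenly, and that the node uses the default 100 ms CFS period. None of these assumptions is confirmed by the setup above.

```python
# Rough CPU budget per Uvicorn worker under the pod's CPU limit.
# Assumptions (not confirmed by the setup): one container holds all workers,
# the workers share CPU evenly, and the CFS period is the Linux default (100 ms).

CPU_LIMIT_MILLICORES = 500   # Kubernetes CPU limit from the setup
WORKERS = 8                  # approximate Uvicorn worker count from the setup
CFS_PERIOD_MS = 100          # default CFS quota period (assumed not overridden)


def per_worker_millicores(limit_millicores: int, workers: int) -> float:
    """Average CPU share per worker, in millicores."""
    return limit_millicores / workers


def quota_ms_per_period(limit_millicores: int, period_ms: int) -> float:
    """CPU time the whole container may use in each CFS period, in ms."""
    return limit_millicores / 1000 * period_ms


print(per_worker_millicores(CPU_LIMIT_MILLICORES, WORKERS))      # 62.5
print(quota_ms_per_period(CPU_LIMIT_MILLICORES, CFS_PERIOD_MS))  # 50.0
```

Under these assumptions, once the workers together consume 50 ms of CPU within a 100 ms period, the container is throttled until the next period begins, which could add latency during bursts. Raising the limits had no effect for me, so this alone may not explain the errors.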

What I tried

Increased replica count → no effect
Increased CPU/memory limits → no effect
Tuned Django DB connection settings (timeouts, pooling-related settings) → no effect
Disabled Uvicorn reload → no effect
Verified that the database holds very little data (roughly 5-10 rows)
Observed baseline: a small number of intermittent 5xx errors under continuous polling, even with no cron activity
Errors increase sharply as soon as periodic cron-driven DB activity starts or resumes
Enabled a “bye mode” that bypasses DB reads/writes during spikes → 5xx errors still occurred when activity resumed
Removed database connection checks from /healthz endpoint → no effect
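
One hypothesis I have not ruled out is that synchronous work (such as a Django ORM call) running on the event loop delays even a trivial /healthz handler past the 1s probe timeout. The sketch below only illustrates that mechanism with the standard library; `blocking_db_call` is a hypothetical stand-in using `time.sleep`, not my real code, and `asyncio.to_thread` requires Python 3.9+.

```python
import asyncio
import time

PROBE_TIMEOUT_S = 1.0  # readiness probe timeout from the setup


def blocking_db_call(seconds: float) -> None:
    """Hypothetical stand-in for a synchronous ORM query."""
    time.sleep(seconds)


async def healthz() -> str:
    """Trivial health handler with no database check."""
    return "ok"


async def probe_latency(block_seconds: float, offload: bool) -> float:
    """Time a /healthz call that arrives just as background DB work starts."""
    if offload:
        # Run the blocking call in a worker thread; the event loop stays free.
        work = asyncio.create_task(
            asyncio.to_thread(blocking_db_call, block_seconds)
        )
    else:
        # Run the blocking call on the event loop itself; nothing else can run.
        async def on_loop() -> None:
            blocking_db_call(block_seconds)

        work = asyncio.create_task(on_loop())

    start = time.perf_counter()
    await asyncio.sleep(0)  # yield once so the background task gets to start
    await healthz()
    latency = time.perf_counter() - start
    await work  # let the background work finish before returning
    return latency


if __name__ == "__main__":
    for offload in (False, True):
        latency = asyncio.run(probe_latency(1.2, offload))
        verdict = "probe would time out" if latency > PROBE_TIMEOUT_S else "probe ok"
        print(f"offload={offload}: {latency:.2f}s ({verdict})")
```

If something like this were happening, the handler itself would be fast but the request would wait for the loop to become free, so removing DB checks from /healthz would not help. This is a guess to test, not a confirmed cause.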

Observation

The system shows a low but non-zero 5xx rate even under normal conditions, and the rate rises sharply, almost immediately, when the periodic background DB load starts. The errors correlate with load spikes rather than with any specific endpoint or query.
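
For anyone wanting to reproduce the measurement, the continuous polling can be sketched with the standard library as below. The function names, the 60-second window, and the convention of recording 0 for connection-level failures are all choices made for this illustration, not part of my actual tooling.

```python
import time
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout_s: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the request failed without one."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses, including 502 and 504
    except OSError:
        return 0  # e.g. connection refused, reset, DNS failure, or timeout


def poll(url: str, count: int, interval_s: float) -> list:
    """Request the URL `count` times and record (timestamp, status) pairs."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), fetch_status(url)))
        time.sleep(interval_s)
    return samples


def five_xx_per_window(samples, window_s: int = 60) -> dict:
    """Count 5xx responses per time window, keyed by window start time."""
    counts = Counter()
    for ts, status in samples:
        if 500 <= status <= 599:
            counts[int(ts // window_s) * window_s] += 1
    return dict(counts)
```

A call such as `five_xx_per_window(poll("http://my-service.example/healthz", 600, 0.5))` (hypothetical URL) yields 5xx counts per minute, and the window start times can then be compared against the cron schedule.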
