Is it reasonable to use Cloud Storage for async webhook processing on Cloud Run?
I'm processing webhooks on Cloud Run (Django) that need async handling because processing takes 30+ seconds but the webhook provider times out at 30s.
Since Cloud Run is stateless and spins up per-request (no persistent background workers like Celery), I'm using this pattern:
```python
# 1. Webhook endpoint
def receive_webhook(request):
    blob_name = f"webhooks/{uuid.uuid4()}.json"
    bucket.blob(blob_name).upload_from_string(json.dumps(request.data))
    webhook = WebhookPayload.objects.create(gcs_path=blob_name)
    create_cloud_task(payload_id=webhook.id)
    return Response(status=200)  # Fast response
```
The Cloud Task then calls the following endpoint with the payload ID, which it uses to look up the Cloud Storage path saved by the original webhook endpoint:
```python
# 2. Task handler endpoint
def process_webhook(request):
    webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
    payload = json.loads(bucket.blob(webhook.gcs_path).download_as_text())
    process_data(payload)  # 30+ seconds
    bucket.blob(webhook.gcs_path).delete()
    return Response(status=200)
```
My main query points:
Is GCS + Cloud Tasks the right pattern for Cloud Run's model, or is temporarily storing the JSON directly in a Django model a better approach, since Cloud Tasks handles the queueing?
Should I be using Pub/Sub instead? My understanding is that Pub/Sub is more appropriate for broadcasting to numerous subscribers; currently I only have the one Django monolith.
Thanks for any advice that comes my way.
The combination of Cloud Tasks + storing the data in a database (like your WebhookPayload model) is a good pattern. Storing the data in GCS is unnecessary given your current implementation and adds complexity.
Cloud Tasks handles the queueing and retries [3]. It is a reliable, managed service designed for this exact purpose.
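For reference, a minimal sketch of what a `create_cloud_task` helper could look like with the `google-cloud-tasks` client. The project, location, queue name, and handler URL below are placeholders, not values from your setup:

```python
import json
from google.cloud import tasks_v2

def create_cloud_task(payload_id):
    # Placeholder project/location/queue/URL -- substitute your own values.
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path("my-project", "us-central1", "webhook-queue")
    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://my-service.a.run.app/tasks/process-webhook/",
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"payload_id": payload_id}).encode(),
        }
    }
    # Cloud Tasks stores the task and delivers it to the URL, retrying on failure
    # according to the queue's retry configuration.
    return client.create_task(request={"parent": parent, "task": task})
```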
Storing data in your Django database is simpler than involving GCS for temporary storage of the payload itself. Since you are already creating a `WebhookPayload` model instance, you can store the entire JSON payload in a `JSONField` (or similar text field) on that model [5]. This keeps all related data in your primary data store and eliminates the GCS layer.
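As a rough sketch of that model (field names are assumptions, and `JSONField` needs Django 3.1+):

```python
from django.db import models

class WebhookPayload(models.Model):
    # Raw webhook body; a TextField also works if you prefer storing the dumped string.
    payload = models.JSONField()
    created_at = models.DateTimeField(auto_now_add=True)
```

With a `JSONField` you could also store `request.data` directly and skip the `json.dumps`/`json.loads` round trip; the snippets below keep the string-based approach for clarity.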
Revised, simplified approach:
Webhook endpoint:

```python
def receive_webhook(request):
    # Store data directly in the database
    webhook_payload_data = json.dumps(request.data)
    webhook = WebhookPayload.objects.create(payload=webhook_payload_data)
    # Pass the ID to the task
    create_cloud_task(payload_id=webhook.id)
    return Response(status=200)
```

Processing endpoint:

```python
def process_webhook(request):
    # Retrieve data from the database
    webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
    payload = json.loads(webhook.payload)
    process_data(payload)  # 30+ seconds
    # Clean up the database record after successful processing
    webhook.delete()
    return Response(status=200)  # A 2xx response tells Cloud Tasks the task succeeded
```
Should I be using Pub/Sub instead?
Cloud Tasks is better suited for this specific scenario than Pub/Sub because you have a direct "worker" endpoint relationship [1].
Cloud Tasks is designed for explicit point-to-point delivery of a payload to a specific service endpoint, with built-in features like automatic retries with exponential backoff and explicit control over the target URL [3]. This aligns well with your need to have each webhook handed off to and processed by your Django monolith; note that delivery is at least once, so the handler should tolerate occasionally re-processing the same payload.
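In practice, the retry behavior is driven by the HTTP status your handler returns. A sketch of the failure path (the bare `except Exception` is only illustrative; you would catch whatever `process_data` can actually raise):

```python
def process_webhook(request):
    webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
    try:
        process_data(json.loads(webhook.payload))  # 30+ seconds
    except Exception:
        # A non-2xx response makes Cloud Tasks re-deliver the task later,
        # with exponential backoff, up to the queue's retry limits.
        return Response(status=500)
    webhook.delete()
    return Response(status=200)  # 2xx marks the task as done; no retry
```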
Pub/Sub is designed for event ingestion and broadcast to potentially many subscribers (a fan-out pattern) [4]. It can certainly drive a single worker, but you would need to set up and manage the subscription, acknowledgement deadlines, and dead-letter handling yourself, whereas Cloud Tasks gives you targeted delivery to a specific URL with retries as a core service feature.
Summary
Your initial pattern is sound, but it can be simplified by using your Django database for temporary storage instead of GCS. Cloud Tasks is the ideal Google Cloud service for this type of asynchronous, guaranteed-delivery background processing with Cloud Run [1].