Is it reasonable to use Cloud Storage for async webhook processing on Cloud Run?
I'm processing webhooks on Cloud Run (Django) that need async handling because processing takes 30+ seconds but the webhook provider times out at 30s.
Since Cloud Run is stateless and spins up per-request (no persistent background workers like Celery), I'm using this pattern:
```python
# 1. Webhook endpoint
def receive_webhook(request):
    blob_name = f"webhooks/{uuid.uuid4()}.json"
    bucket.blob(blob_name).upload_from_string(json.dumps(request.data))
    webhook = WebhookPayload.objects.create(gcs_path=blob_name)
    create_cloud_task(payload_id=webhook.id)
    return Response(status=200)  # Fast response
```
The Cloud Task then calls the following endpoint with the payload ID, which it uses to look up the Cloud Storage path saved by the original webhook endpoint:
```python
# 2. Task handler endpoint
def process_webhook(request):
    webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
    payload = json.loads(bucket.blob(webhook.gcs_path).download_as_text())
    process_data(payload)  # 30+ seconds
    bucket.blob(webhook.gcs_path).delete()
    return Response(status=200)
```
My main query points:
Is GCS + Cloud Tasks the right pattern for Cloud Run's model, or is temporarily storing the JSON directly in a Django model a better approach, since Cloud Tasks handles the queueing?
Should I be using Pub/Sub instead? My understanding is that Pub/Sub is more appropriate for broadcasting to numerous subscribers; currently I only have the one Django monolith.
Thanks for any advice that comes my way.
The combination of Cloud Tasks + storing the data in a database (like your WebhookPayload model) is a good pattern. Storing the data in GCS is unnecessary given your current implementation and adds complexity.
Cloud Tasks handles the queueing and retries [3]. It is a reliable, managed service designed for this exact purpose.
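For reference, a minimal sketch of what a `create_cloud_task` helper could look like with the `google-cloud-tasks` client. The project, location, queue name, and handler URL below are placeholders, not values from your setup:

```python
import json
from google.cloud import tasks_v2

def create_cloud_task(payload_id):
    # Placeholder project/location/queue/URL -- substitute your own values.
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path("my-project", "us-central1", "webhook-queue")
    task = {
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": "https://my-service.a.run.app/tasks/process-webhook/",
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"payload_id": payload_id}).encode(),
        }
    }
    # Cloud Tasks stores the task and delivers it to the URL, retrying on failure
    # according to the queue's retry configuration.
    return client.create_task(request={"parent": parent, "task": task})
```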
Storing data in your Django database is simpler than involving GCS for temporary storage of the payload itself. Since you are already creating a `WebhookPayload` model instance, you can store the entire JSON payload in a `JSONField` (or similar text field) on that model [5]. This keeps all related data in your primary data store and eliminates the GCS layer.
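As a rough sketch of that model (field names are assumptions, and `JSONField` needs Django 3.1+):

```python
from django.db import models

class WebhookPayload(models.Model):
    # Raw webhook body; a TextField also works if you prefer storing the dumped string.
    payload = models.JSONField()
    created_at = models.DateTimeField(auto_now_add=True)
```

With a `JSONField` you could also store `request.data` directly and skip the `json.dumps`/`json.loads` round trip; the snippets below keep the string-based approach for clarity.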
Revised, simplified approach:
Webhook endpoint:

```python
def receive_webhook(request):
    # Store data directly in the database
    webhook_payload_data = json.dumps(request.data)
    webhook = WebhookPayload.objects.create(payload=webhook_payload_data)
    # Pass the ID to the task
    create_cloud_task(payload_id=webhook.id)
    return Response(status=200)
```

Processing endpoint:

```python
def process_webhook(request):
    # Retrieve data from the database
    webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
    payload = json.loads(webhook.payload)
    process_data(payload)  # 30+ seconds
    # Clean up the database record after successful processing
    webhook.delete()
    return Response(status=200)  # A 2xx response tells Cloud Tasks the task succeeded
```
Should I be using Pub/Sub instead?
Cloud Tasks is better suited for this specific scenario than Pub/Sub because you have a direct "worker" endpoint relationship [1].
Cloud Tasks is designed for explicit point-to-point delivery of a payload to a specific service endpoint, with built-in features like automatic retries with exponential backoff and explicit control over the target URL [3]. This aligns well with your need to have each webhook handed off to and processed by your Django monolith; note that delivery is at least once, so the handler should tolerate occasionally re-processing the same payload.
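In practice, the retry behavior is driven by the HTTP status your handler returns. A sketch of the failure path (the bare `except Exception` is only illustrative; you would catch whatever `process_data` can actually raise):

```python
def process_webhook(request):
    webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
    try:
        process_data(json.loads(webhook.payload))  # 30+ seconds
    except Exception:
        # A non-2xx response makes Cloud Tasks re-deliver the task later,
        # with exponential backoff, up to the queue's retry limits.
        return Response(status=500)
    webhook.delete()
    return Response(status=200)  # 2xx marks the task as done; no retry
```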
Pub/Sub is designed for event ingestion and broadcast to potentially many subscribers (a fan-out pattern) [4]. It can certainly drive a single worker, but you would need to set up and manage the subscription, acknowledgement deadlines, and dead-letter handling yourself, whereas Cloud Tasks gives you targeted delivery to a specific URL with retries as a core service feature.
Summary
Your initial pattern is sound, but it can be simplified by using your Django database for temporary storage instead of GCS. Cloud Tasks is the ideal Google Cloud service for this type of asynchronous, guaranteed-delivery background processing with Cloud Run [1].