Is it reasonable to use Cloud Storage for async webhook processing on Cloud Run?

I'm processing webhooks on Cloud Run (Django) that need async handling because processing takes 30+ seconds but the webhook provider times out at 30s.

Since Cloud Run is stateless and spins up per-request (no persistent background workers like Celery), I'm using this pattern:

# 1. Webhook endpoint
def receive_webhook(request):
    blob_name = f"webhooks/{uuid.uuid4()}.json"
    bucket.blob(blob_name).upload_from_string(json.dumps(request.data))
    
    webhook = WebhookPayload.objects.create(gcs_path=blob_name)
    create_cloud_task(payload_id=webhook.id)
    
    return Response(status=200)  # Fast response

The Cloud Task then calls the following endpoint with the ID of the WebhookPayload record, which holds the unique Cloud Storage path saved by the original webhook endpoint:

def process_webhook(request):
    webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
    payload = json.loads(bucket.blob(webhook.gcs_path).download_as_text())
    
    process_data(payload)  # 30+ seconds
    bucket.blob(webhook.gcs_path).delete()

My main questions:

  1. Is GCS + Cloud Tasks the right pattern for Cloud Run's model, or is temporarily storing the JSON directly in a Django model a better approach, since Cloud Tasks handles the queueing?

  2. Should I be using Pub/Sub instead? My understanding is that Pub/Sub is more appropriate for broadcasting to numerous subscribers, and currently I only have the one Django monolith.

Thanks for any advice that comes my way.

The combination of Cloud Tasks + storing the data in a database (like your WebhookPayload model) is a good pattern. Storing the data in GCS is unnecessary given your current implementation and adds complexity.

  • Cloud Tasks handles the queueing and retries [3]. It is a reliable, managed service designed for this exact purpose.

  • Storing data in your Django database is simpler than involving GCS for temporary storage of the payload. Since you already create a WebhookPayload instance, you can store the entire JSON payload in a JSONField (or as serialized JSON in a TextField) on that model [5]. This keeps all related data in your primary data store and eliminates the GCS layer.
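
To make the retry behaviour from the first bullet concrete, here is a minimal sketch of how a Cloud Tasks queue spaces out retries, following the schedule described in the queue's retry configuration documentation: start at min_backoff, double max_doublings times, then grow linearly, capped at max_backoff. The function name `retry_intervals` and the sample values are illustrative assumptions, not part of any library.

```python
def retry_intervals(min_backoff, max_backoff, max_doublings, retries):
    """Return the wait in seconds before each retry, per the documented schedule."""
    intervals = []
    interval = min_backoff
    # After the doubling phase, the interval grows by this fixed step
    linear_step = min_backoff * 2 ** max_doublings
    for attempt in range(retries):
        if attempt == 0:
            interval = min_backoff
        elif attempt <= max_doublings:
            interval *= 2
        else:
            interval += linear_step
        # Never wait longer than max_backoff
        intervals.append(min(interval, max_backoff))
    return intervals


print(retry_intervals(10, 300, 3, 8))
# → [10, 20, 40, 80, 160, 240, 300, 300]
```

With these sample values a failing 30+ second job would be retried quickly at first and then roughly every five minutes, so the queue's max_attempts setting decides how long a broken payload keeps being retried.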

Revised, simplified approach:

  1. Webhook endpoint:

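    import json

    def receive_webhook(request):
        # Store the payload as serialized JSON. This assumes `payload` is a TextField;
        # with a JSONField, pass request.data directly and skip json.dumps/json.loads.
        webhook = WebhookPayload.objects.create(payload=json.dumps(request.data))

        # Pass only the record ID to the task
        create_cloud_task(payload_id=webhook.id)

        # Respond quickly so the webhook provider does not time out
        return Response(status=200)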
    
    
  2. Processing endpoint:

    def process_webhook(request):
        # Retrieve the stored payload from the database
        webhook = WebhookPayload.objects.get(id=request.data['payload_id'])
        payload = json.loads(webhook.payload)

        process_data(payload)  # 30+ seconds

        # Clean up the database record only after successful processing
        webhook.delete()

        # A 2xx response marks the task as done; any other status makes Cloud Tasks retry
        return Response(status=200)
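
The `create_cloud_task` helper is not shown in either snippet. A minimal sketch of what it might look like with the `google-cloud-tasks` client follows. The helper `build_task`, the extra keyword arguments, and the use of an OIDC token are assumptions for illustration; in a real project the project, location, queue, URL, and service account would more likely come from settings.

```python
import json


def build_task(url, payload_id, service_account_email):
    """Build the task definition as a plain dict (POST is the default HTTP method)."""
    return {
        "http_request": {
            "url": url,
            "headers": {"Content-Type": "application/json"},
            # Only the record ID travels in the task; the payload stays in the database
            "body": json.dumps({"payload_id": payload_id}).encode("utf-8"),
            # OIDC token so the Cloud Run service can authenticate the caller
            "oidc_token": {"service_account_email": service_account_email},
        }
    }


def create_cloud_task(payload_id, *, project, location, queue, url, service_account_email):
    # Imported lazily so build_task can be exercised without the client library installed
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    task = build_task(url, payload_id, service_account_email)
    return client.create_task(request={"parent": parent, "task": task})
```

Sending only the ID keeps the task body small and means the processing endpoint always reads the payload from a single place.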
    
    

Should I be using Pub/Sub instead?

Cloud Tasks is better suited to this scenario than Pub/Sub because you have a single producer handing work to one specific worker endpoint [1].

  • Cloud Tasks is designed for explicit point-to-point delivery of a payload to a specific service endpoint, with built-in features like automatic retries with exponential backoff and explicit control over the target URL [3]. This fits your need to have each webhook handled by one known endpoint in your Django monolith. Note that delivery is at-least-once rather than exactly-once: a task can occasionally be dispatched more than once, so the processing endpoint should be idempotent.

  • Pub/Sub is designed for event ingestion and broadcast to potentially many subscribers (a fan-out pattern) [4]. It can serve a single worker, for example through a push subscription to your Cloud Run endpoint, but you would have to create and configure the topic, subscription, and dead-letter topic yourself, and you would give up per-task features such as scheduled delivery and queue-level rate limits, which Cloud Tasks provides as core features.
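
Because Cloud Tasks retries a task whenever the handler fails or times out, the processing endpoint can receive the same payload_id more than once. Below is a minimal sketch of a handler that tolerates this, using an in-memory dict as a stand-in for the WebhookPayload table. The names `PAYLOADS`, `PROCESSED`, and `handle_task`, and the choice of status codes, are illustrative assumptions.

```python
PAYLOADS = {1: {"event": "order.created"}}  # stand-in for WebhookPayload rows
PROCESSED = []  # records side effects so duplicates would be visible


def process_data(payload):
    PROCESSED.append(payload["event"])


def handle_task(payload_id):
    """Return the HTTP status the processing endpoint would send back."""
    payload = PAYLOADS.get(payload_id)
    if payload is None:
        # Row already deleted: an earlier attempt succeeded, so acknowledge
        # with 2xx instead of erroring and triggering another retry.
        return 200
    try:
        process_data(payload)
    except Exception:
        # A non-2xx status keeps the row in place and lets the queue retry later
        return 500
    del PAYLOADS[payload_id]
    return 200


print(handle_task(1), handle_task(1), PROCESSED)
# → 200 200 ['order.created']
```

In the Django view the same check would typically be a caught `WebhookPayload.DoesNotExist` or a `filter(...).first()` lookup. This sketch does not cover two deliveries running at the same moment; that case would need a row lock or a status column.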

Summary

Your initial pattern is sound, but it can be simplified by storing the payload in your Django database instead of GCS. Cloud Tasks is well suited to this kind of asynchronous background processing with managed retries on Cloud Run [1].
