Choosing a storage schema for second-by-second telemetry data (PostgreSQL/Django/Grafana) [closed]

My application receives telemetry packets every second from equipment to an operations center. Each packet is identified by a fixed SID, which defines the data structure, including:

  • SID 1: temperature of equipment 1, 2, 3, 4
  • SID 2: voltage of batteries 1, 2, 3

Solution Generic EAV is already implemented and running in production, but it generates a very large data volume.

Rewriting the storage layer would be very costly in both development and maintenance.

In Django I can display the latest value of each parameter with nanosecond precision without issues.

However, in Grafana, any attempt to build a historical dashboard (e.g., one month of temperature data) consumes massive resources (CPU/memory) to execute queries.

Objectives:

  1. Keep real-time display of the latest values in Django
  2. Enable efficient plotting of time series over several months in Grafana without resource exhaustion

Solution 1 Generic EAV, in production:

  • A single Django model stores each parameter in an Entity–Attribute–Value table:

    # models.py
    class ParameterHK(models.Model):
        id                = models.BigAutoField(primary_key=True, db_index=True)
        date_hk_reception = models.DateTimeField(auto_now_add=True, db_index=True)
        sid               = models.IntegerField(db_index=True)
        equipment         = models.CharField(max_length=500)
        label_parameter   = models.CharField(max_length=500, db_index=True)
        unit              = models.CharField(max_length=500)
        value             = models.IntegerField()
        date_hk_obc       = models.IntegerField()
    
  • One row per parameter per second.

  • Works fine for Django queries but causes:

    • a huge number of rows
    • very large indexes
    • resource-intensive aggregation queries in Grafana

Solution 2 Wide tables per SID:

  • Create a dedicated table for each SID, with one column per parameter. Example for SID 1:

    CREATE TABLE hk_sid1 (
      ts            TIMESTAMPTZ NOT NULL,
      equip1_temp   INTEGER,
      equip2_temp   INTEGER,
      equip3_temp   INTEGER,
      equip4_temp   INTEGER
    );
    
  • Direct reading of each time series column without complex filtering.

  • Faster Grafana read performance.

  • Drawbacks: rigid schema, requires migrations on protocol changes, and multiple tables to manage.

What is a design for a second-by-second data stream stored over months, while keeping Solution 1 in production due to the high cost of a full rewrite?

Solution 1 has no rewrite, but is limited in historical query performance.

Solution 2 has faster reads, but is costly in development and maintenance.

What alternatives (partitioning, continuous aggregates, downsampling, automatic retention policies, etc.) limit resource usage while maintaining flexibility?

Вернуться на верх