Choosing a storage schema for second-by-second telemetry data (PostgreSQL/Django/Grafana)
My application receives telemetry packets every second from equipment to an operations center. Each packet is identified by a fixed SID, which defines the data structure, including:
- SID 1: temperature of equipment 1, 2, 3, 4
- SID 2: voltage of batteries 1, 2, 3
A generic EAV schema (Solution 1 below) is already implemented and running in production, but it generates a very large data volume.
Rewriting the storage layer would be very costly in both development and maintenance.
In Django I can display the latest value of each parameter with nanosecond precision without issues.
However, in Grafana, any attempt to build a historical dashboard (e.g., one month of temperature data) consumes massive resources (CPU/memory) to execute queries.
Objectives:
- Keep real-time display of the latest values in Django
- Enable efficient plotting of time series over several months in Grafana without resource exhaustion
Solution 1: Generic EAV (in production)

A single Django model stores each parameter in an Entity–Attribute–Value table:

```python
# models.py
from django.db import models

class ParameterHK(models.Model):
    id = models.BigAutoField(primary_key=True, db_index=True)
    date_hk_reception = models.DateTimeField(auto_now_add=True, db_index=True)
    sid = models.IntegerField(db_index=True)
    equipment = models.CharField(max_length=500)
    label_parameter = models.CharField(max_length=500, db_index=True)
    unit = models.CharField(max_length=500)
    value = models.IntegerField()
    date_hk_obc = models.IntegerField()
```
One row per parameter per second.
Works fine for Django queries but causes:
- a huge number of rows
- very large indexes
- resource-intensive aggregation queries in Grafana
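For scale: at 1 Hz, a single parameter accumulates roughly 86,400 rows per day, about 2.6 million per month. A one-month Grafana panel therefore runs something like the query below (the table name `app_parameterhk` assumes Django's default naming, and the label value is illustrative):

```sql
-- One month of one temperature series: filters millions of EAV rows
SELECT
    date_hk_reception AS "time",
    value
FROM app_parameterhk
WHERE sid = 1
  AND label_parameter = 'equip1_temp'
  AND date_hk_reception >= now() - interval '30 days'
ORDER BY date_hk_reception;
```

Even with the indexes on `sid`, `label_parameter`, and `date_hk_reception`, sorting and returning millions of points per panel is what exhausts CPU and memory.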
Solution 2: Wide tables per SID

Create a dedicated table for each SID, with one column per parameter. Example for SID 1:

```sql
CREATE TABLE hk_sid1 (
    ts TIMESTAMPTZ NOT NULL,
    equip1_temp INTEGER,
    equip2_temp INTEGER,
    equip3_temp INTEGER,
    equip4_temp INTEGER
);
```
Each time-series column can be read directly, without complex filtering, which gives much faster Grafana read performance.
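The corresponding Grafana query becomes a plain range scan over one narrow table (assuming an index on `ts`):

```sql
-- One month of equipment 1 temperature: no label filtering, one row per second
SELECT ts AS "time", equip1_temp
FROM hk_sid1
WHERE ts >= now() - interval '30 days'
ORDER BY ts;
```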
Drawbacks: rigid schema, requires migrations on protocol changes, and multiple tables to manage.
What is a good design for a second-by-second data stream stored over several months, given that Solution 1 must stay in production because a full rewrite is too costly?
Solution 1 requires no rewrite, but historical query performance is poor.
Solution 2 reads faster, but is costly to develop and maintain.
What alternatives (partitioning, continuous aggregates, downsampling, automatic retention policies, etc.) limit resource usage while maintaining flexibility?
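To make "downsampling" concrete, the kind of thing I have in mind is a pre-computed rollup such as the sketch below, on top of the existing EAV table (plain PostgreSQL materialized view; a TimescaleDB continuous aggregate would keep it refreshed automatically; the table name `app_parameterhk` assumes Django's default naming):

```sql
-- Per-minute rollup: cuts the row count Grafana scans by ~60x
CREATE MATERIALIZED VIEW parameter_hk_1min AS
SELECT
    date_trunc('minute', date_hk_reception) AS bucket,
    sid,
    label_parameter,
    avg(value) AS avg_value,
    min(value) AS min_value,
    max(value) AS max_value
FROM app_parameterhk
GROUP BY 1, 2, 3;

-- Refresh periodically (cron/Celery); Grafana dashboards query the view instead
REFRESH MATERIALIZED VIEW parameter_hk_1min;
```

Is this the right direction, or is there a better-established pattern for this workload?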