How to aggregate a GROUP BY queryset in Django?

I'm working with time series data which are represented using this model:

from django.db import models

class Price(models.Model):
    timestamp = models.IntegerField()  # Unix time in seconds
    price = models.FloatField()

Assuming timestamp has 1 min interval data, this is how I would resample it to 1 hr:

queryset = (
    Price.objects.annotate(timestamp_agg=Floor(F('timestamp') / 3600))
    .values('timestamp_agg')
    .annotate(
        timestamp=Min('timestamp'),
        high=Max('price'),
    )
    .values('timestamp', 'high')
    .order_by('timestamp')
)

which runs the following sql under the hood:

select min(timestamp) timestamp, max(price) high
from core_price
group by floor((timestamp / 3600))
order by timestamp

Now I want to calculate a 4 hr moving average, usually calculated in the following way:

select *, avg(high) over (order by timestamp rows between 4 preceding and current row) ma
from (select min(timestamp) timestamp, max(price) high
      from core_price
      group by floor((timestamp / 3600))
      order by timestamp) hourly

or

Window(expression=Avg('price'), frame=RowRange(start=-4, end=0))

How can I apply the window aggregation above to the first query? Obviously I can't do something like this, since the first query is already an aggregation:

>>> queryset.annotate(ma=Window(expression=Avg('high'), frame=RowRange(start=-4, end=0)))
django.core.exceptions.FieldError: Cannot compute Avg('high'): 'high' is an aggregate

AFAIK Django does not support such a query natively. Your best bet is to execute raw SQL. You could also use `django-cte`.
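
If the hourly result set is small, another fallback is to compute the same frame in Python over the rows the aggregated queryset already returns. This is only a minimal sketch: `trailing_average` is a hypothetical helper, and the `bars` list is made-up data standing in for `list(queryset)`, ordered by timestamp.

```python
from collections import deque
from typing import Iterable, Iterator


def trailing_average(rows: Iterable[dict], preceding: int = 4) -> Iterator[dict]:
    """Yield each row with an extra 'ma' key.

    'ma' is the mean of 'high' over the current row and up to `preceding`
    rows before it, i.e. the same frame as
    ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW.
    """
    window = deque(maxlen=preceding + 1)  # old values fall off automatically
    for row in rows:
        window.append(row["high"])
        yield {**row, "ma": sum(window) / len(window)}


# Hypothetical stand-in for list(queryset): hourly rows ordered by timestamp.
bars = [
    {"timestamp": 3600 * i, "high": float(h)}
    for i, h in enumerate([10, 20, 30, 40, 50, 60])
]
result = list(trailing_average(bars))
```

This pulls every hourly row into Python, so it only makes sense when the aggregated result is small; the database-side window function avoids that transfer.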

The raw SQL solution would look something like this:

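from django.db import connection
from django.db.models import F, Max, Min
from django.db.models.functions import Floor

# Build the hourly bars SQL with the ORM (so it stays DB-agnostic)
hourly_qs = (
    Price.objects.annotate(timestamp_agg=Floor(F('timestamp') / 3600))
    .values('timestamp_agg')
    .annotate(
        timestamp=Min('timestamp'),
        high=Max('price'),
    )
    .values('timestamp', 'high')
    .order_by('timestamp')
)

# Compile the ORM query separately so its parameters (e.g. 3600) are
# passed to the driver instead of being interpolated into the string
hourly_sql, hourly_params = hourly_qs.query.sql_with_params()

# Build raw SQL for the moving average. The CTE exposes the columns
# "timestamp" and "high", matching the final .values() call above.
# Note: 4 PRECEDING plus the current row spans 5 hourly rows; use
# 3 PRECEDING if the average should cover exactly 4 rows.
sql = f"""
WITH hourly AS (
    {hourly_sql}
)
SELECT
    timestamp,
    high,
    AVG(high) OVER (
        ORDER BY timestamp
        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS ma
FROM hourly
ORDER BY timestamp
"""

with connection.cursor() as cur:
    cur.execute(sql, hourly_params)
    rows = cur.fetchall()  # [(timestamp, high, ma), ...]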
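
To sanity-check the shape of that query without a Django project, here is a minimal sketch that runs an equivalent statement against an in-memory SQLite database using only the standard library. Several things here are assumptions made for illustration: the synthetic data, the hand-written CTE body (instead of the ORM-generated SQL), integer division standing in for `FLOOR()`, and the column names `ma_5_rows` and `ma_4_rows`. It needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE core_price (timestamp INTEGER, price REAL)")

# Six hours of 1-minute data; the price in hour h peaks at 10 * (h + 1).
data = [
    (h * 3600 + m * 60, 10.0 * (h + 1) - m * 0.01)
    for h in range(6)
    for m in range(60)
]
conn.executemany("INSERT INTO core_price VALUES (?, ?)", data)

sql = """
WITH hourly AS (
    SELECT MIN(core_price.timestamp) AS timestamp,
           MAX(core_price.price) AS high
    FROM core_price
    -- integer division stands in for FLOOR(timestamp / 3600)
    GROUP BY core_price.timestamp / 3600
)
SELECT
    timestamp,
    high,
    AVG(high) OVER (
        ORDER BY timestamp
        ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
    ) AS ma_5_rows,
    AVG(high) OVER (
        ORDER BY timestamp
        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW
    ) AS ma_4_rows
FROM hourly
ORDER BY timestamp
"""

result = conn.execute(sql).fetchall()
for row in result:
    print(row)  # (timestamp, high, ma_5_rows, ma_4_rows)
```

The two averages agree until the fifth hourly row, where the `4 PRECEDING` frame starts covering five rows and the `3 PRECEDING` frame covers four, so pick the frame that matches the window you actually want.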