2

I stuck at one issue related group by when the time interval is more than 1 day. It's giving the wrong start time for different grain when grain is more than 1 day.


  • Grain = 1day
  • ExpectedStartTime=2019-01-01
  • ActualStartTime=2019-01-01
>  select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(1d), "host" LIMIT 2;
time                 mean_messages
----                 ----
2019-01-01T00:00:00Z 181232
2019-01-02T00:00:00Z 179728
  • Grain = 2day
  • ExpectedStartTime=2019-01-01
  • ActualStartTime=2018-12-31
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(2d), "host" LIMIT 2;
time                 mean_messages
----                 ----
2018-12-31T00:00:00Z 181232
2019-01-02T00:00:00Z 347824
  • Grain = 5day
  • ExpectedStartTime=2019-01-01
  • ActualStartTime=2018-12-30
> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(5d), "host" LIMIT 2;
time                 mean_messages
----                 ----
2018-12-30T00:00:00Z 529056
2019-01-04T00:00:00Z 826694.3999999762

I read in the documentation that Influx uses present time boundary, but doesn't say how present time boundary is calculated. Is it a start of a month or a start of a week or time of first data received or time of the shard starting?

If I know how this present time boundary is being calculated, I can specify offset in groupby to keep the first slot starting from 2019-01-01.

1 Answer 1

2

InfluxDB uses epoch time to calculate the present time boundaries. It creates groupby slots with reference to epoch time.

To keep start time same in groupby I need to pass an offset.

Here is a simple offset calculation function written in python which takes start time and groupby interval.

def get_offset(start_dt, interval_m):
    epoch = datetime.datetime.utcfromtimestamp(0)
    offset = (start_time - epoch).total_seconds() % (interval_m * 60)
    return offset

start_dt = datetime.datetime(2019,1,1,0,0)
interval_m = 1440 * 3 # 3 days 
offset_s = get_offset(start_dt, interval_m) # 172800

Groupby interval with 3 days, the query will look like below with offset.

> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(3d, 172800s), "host" LIMIT 2;
time                      mean_messages
----                      ----
2019-01-01T00:00:00+05:30 539232
2019-01-04T00:00:00+05:30 464640

https://github.com/influxdata/influxdb/issues/8010

3
  • Very clever. Any idea how to make this in php? Commented Dec 21, 2019 at 1:19
  • you just need to write get_offset function in php to generate offset for influx query Commented Dec 31, 2019 at 8:58
  • Thanks for the clear answer - very helpful! You do have a small typo in the get_offset() function, where in the offset = line the variable reference should be to start_dt rather than start_time. I tried to submit an edit suggestion, but was rejected since apparently "Edits must be at least 6 characters"...hah
    – Svet
    Commented Jun 7, 2021 at 9:19

Not the answer you're looking for? Browse other questions tagged or ask your own question.