InfluxDB: Wrong start time when group by more than 1 day

Question

I stuck at one issue related group by when the time interval is more than 1 day. It's giving the wrong start time for different grain when grain is more than 1 day.

Grain = 1day
ExpectedStartTime=2019-01-01
ActualStartTime=2019-01-01

>  select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(1d), "host" LIMIT 2;
time                 mean_messages
----                 ----
2019-01-01T00:00:00Z 181232
2019-01-02T00:00:00Z 179728

Grain = 2day
ExpectedStartTime=2019-01-01
ActualStartTime=2018-12-31

> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(2d), "host" LIMIT 2;
time                 mean_messages
----                 ----
2018-12-31T00:00:00Z 181232
2019-01-02T00:00:00Z 347824

Grain = 5day
ExpectedStartTime=2019-01-01
ActualStartTime=2018-12-30

> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(5d), "host" LIMIT 2;
time                 mean_messages
----                 ----
2018-12-30T00:00:00Z 529056
2019-01-04T00:00:00Z 826694.3999999762

I read in the documentation that Influx uses present time boundary, but doesn't say how present time boundary is calculated. Is it a start of a month or a start of a week or time of first data received or time of the shard starting?

If I know how this present time boundary is being calculated, I can specify offset in groupby to keep the first slot starting from 2019-01-01.

Hardik Sondagar · Accepted Answer · 2019-02-22 05:15:55Z

2

InfluxDB uses epoch time to calculate the present time boundaries. It creates groupby slots with reference to epoch time.

To keep start time same in groupby I need to pass an offset.

Here is a simple offset calculation function written in python which takes start time and groupby interval.

def get_offset(start_dt, interval_m):
    epoch = datetime.datetime.utcfromtimestamp(0)
    offset = (start_time - epoch).total_seconds() % (interval_m * 60)
    return offset

start_dt = datetime.datetime(2019,1,1,0,0)
interval_m = 1440 * 3 # 3 days 
offset_s = get_offset(start_dt, interval_m) # 172800

Groupby interval with 3 days, the query will look like below with offset.

> select mean("messages") from rabbitmq where host='rabbitmq_cluster' and time>='2019-01-01 00:00:00' and time<'2019-01-16 00:00:00' GROUP BY time(3d, 172800s), "host" LIMIT 2;
time                      mean_messages
----                      ----
2019-01-01T00:00:00+05:30 539232
2019-01-04T00:00:00+05:30 464640

https://github.com/influxdata/influxdb/issues/8010

edited Feb 22, 2019 at 5:15

answered Feb 21, 2019 at 12:40

Hardik Sondagar

4,4653 gold badges30 silver badges49 bronze badges

Very clever. Any idea how to make this in php?
– Milen Mihalev
Commented Dec 21, 2019 at 1:19
you just need to write get_offset function in php to generate offset for influx query
– Hardik Sondagar
Commented Dec 31, 2019 at 8:58
Thanks for the clear answer - very helpful! You do have a small typo in the get_offset() function, where in the offset = line the variable reference should be to start_dt rather than start_time. I tried to submit an edit suggestion, but was rejected since apparently "Edits must be at least 6 characters"...hah
– Svet
Commented Jun 7, 2021 at 9:19

Add a comment |

Collectives™ on Stack Overflow

InfluxDB: Wrong start time when group by more than 1 day

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
influxdb
or ask your own question.

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Not the answer you're looking for? Browse other questions tagged influxdb or ask your own question.

Related

Not the answer you're looking for? Browse other questions tagged
influxdb
or ask your own question.