1

I'm trying to write aggregated result from two measurements into a single measurement.

I found on documentation that you can write multiple matching measurements with :MEASUREMENT keyword in INTO query. Like

SELECT * INTO "copy_NOAA_water_database"."autogen".:MEASUREMENT FROM 
"NOAA_water_database"."autogen"./.*/

What I'm trying to do is aggregate from multiple measurements and write result to a single measurement.

SELECT mean("water_level") INTO 
"copy_NOAA_water_database"."autogen"."water_agg" FROM 
"NOAA_water_database"."autogen"./.*/ GROUP BY time(15m), *

The above query runs successfully, but I'm not sure whether influx has considered points from all measurement of NOAA_water_database or just last appearing measurement is considered.

1 Answer 1

1

Q: I'm not sure whether influx has considered points from all measurement of NOAA_water_database or just last appearing measurement is considered.

A: I suspect influxdb is not aggregating the data from your measurements.

I think it is only aggregating the data from each measurement individually and then for each output write it to your specified measurement and since the resolved time of the mean operation can possibly be the same, measurement B's result can overwrite measurement A's result.

I derived this theory by doing an experiment using the following dataset;

INSERT cpu,host=serverA value=10
INSERT cpu,host=serverA value=20

INSERT cpu2,host=serverA value=5
INSERT cpu2,host=serverA value=15

Doing a SELECT statement similar to your query above returns;

select * FROM "historian"."autogen"./cpu.*/
name: cpu
time                host    value
----                ----    -----
1546511130857357196 serverA 10
1546511132744883738 serverA 20

name: cpu2
time                host    value
----                ----    -----
1546511156629403118 serverA 5
1546511157888695746 serverA 15

Then instead of using mean I do sum to find test the behaviour of influxdb. I also simplified the query by dropping the groupBy operation.

Doing a sum gives me;

SELECT sum("value") INTO test_sum FROM "historian"."autogen"./.*/ 
name: result
time written
---- -------
0    2
> select * from test_sum;
name: test_sum
time sum
---- ---
0    20

Theory: if influx is aggregating the data from all measurements, the sum result would not be 20. It should be 50. The only way 20 can be derived is from by summing 5 + 15 which is the data from the last measurement.

But when we do the sum operation, influx did told us 2 rows were written. My theory to this is that, the influx did calculate the sum of the first measurement however as first and second summation's result time is both 0 therefore the 2nd measurement's result would have overwritten the first result's.

Recommended solution: The best tool to do this job is actually influxdb's kapacitor. It is a great tool because it is fast however it is also extremely to learn.

Alternatively if your dataset isn't huge which I suspect it should be alright since you are grouping by 15m. You can write a script in your favourite programming language to read out the data, do the mean and then write the data back to influxdb.

Not the answer you're looking for? Browse other questions tagged or ask your own question.