InfluxDB: High cardinality for specific shards

I'm querying data from different shards and used EXPLAIN to check how many series are fetched for each date range.

> SHOW SHARDS
.
.
658 mydb autogen          658         2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen          676         2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.
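
To double-check which shard a given time range lands in, the start and end columns above can be compared against the query range. A minimal Python sketch, using only the two rows shown; the names `SHARDS`, `parse_ts` and `shards_for_range` are made up for illustration, and the query end is treated as inclusive:

```python
from datetime import datetime, timezone

# Rows copied from the SHOW SHARDS output above: (shard id, start, end).
SHARDS = [
    (658, "2019-07-22T00:00:00Z", "2019-07-29T00:00:00Z"),
    (676, "2019-07-29T00:00:00Z", "2019-08-05T00:00:00Z"),
]


def parse_ts(text):
    """Parse a UTC timestamp such as 2019-07-22T00:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def shards_for_range(shards, query_start, query_end):
    """Return ids of shards whose [start, end) interval overlaps [query_start, query_end]."""
    q_start, q_end = parse_ts(query_start), parse_ts(query_end)
    hits = []
    for shard_id, start, end in shards:
        # A shard covers its start but not its end timestamp.
        if parse_ts(start) <= q_end and q_start < parse_ts(end):
            hits.append(shard_id)
    return hits


print(shards_for_range(SHARDS, "2019-07-27T00:00:00Z", "2019-07-28T00:00:00Z"))  # [658]
print(shards_for_range(SHARDS, "2019-07-29T00:00:00Z", "2019-07-30T00:00:00Z"))  # [676]
```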

Running EXPLAIN for data in shard 658 gives the expected number of series. SensorId is the only tag key, and since the date range falls into a single shard, the plan reports NUMBER OF SERIES: 1.

> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482

But when I run the same query on a date range that falls into shard 676, the number of series is 13140 instead of just one.

> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-29 00:00:00' AND time < '2019-07-30 00:00:00';
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 13140
CACHED VALUES: 0
NUMBER OF FILES: 11426
NUMBER OF BLOCKS: 23561
SIZE OF BLOCKS: 108031642
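
For a side-by-side view of the two plans, the numeric counters can be pulled out of the EXPLAIN text. This is only a rough sketch: `parse_explain`, `PLAN_658` and `PLAN_676` are hypothetical names, and the plan text is pasted from the two outputs above.

```python
def parse_explain(text):
    """Turn 'NAME: value' lines of an EXPLAIN plan into a dict, keeping integer counters only."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Non-numeric lines (EXPRESSION, AUXILIARY FIELDS, the header) are skipped.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


PLAN_658 = """\
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482
"""

PLAN_676 = """\
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 13140
CACHED VALUES: 0
NUMBER OF FILES: 11426
NUMBER OF BLOCKS: 23561
SIZE OF BLOCKS: 108031642
"""

plan_a, plan_b = parse_explain(PLAN_658), parse_explain(PLAN_676)
for key in plan_a:
    # Print each counter for shard 658 and shard 676 next to each other.
    print(f"{key:<18} {plan_a[key]:>8} {plan_b[key]:>10}")
```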

Environment info:

System info: Linux 4.4.0-1087-aws x86_64
InfluxDB version: InfluxDB v1.7.6 (git: 1.7 01c8dd4)

Update - 1

When I checked the field key cardinality, I observed a spike in RAM usage.

> SHOW FIELD KEY CARDINALITY

Update - 2

I've rebuilt the indexes, but the cardinality is still high.

Update - 3

I found out that the shard has "SensorId" both as a tag and as a field, which causes the high cardinality when querying with the "SensorId" filter.

> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time                 count
----                 -----
1970-01-01T00:00:00Z 40

But when I check the tag values for key 'SensorId', the empty string that appears in the query above is not listed.

> show tag values with key = "SensorId"
name: Reading
key      value
---      -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
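
My understanding is that a point written with SensorId in the field set rather than the tag set ends up in a series that has no SensorId tag at all, which would explain both the empty `SensorId=` group and the absence of an empty value in the tag listing. A simplified Python sketch of how a line-protocol point splits into series key and field keys; the two sample points are made up, `split_point` is a hypothetical helper, and escaping and quoted strings are ignored:

```python
def split_point(line):
    """Split one line-protocol point into (series key, tag dict, field keys).

    Simplified: assumes no escaped spaces or commas and no quoted string
    fields that contain them.
    """
    parts = line.split(" ")
    key_part, field_part = parts[0], parts[1]
    measurement, *tag_pairs = key_part.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    field_keys = [pair.split("=", 1)[0] for pair in field_part.split(",")]
    return key_part, tags, field_keys


# Hypothetical points, not taken from the real data.
tagged = "Reading,SensorId=1186 kWh=1.5"
untagged = "Reading SensorId=1186i,kWh=1.5"

print(split_point(tagged))    # ('Reading,SensorId=1186', {'SensorId': '1186'}, ['kWh'])
print(split_point(untagged))  # ('Reading', {}, ['SensorId', 'kWh'])
```

In the second point the series key is just `Reading`, so there is no SensorId tag value to index, and SensorId is stored as a field instead.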

Update - 4

I inspected the data using influx_inspect dumptsm and confirmed that entries with a null (missing) SensorId tag are present:

$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm

Index:

  Pos   Min Time                Max Time                Ofs     Size    Key                     Field
  1     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    5       103     Reading                 1001
  2     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    108     275     Reading                 2001
  3     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    383     248     Reading                 2002
  4     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    631     278     Reading                 2003
  5     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    909     278     Reading                 2004
  6     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1187    184     Reading                 2005
  7     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1371    103     Reading                 2006
  8     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1474    250     Reading                 2007
  9     2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1724    103     Reading                 2008
  10    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    1827    275     Reading                 2012
  11    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2102    416     Reading                 2101
  12    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2518    103     Reading                 2692
  13    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2621    101     Reading                 SensorId
  14    2019-07-29T00:00:05Z    2019-07-29T05:31:07Z    2722    1569    Reading,SensorId=10034  2005
  15    2019-07-29T05:31:26Z    2019-07-29T11:03:54Z    4291    1467    Reading,SensorId=10034  2005
  16    2019-07-29T11:04:14Z    2019-07-29T17:10:16Z    5758    1785    Reading,SensorId=10034  2005
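
To list the untagged entries without reading the whole dump by eye, the index rows can be filtered on the Key column. A rough Python sketch over three rows copied from the output above; `ROWS` and `untagged_fields` are hypothetical names, and the parsing assumes keys contain no whitespace or escaped commas:

```python
# Three index rows copied from the dumptsm output above, plus the header.
ROWS = """\
  Pos   Min Time                Max Time                Ofs     Size    Key                     Field
  12    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2518    103     Reading                 2692
  13    2019-08-01T01:46:31Z    2019-08-01T17:42:03Z    2621    101     Reading                 SensorId
  14    2019-07-29T00:00:05Z    2019-07-29T05:31:07Z    2722    1569    Reading,SensorId=10034  2005
"""


def untagged_fields(index_text):
    """Collect field names for every series key that carries no tag set."""
    result = {}
    for line in index_text.splitlines():
        cols = line.split()
        # Data rows have exactly 7 columns and start with a numeric position.
        if len(cols) != 7 or not cols[0].isdigit():
            continue
        key, field = cols[5], cols[6]
        # A key without a comma is a bare measurement name, i.e. no tags.
        if "," not in key:
            result.setdefault(key, []).append(field)
    return result


print(untagged_fields(ROWS))  # {'Reading': ['2692', 'SensorId']}
```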

edited Oct 14, 2019 at 12:55

asked Oct 9, 2019 at 6:52

Hardik Sondagar

If you need to reduce RAM usage for a high number of time series, take a look at other time series databases such as TimescaleDB or VictoriaMetrics. See this article, which compares RAM usage for InfluxDB and VictoriaMetrics at various cardinality levels.
– valyala
Commented Oct 12, 2019 at 20:48

It's not about a high number of time series; it's a bug that shows two different series cardinalities in different shards (which have an equal distribution of data).
– Hardik Sondagar
Commented Oct 13, 2019 at 4:21
