We are trying to calculate the storage requirements but are unable to find the values needed to do the calculation for our version of Prometheus (v2.2).
The Prometheus (v2.2) storage documentation gives this simple formula:
needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
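The formula is a plain product of the three inputs; a minimal sketch in Python, using illustrative (not measured) numbers:

```python
# Sketch of the documented formula; the inputs below are illustrative examples.
def needed_disk_space(retention_time_seconds, ingested_samples_per_second, bytes_per_sample):
    """needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample"""
    return retention_time_seconds * ingested_samples_per_second * bytes_per_sample

# e.g. 720h retention, 10000 samples/s, 1.3 bytes/sample
print(needed_disk_space(720 * 3600, 10000, 1.3))  # → 33696000000.0 bytes (~31.4 GiB)
```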
- retention_time_seconds: We took our retention time of 720 hours and converted it to 2,592,000 seconds.
- bytes_per_sample: We used `rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[1d]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[1d])` based on the formula from this site, except we did not have `prometheus_tsdb_compaction_chunk_size_bytes_sum`, so we used `prometheus_tsdb_compaction_chunk_size_sum` instead. This gives us ~1.3, which sounds about right.
- ingested_samples_per_second: We cannot find a way to get this value. We tried the following:
- This site suggests `prometheus_local_storage_chunk_ops_total`, but we do not have this metric:
> Another useful metric to query and visualize is the prometheus_local_storage_chunk_ops_total metric that reports the per-second rate of all storage chunk operations taking place in Prometheus.
- This uses `rate(prometheus_tsdb_head_samples_appended_total[1h])`, but the result is ~18600, suggesting 68 TiB of storage:
> Combined with rate(prometheus_tsdb_head_samples_appended_total[1h]) for the samples ingested per second, you should have a good idea of how much disk space you need given your retention window.
- As above, this issue mentions `rate(tsdb_samples_appended_total[5m])`, but probably meant (or what we got working is) `rate(prometheus_tsdb_head_samples_appended_total[5m])`:
> You can still get samples per second via rate(tsdb_samples_appended_total[5m]).
- This uses `prometheus_local_storage_ingested_samples_total`, but again, it is not available to us:
> sample rate = rate(prometheus_local_storage_ingested_samples_total{job="prometheus",instance="$Prometheus:9090"}[1m])
- Finally, this issue also says to use `rate(prometheus_local_storage_ingested_samples_total[2m])`, and once again, it is not available to us:
> The other side is ingestion, which is way easier to reason with capacity-wise. You can find out how many samples your server is ingesting with this query: rate(prometheus_local_storage_ingested_samples_total[2m])
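Whichever of the candidate expressions is available, its current value can be fetched from Prometheus's HTTP query API rather than read off a dashboard; a minimal sketch using Python's standard library, assuming a server at localhost:9090 (hypothetical address, adjust for your setup):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Prometheus address; adjust for your setup.
PROM = "http://localhost:9090"

def instant_query(expr):
    """Evaluate a PromQL expression via /api/v1/query and return the first value."""
    url = PROM + "/api/v1/query?" + urlencode({"query": expr})
    with urlopen(url) as resp:
        body = json.load(resp)
    # "result" is a list of {"metric": {...}, "value": [timestamp, "value"]} entries
    return float(body["data"]["result"][0]["value"][1])

# Samples ingested per second over the last 5 minutes:
# print(instant_query("rate(prometheus_tsdb_head_samples_appended_total[5m])"))
```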
Any assistance or a nudge in the right direction would be much appreciated.
>>> What worked in the end <<<
As was pointed out, our math was broken; with that fixed, the metrics below worked for the calculation.
- ingested_samples_per_second: `rate(prometheus_tsdb_head_samples_appended_total[2h])`
- bytes_per_sample: `rate(prometheus_tsdb_compaction_chunk_size_sum[2h]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[2h])`
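For reference, plugging the values observed above (720 h retention, ~18600 samples/s, ~1.3 bytes/sample) into the documented formula gives roughly:

```python
# Corrected arithmetic using the values measured above:
retention_time_seconds = 720 * 3600   # 720h retention = 2,592,000 s
ingested_samples_per_second = 18600   # rate(prometheus_tsdb_head_samples_appended_total[2h])
bytes_per_sample = 1.3                # chunk_size_sum / chunk_samples_sum

needed_bytes = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
print(round(needed_bytes / 2**30, 1), "GiB")  # prints: 58.4 GiB
```

So the same ~18600 samples/s that earlier seemed to imply 68 TiB actually comes out in the tens of gigabytes once the multiplication is done correctly.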