
We are trying to calculate the storage requirements, but are unable to find the values needed to do the calculation for our version of Prometheus (v2.2).

The Prometheus (v2.2) storage documentation gives this simple formula:

needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
  • retention_time_seconds: We took our retention time of 720 hours and converted it to 2,592,000 seconds.
  • bytes_per_sample: We used rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[1d]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[1d]) based on the formula from this site, except we did not have prometheus_tsdb_compaction_chunk_size_bytes_sum, so we used prometheus_tsdb_compaction_chunk_size_sum instead. This gives us ~1.3 bytes per sample, which sounds about right.
  • ingested_samples_per_second: We cannot find a way to get this value. We tried the following:

    • This site suggests prometheus_local_storage_chunk_ops_total, but we do not have this metric.

      Another useful metric to query and visualize is the prometheus_local_storage_chunk_ops_total metric that reports the per-second rate of all storage chunk operations taking place in Prometheus.

    • This uses rate(prometheus_tsdb_head_samples_appended_total[1h]), but the result is ~18,600, suggesting 68 TiB of storage.

      Combined with rate(prometheus_tsdb_head_samples_appended_total[1h]) for the samples ingested per second, you should have a good idea of how much disk space you need given your retention window.

    • As above, this issue mentions rate(tsdb_samples_appended_total[5m]), but it probably meant (or at least what we got working is) rate(prometheus_tsdb_head_samples_appended_total[5m])

      You can still get samples per second via rate(tsdb_samples_appended_total[5m]).

    • This uses prometheus_local_storage_ingested_samples_total, but again, it is not available to us.

      sample rate = rate(prometheus_local_storage_ingested_samples_total{job="prometheus",instance="$Prometheus:9090"}[1m])

    • Finally, this issue also says to use rate(prometheus_local_storage_ingested_samples_total[2m]), and once again, it is not available to us.

      The other side is ingestion, which is way easier to reason with capacity-wise. You can find out how many samples your server is ingesting with this query: rate(prometheus_local_storage_ingested_samples_total[2m])

Any assistance or nudge in the right direction would be much appreciated.

>>> What worked in the end <<<

As was pointed out, our math was broken, so the metrics below worked for the calculation.

  • ingested_samples_per_second: rate(prometheus_tsdb_head_samples_appended_total[2h])
  • bytes_per_sample: rate(prometheus_tsdb_compaction_chunk_size_sum[2h]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[2h])
  • For those of you using Prometheus v2.20, the bytes-per-sample expression is slightly different: rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[2h]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[2h])
    – Javier PR
    Commented Oct 7, 2020 at 7:06
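
The two working expressions can be evaluated against Prometheus's HTTP API (/api/v1/query). A minimal sketch of assembling those requests; the server address and helper name here are my own assumptions, not from the original post:

```python
import urllib.parse

PROM_URL = "http://localhost:9090"  # assumption: replace with your Prometheus server

def instant_query_url(expr: str) -> str:
    # Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)
    return f"{PROM_URL}/api/v1/query?" + urllib.parse.urlencode({"query": expr})

# The two expressions that worked for v2.2 (note: chunk_size_sum, not chunk_size_bytes_sum)
ingested_samples_per_second = "rate(prometheus_tsdb_head_samples_appended_total[2h])"
bytes_per_sample = (
    "rate(prometheus_tsdb_compaction_chunk_size_sum[2h])"
    " / rate(prometheus_tsdb_compaction_chunk_samples_sum[2h])"
)

print(instant_query_url(ingested_samples_per_second))
```

Fetching each URL returns a JSON body whose data.result holds the scalar/vector value to plug into the formula.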

2 Answers


I think your math may be off. If you use the ~18,600 number you found, I get a very different result from 68 TiB:

2,592,000 (seconds) * 18,600 (samples/second) * 1.3 (bytes/sample) = 62,674,560,000 bytes. This is ~62.67 gigabytes (divide the byte value by 1e9), and I'm guessing a more reasonable number for your infrastructure.
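
The arithmetic can be checked directly with the numbers from the question:

```python
retention_time_seconds = 720 * 3600   # 720 h retention = 2,592,000 s
ingested_samples_per_second = 18600   # from rate(prometheus_tsdb_head_samples_appended_total[1h])
bytes_per_sample = 1.3                # from the compaction metrics

needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample
print(needed_disk_space)        # ~6.27e10 bytes
print(needed_disk_space / 1e9)  # ~62.67 GB, nowhere near 68 TiB
```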

  • Absolutely correct - my math was off. I thought I was going mad, just ended up being stupid. Thank you
    – MCoetzee
    Commented Sep 27, 2019 at 17:54
  • Adding a separate answer that directly answers the question in the title, in case other folks are coming from Google and aren't concerned with the particular math errors that OP made
    – Brandon
    Commented Jan 8, 2021 at 19:53

To calculate disk space required by Prometheus v2.20 in bytes, use the query:

retention_time_seconds *
rate(prometheus_tsdb_head_samples_appended_total[2h]) *
(rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[2h]) / rate(prometheus_tsdb_compaction_chunk_samples_sum[2h]))

Where retention_time_seconds is the value you've configured for --storage.tsdb.retention.time, which defaults to 15d = 1296000 seconds.
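
As a sanity check on the conversion, and to assemble the full query with a concrete retention value, a small sketch (the variable names are mine, the query text is from the answer above):

```python
retention_days = 15                       # the --storage.tsdb.retention.time default
retention_time_seconds = retention_days * 24 * 60 * 60
assert retention_time_seconds == 1296000  # 15d = 1,296,000 seconds, as stated above

query = (
    f"{retention_time_seconds} * "
    "rate(prometheus_tsdb_head_samples_appended_total[2h]) * "
    "(rate(prometheus_tsdb_compaction_chunk_size_bytes_sum[2h])"
    " / rate(prometheus_tsdb_compaction_chunk_samples_sum[2h]))"
)
print(query)
```

Paste the printed query into the Prometheus expression browser to get the byte figure directly.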

  • Do you know if we can exclude the space the Prometheus target itself would take? I have a target that samples every 1 minute; the storage required for that should be reasonably small, but when you add in the Prometheus target that is sampling every 5 seconds, the space required adds up. So I was wondering if there is a way to exclude the Prometheus target from the space calculation?
    – kaptan
    Commented Jun 27, 2022 at 23:13
