Prometheus remote write: timestamp too old #3763

Open · JasmineCA opened this issue Jun 3, 2024 · 5 comments
Labels: awaiting user (waiting for user to respond)

@JasmineCA

Brief summary

Hello folks of k6,

I've been using k6 for some months and it's a great tool, even if some useful features are still missing (but I'm sure they will come soon). We are using the prometheus-rw output to push metrics to a self-hosted Grafana Mimir instance. From time to time (actually quite often), k6 tests return this error:
ERRO[0007] Failed to send the time series data to the endpoint error="got status code: 400 instead expected a 2xx successful status code" output="Prometheus remote write".
When we check on the Mimir side, we can see logs like this:
ts=2024-06-03T12:05:33.265292141Z caller=push.go:130 level=error user=***** msg="push error" err="rpc error: code = Code(400) desc = failed pushing to ingester: user=******: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 1970-01-01T00:00:00Z and is from series {__name__=\"k6_http_req_connecting_seconds\", expected_response=\"true\", method=\"POST\", name=\"*******", proto=\"HTTP/1.1\", scenario=\"create_merchant_with_store\", status=\"201\", test_suite_id=\"create-merchant-then-store-loadtest-it-2024-06-03-11:49:03\", testid=\"create_merchant_with_store-2024-06-03-11:49:07\", url=\"******}" (sanitized log)

It seems that the timestamp is not set when the metric is sent? You can see from the test_suite_id label that the test was run this morning, so the timestamp Mimir received is indeed incorrect.

Do you have any input on how to avoid this kind of bug?

Regards,

k6 version

0.50.0

OS

Windows under WSL

Docker version and image (if applicable)

No response

Steps to reproduce the problem

  • Deploy a self-hosted Mimir (or maybe skip this step if it reproduces with any Prometheus server)
  • Run a test with the experimental-prometheus-rw output (a minimal invocation is sketched below)
  • Wait until you get unlucky and metric pushes start failing
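A minimal sketch of step 2, assuming a local Mimir reachable on its remote-write push endpoint (the URL and the script name are examples, not taken from this issue):

K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9009/api/v1/push \
  k6 run -o experimental-prometheus-rw my-scenario.js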

Expected behaviour

Metrics are sent 100% of the time

Actual behaviour

Metric pushes randomly fail because of a timestamp error

@Rbillon59

Hello,

I faced the same issue; it was because I had multiple k6 instances running in parallel and the generated Prometheus series had identical label sets (the same cardinality).

The solution I found was to add a scenario tag (it could be any tag, it just needs to be unique among the running k6 instances), so the generated Prometheus series can be told apart.

Like:

k6 run --log-format json --no-summary --quiet "my-scenario.js" -o experimental-prometheus-rw --tag scenario=my-scenario
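If several instances are started from the same launcher script, one way to keep the tag unique per instance is to derive it from the environment. A minimal sketch, assuming a POSIX shell (the instance tag name is just an illustration):

k6 run --no-summary --quiet "my-scenario.js" -o experimental-prometheus-rw --tag instance="$(hostname)-$$"

Here $(hostname)-$$ combines the host name with the shell PID, so two instances never share the same label value.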

@JasmineCA (Author)

Hello,

Thank you for your comment. It might become useful for us once we move to multiple k6 instances running the same scenario. Unfortunately, I face this issue with only one k6 instance running at a time. And even if there were multiple instances, I have a tag built from a timestamp, so the cardinality would already differ. But it's a strange issue you faced, because I thought Prometheus metrics would be a way to unite the metrics of separate k6 instances running the same scenario.

@codebien (Collaborator) commented Jun 7, 2024

Hey @JasmineCA,
are you running k6 in a multiple-instances configuration, as @Rbillon59 mentioned? For example, using the k6-operator?

@codebien added the awaiting user (waiting for user to respond) label and removed the bug and triage labels on Jun 7, 2024
@codebien (Collaborator) commented Jun 7, 2024

I thought Prometheus metrics would be a way to unite the metrics of separate k6 instances running the same scenario.

Yes, but that is generally done by applying an instance label, and this is also what we suggest for k6 in a multiple-instances configuration.
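As a sketch of how that works end to end (the instance value, the k6_http_reqs_total metric name, and the endpoint URL are assumptions for illustration, not taken from this issue):

k6 run -o experimental-prometheus-rw --tag instance=load-gen-1 my-scenario.js

Each series then carries its own instance label, and you can still aggregate across instances at query time via the Prometheus-compatible HTTP API:

curl -G "$MIMIR_URL/prometheus/api/v1/query" --data-urlencode 'query=sum without (instance) (k6_http_reqs_total)'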

The affected sample has timestamp 1970-01-01T00:00:00Z

Btw, I only see now that the wrong timestamp is 1970-..., i.e. the Unix epoch, which is what you get when the sample's timestamp field is left at its zero value in Go. There is a high chance that this is a bug on our side.
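For reference, a remote-write sample carries its timestamp as int64 milliseconds since the epoch, so an unset field (zero value 0) renders exactly as the date Mimir reports. A quick check with GNU date (on macOS: date -u -r 0):

date -u -d @0
# Thu Jan  1 00:00:00 UTC 1970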

Hopefully, I will find the time to fix it at the beginning of next week.

@JasmineCA (Author)

Hello @codebien,

Have you had time to check where in the code this bug could be?
