Prometheus remote write: timestamp too old #3763

Open · JasmineCA opened this issue Jun 3, 2024 · 5 comments
Labels: awaiting user (waiting for user to respond)

@JasmineCA

Brief summary

Hello folks of k6,

I've been using k6 for some months and it's a great tool, even if some useful features are still missing (but I'm sure they will come soon). We are using the prometheus-rw output to push metrics to a self-hosted Grafana Mimir instance. From time to time (actually quite often), k6 tests return this error:
ERRO[0007] Failed to send the time series data to the endpoint error="got status code: 400 instead expected a 2xx successful status code" output="Prometheus remote write".
When we check on the Mimir side, we can see logs like this:
ts=2024-06-03T12:05:33.265292141Z caller=push.go:130 level=error user=***** msg="push error" err="rpc error: code = Code(400) desc = failed pushing to ingester: user=******: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 1970-01-01T00:00:00Z and is from series {__name__=\"k6_http_req_connecting_seconds\", expected_response=\"true\", method=\"POST\", name=\"*******", proto=\"HTTP/1.1\", scenario=\"create_merchant_with_store\", status=\"201\", test_suite_id=\"create-merchant-then-store-loadtest-it-2024-06-03-11:49:03\", testid=\"create_merchant_with_store-2024-06-03-11:49:07\", url=\"******}" (sanitized log)

It seems that the timestamp is not set when the metric is sent? You can see from the test_suite_id label that the test was run this morning, so the timestamp Mimir received is indeed incorrect.

Do you have any input on how to avoid this kind of bug?

Regards,

k6 version

0.50.0

OS

Windows under WSL

Docker version and image (if applicable)

No response

Steps to reproduce the problem

  • Deploy a self-hosted Mimir (or maybe skip this step if it reproduces with any Prometheus server)
  • Run a test with the experimental-prometheus-rw output (a minimal invocation is sketched below)
  • Wait until you get unlucky and metric pushes start failing
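A minimal sketch of step 2, assuming a local Mimir reachable on its remote-write push endpoint (the URL and the script name are examples, not taken from this issue):

K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9009/api/v1/push \
  k6 run -o experimental-prometheus-rw my-scenario.js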

Expected behaviour

Metrics are sent 100% of the time

Actual behaviour

Metric pushes randomly fail because of a timestamp error

@Rbillon59

Hello,

I faced the same issue; it was because I had multiple k6 instances running in parallel and the generated Prometheus series had identical label sets (the same cardinality).

The solution I found was to add a scenario tag (it could be any tag, it just needs to be unique among the running k6 instances), so the generated Prometheus series can be told apart.

Like:

k6 run --log-format json --no-summary --quiet "my-scenario.js" -o experimental-prometheus-rw --tag scenario=my-scenario
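If several instances are started from the same launcher script, one way to keep the tag unique per instance is to derive it from the environment. A minimal sketch, assuming a POSIX shell (the instance tag name is just an illustration):

k6 run --no-summary --quiet "my-scenario.js" -o experimental-prometheus-rw --tag instance="$(hostname)-$$"

Here $(hostname)-$$ combines the host name with the shell PID, so two instances never share the same label value.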

@JasmineCA (Author)

Hello,

Thank you for your comment. It might become useful for us once we move to multiple k6 instances running the same scenario. Unfortunately, I face this issue with only one k6 instance running at a time. And even if there were multiple instances, I have a tag built from a timestamp, so the cardinality would already differ. But it's a strange issue you faced, because I thought Prometheus metrics would be a way to unite the metrics of separate k6 instances running the same scenario.

@codebien (Collaborator) commented Jun 7, 2024

Hey @JasmineCA,
are you running k6 in a multiple-instances configuration, as @Rbillon59 mentioned? For example, using the k6-operator?

@codebien added the awaiting user (waiting for user to respond) label and removed the bug and triage labels on Jun 7, 2024
@codebien (Collaborator) commented Jun 7, 2024

I thought Prometheus metrics would be a way to unite the metrics of separate k6 instances running the same scenario.

Yes, but that is generally done by applying an instance label, and this is also what we suggest for k6 in a multiple-instances configuration.
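As a sketch of how that works end to end (the instance value, the k6_http_reqs_total metric name, and the endpoint URL are assumptions for illustration, not taken from this issue):

k6 run -o experimental-prometheus-rw --tag instance=load-gen-1 my-scenario.js

Each series then carries its own instance label, and you can still aggregate across instances at query time via the Prometheus-compatible HTTP API:

curl -G "$MIMIR_URL/prometheus/api/v1/query" --data-urlencode 'query=sum without (instance) (k6_http_reqs_total)'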

The affected sample has timestamp 1970-01-01T00:00:00Z

Btw, I only see now that the wrong timestamp is 1970-..., i.e. the Unix epoch, which is what you get when the sample's timestamp field is left at its zero value in Go. There is a high chance that this is a bug on our side.
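For reference, a remote-write sample carries its timestamp as int64 milliseconds since the epoch, so an unset field (zero value 0) renders exactly as the date Mimir reports. A quick check with GNU date (on macOS: date -u -r 0):

date -u -d @0
# Thu Jan  1 00:00:00 UTC 1970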

Hopefully, I will find the time to fix it at the beginning of next week.

@JasmineCA (Author)

Hello @codebien,

Have you had time to check where in the code this bug could be?
