04/04/2023 | News release | Distributed by Public on 04/04/2023 06:22
In our previous posts about choosing a Prometheus remote storage we have seen how to set up a benchmarking infrastructure and how to benchmark promql performance. We have been able to obtain results but the whole benchmark is flawned in many ways:
This blog post discuss how we should have benchmark our remote storage.
A good benchmark need to be accurate and reproducible. More over for our usecase we want to have a tool who takes into account both Prometheus's read and write path. Finally, we need to be able to remove all unnecessary pieces. This way we are able to focus on the remote storage only.
Such software could be a project on its own but fortunately for us there is one opensource solution for that: K6
K6 is a general purpose modern load testing which can be extended with module to support Prometheus remote storage. Sounds interesting don't you think?
In our previous blog post we have explained how we have built our benchmarking infrastructure which was rather complex to be accurate.
With k6 as benchmarking tool the infrastructure can be greatly simplified:
K6 is quite flexible and configurable. Its input is a load testing script, you can either write your own script or reuse an opensourced one. As the whole logic is in the load testing script it become easily reproducible which is exactly what we need.
To launch a benchmark you need two piece of infrastructure:
For our use case we have chosen to reuse the load testing from Grafana which does exactly what we are looking for. All useful information to tune and assess your remote storage are outputed by k6.
✓ write worked █ instant query high cardinality ✓ expected request status to equal 200 ✓ has valid json body ✓ expected status field to equal 'success' ✓ expected data.resultType field to equal 'vector' █ range query ✓ expected request status to equal 200 ✓ has valid json body ✓ expected status field is 'success' to equal 'success' ✓ expected resultType is 'matrix' to equal 'matrix' █ instant query low cardinality ✓ expected request status to equal 200 ✓ has valid json body ✓ expected status field to equal 'success' ✓ expected data.resultType field to equal 'vector' checks............................................................................: 100.00% ✓ 1454 ✗ 0 ✓ { type:read }...................................................................: 0.00% ✓ 0 ✗ 0 ✓ { type:write }..................................................................: 100.00% ✓ 6 ✗ 0 data_received.....................................................................: 1.0 MB 8.4 kB/s data_sent.........................................................................: 277 kB 2.3 kB/s group_duration....................................................................: avg=64.61ms min=39.94ms med=60.43ms max=230.05ms p(90)=80.39ms p(95)=107.93ms http_req_blocked..................................................................: avg=4.65ms min=2µs med=6µs max=96.84ms p(90)=11µs p(95)=58.42ms http_req_connecting...............................................................: avg=1.31ms min=0s med=0s max=21.87ms p(90)=0s p(95)=16.99ms http_req_duration.................................................................: avg=53.7ms min=34.23ms med=52.71ms max=164.1ms p(90)=67.02ms p(95)=71.82ms { expected_response:true }......................................................: avg=53.7ms min=34.23ms med=52.71ms max=164.1ms p(90)=67.02ms p(95)=71.82ms ✓ { type:read }...................................................................: avg=53.8ms min=34.23ms med=52.76ms max=164.1ms p(90)=66.85ms p(95)=71.62ms ✓ { url:https://admin:[email protected]/api/v1/push }...: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s http_req_failed...................................................................: 0.00% ✓ 0 ✗ 368 http_req_receiving................................................................: avg=92.34µs min=32µs med=89µs max=301µs p(90)=125.3µs p(95)=150µs http_req_sending..................................................................: avg=49.05µs min=12µs med=40µs max=566µs p(90)=68µs p(95)=94.59µs http_req_tls_handshaking..........................................................: avg=3.11ms min=0s med=0s max=54.28ms p(90)=0s p(95)=39.39ms http_req_waiting..................................................................: avg=53.56ms min=33.94ms med=52.56ms max=163.93ms p(90)=66.88ms p(95)=71.66ms http_reqs.........................................................................: 368 3.064697/s iteration_duration................................................................: avg=64.88ms min=40.34ms med=60.78ms max=230.27ms p(90)=80.87ms p(95)=108.47ms iterations........................................................................: 368 3.064697/s vus...............................................................................: 26 min=26 max=26 vus_max...........................................................................: 26 min=26 max=26
What a time saver? With k6 we have been able to efficiently assess all remote storage solutions. This is a significative improvement if we compare it to our previous benchmarking plan.
The next and final post will be about which remote storage we have chosen to be our internal solution.
Stay tuned.
After 10 years as a Sysadmin in High Performance Computing, Wilfried Roset is now part of OVHcloud as Engineering Manager for their Databases product Unit. He focuses on industrialization, reliability and performances for both internal and public clusters offers.