Response code for latency based SLO?
Andrew Seigner
Currently, latency-based SLOs are calculated irrespective of HTTP response code or success status. We'd like to make this more flexible. Are there particular queries or options you'd like to see?
Re: a service being down, it depends. If it's still reporting metrics and receiving traffic but failing every request, we will show a 0% success rate. If it's receiving no requests, or if metrics are not being collected, we will show "--" for success rate.
For reference, Success Rate SLI calculations are essentially:
sum(
  rate(
    response_total{direction="inbound", classification="success"}[1m]
  )
)
/
sum(
  rate(
    response_total{direction="inbound"}[1m]
  )
)
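To make the ratio above concrete, here is a minimal Python sketch that mimics it over made-up counter samples. It assumes two scrapes exactly one minute apart and ignores the counter-reset handling and extrapolation that PromQL's rate() performs; the `success_rate` helper, the "failure" label value, and the sample numbers are illustrative assumptions. Returning None when there is no inbound traffic mirrors the "--" display described above, while traffic that all fails yields 0.0.

```python
from typing import Dict, Optional, Tuple

# Hypothetical samples of the response_total counter:
# (direction, classification) -> (value at window start, value at window end).
Samples = Dict[Tuple[str, str], Tuple[float, float]]

WINDOW_SECONDS = 60.0  # stands in for the [1m] range selector


def rate(start: float, end: float, window: float = WINDOW_SECONDS) -> float:
    """Per-second increase of a counter (simplified: no resets, no extrapolation)."""
    return (end - start) / window


def success_rate(samples: Samples) -> Optional[float]:
    """sum(rate(inbound successes)) / sum(rate(all inbound)); None if no inbound traffic."""
    inbound = {labels: v for labels, v in samples.items() if labels[0] == "inbound"}
    total = sum(rate(*v) for v in inbound.values())
    if total == 0:
        return None  # no requests in the window: displayed as "--", not 0%
    success = sum(rate(*v) for labels, v in inbound.items() if labels[1] == "success")
    return success / total


samples = {
    ("inbound", "success"): (1000.0, 1570.0),
    ("inbound", "failure"): (40.0, 70.0),
    ("outbound", "success"): (500.0, 900.0),  # excluded by direction="inbound"
}
print(success_rate(samples))  # 0.95
```

A service that receives traffic but fails all of it, e.g. `success_rate({("inbound", "failure"): (0.0, 60.0)})`, returns 0.0 rather than None.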
...and for Latency:
sum(
  rate(
    response_latency_ms_bucket{direction="inbound"}[1m]
  )
) by (le)
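The latency query returns per-bucket rates rather than a single number; a percentile is usually derived from such buckets with PromQL's histogram_quantile(). The Python sketch below approximates that step: find the first cumulative bucket whose count reaches the target rank, then interpolate linearly within it. The `estimate_quantile` name, the bucket bounds, and the rates are illustrative assumptions, and edge cases Prometheus handles (q outside (0, 1), NaN, negative buckets) are simplified.

```python
import math
from typing import Dict, Optional


def estimate_quantile(q: float, buckets: Dict[float, float]) -> Optional[float]:
    """Estimate the q-quantile from cumulative bucket rates keyed by 'le'.

    Loosely follows histogram_quantile(): locate the first bucket whose cumulative
    count reaches q * total, then interpolate linearly between its bounds.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1 in this sketch")
    total = buckets[math.inf]  # the +Inf bucket counts every observation
    if total == 0:
        return None  # no traffic in the window
    rank = q * total
    lower, below = 0.0, 0.0  # upper bound and cumulative count of the previous bucket
    for le in sorted(buckets):
        count = buckets[le]
        if count >= rank:
            if math.isinf(le):
                return lower  # rank lands in +Inf: fall back to the highest finite bound
            return lower + (le - lower) * (rank - below) / (count - below)
        lower, below = le, count
    return None  # unreachable when the +Inf bucket is present


# Per-second rates already summed "by (le)", e.g. 600 of 1200 requests in 60s were <= 10 ms.
rates = {10.0: 10.0, 50.0: 15.0, 100.0: 19.0, math.inf: 20.0}
print(estimate_quantile(0.5, rates))   # 10.0
print(estimate_quantile(0.9, rates))   # 87.5
print(estimate_quantile(0.99, rates))  # 100.0
```

The p90 estimate of 87.5 ms is only as precise as the bucket layout allows: all that is known is that it falls between the 50 ms and 100 ms bounds.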
Jerome Comptdaer
Andrew Seigner:
Thank you for the details.
Overall, I would like the SLO to capture the actual user experience as closely as possible. For my use case, I see this as a combination of successful responses (status code < 500) and a latency threshold.
Jean Sandberg
Jerome Comptdaer: I think you're saying you want to know the % of requests that are both successful and meet a latency threshold?
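One way to express that combined SLI is as the fraction of all inbound requests that are both non-5xx and in a latency bucket at or below the threshold. That needs a latency histogram split by status code; the sketch below simply assumes a status_code label is available on the buckets, which nothing in this thread confirms. In PromQL it might look roughly like sum(rate(response_latency_ms_bucket{direction="inbound", le="100", status_code!~"5.."}[1m])) divided by the same expression with le="+Inf" and no status filter, assuming 100 ms is one of the bucket bounds. The Python below evaluates that ratio over made-up per-second bucket rates; `good_fraction` and the numbers are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical per-second bucket rates keyed by (status_code, le); cumulative per status code.
Rates = Dict[Tuple[int, float], float]


def good_fraction(rates: Rates, threshold_ms: float) -> Optional[float]:
    """Share of requests that are both non-5xx and no slower than threshold_ms.

    threshold_ms must be one of the histogram's bucket bounds, since a bucket
    cannot be split without interpolation.
    """
    if threshold_ms not in {le for _, le in rates}:
        raise ValueError("threshold must match a bucket bound")
    total = sum(r for (_, le), r in rates.items() if math.isinf(le))
    if total == 0:
        return None  # no traffic: nothing to report
    good = sum(r for (status, le), r in rates.items() if le == threshold_ms and status < 500)
    return good / total


rates = {
    (200, 100.0): 17.0, (200, math.inf): 18.0,
    (404, 100.0): 1.0, (404, math.inf): 1.0,  # 4xx counts as "successful" here (< 500)
    (503, 100.0): 0.5, (503, math.inf): 1.0,
}
print(good_fraction(rates, 100.0))  # 0.9
```

Multiplying the two separate SLIs would not give the same answer: with these numbers the non-5xx rate is 19/20 = 0.95 and the at-or-under-100 ms rate is 18.5/20 = 0.925, whose product 0.87875 differs from the joint 0.9, because the two conditions are not independent (some of the 5xx responses are also slow).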