Latency-based SLOs!
complete
Jean Sandberg
complete
Latency-based SLOs are now available! If your Dive agent is older than v0.0.18, you'll need to update it. Just go to Clusters and click the update link.
William Morgan
in progress
We've started work on this.
William Morgan
For those of you who are interested in this feature: how exact do you need these to be? If you set an SLO like "92% of all requests must be < 123ms", is it important for Dive to be exact down to the level of a single request?
(Success rate SLOs are exact, today, but it's a bit harder to do this for latency, for complicated reasons.)
Dario
William Morgan: for us, ideally we'd be able to specify a window size too. For example, "90% of requests must be < 200 ms in every 5-minute period". So there are three parameters: the percentile, the latency threshold, and the window size. The window size is important because some endpoints get far fewer requests than others, and in that case we need to increase the window size to have meaningful alerts. Hope that makes sense.
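A minimal Python sketch of the windowed check Dario describes, assuming requests are available as hypothetical `(timestamp_s, latency_ms)` pairs. The function names are illustrative only and are not part of Dive.

```python
from collections import defaultdict


def windowed_compliance(events, threshold_ms, window_s):
    """Fraction of requests faster than threshold_ms in each fixed window.

    events: iterable of (timestamp_s, latency_ms) pairs (hypothetical format).
    Returns {window_start_s: fraction}; windows with no requests are absent.
    """
    fast = defaultdict(int)
    total = defaultdict(int)
    for ts, latency in events:
        # Assign each request to the fixed window that contains it.
        start = int(ts // window_s) * window_s
        total[start] += 1
        if latency < threshold_ms:
            fast[start] += 1
    return {start: fast[start] / total[start] for start in sorted(total)}


def windowed_slo_met(events, threshold_ms, target, window_s):
    """True only if every non-empty window meets the target fraction."""
    per_window = windowed_compliance(events, threshold_ms, window_s)
    return all(fraction >= target for fraction in per_window.values())
```

For the 5-minute example, the parameters would be `threshold_ms=200`, `target=0.90`, `window_s=300`. A window that holds a single slow request scores 0.0, which illustrates why low-traffic endpoints need a wider window for the check to be meaningful.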
William Morgan
Dario: Dive's SLOs are event-based, meaning that there is no reason to specify a calculation window size. E.g. if the SLO is "92% of all requests for the quarter must be < 123ms", that is calculated regardless of event frequency or volume. Hope that makes sense.
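By contrast, an event-based SLO as described here reduces to a single ratio over every request in the SLO period. A minimal sketch, assuming the period's request latencies are already collected in a list; the names are hypothetical and this is not Dive's implementation.

```python
def event_based_compliance(latencies_ms, threshold_ms):
    """Fraction of all requests in the SLO period faster than threshold_ms."""
    latencies = list(latencies_ms)
    if not latencies:
        return 1.0  # assumption: a period with no traffic counts as compliant
    fast = sum(1 for latency in latencies if latency < threshold_ms)
    return fast / len(latencies)


def event_based_slo_met(latencies_ms, threshold_ms, target):
    """True if the fraction of fast requests over the whole period meets target."""
    return event_based_compliance(latencies_ms, threshold_ms) >= target
```

Because only the counts matter, when the requests arrived and how they are spread across the period have no effect on the result, which is why no window parameter is needed.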
Dario
William Morgan: that makes sense. Actually that's probably even better, saves us from specifying another parameter that isn't really meaningful. Would the SLOs be specified on a per-service basis or on a per-endpoint basis?
William Morgan
Dario: Definitely want per-endpoint in the long run, but that will require some work to integrate service profiles into Dive. Per-service in the short term.
Dario
This would be very useful. It would be great if thresholds could be defined in code, maybe in the service profile?
William Morgan
Dario: Yes, we're going to allow pretty much everything on these pages to be defined outside of Dive, either as on-cluster resources or by pulling things directly from e.g. GitHub.
Andrew Waters
FYI I've filed a related (but not duplicate) feature request over at https://community.dive.co/feature-requests/p/sli-flexibility
William Morgan
planned