Support for Batch jobs
under review
William Morgan
Neeran Gul: A couple of questions for you as we dig into this:
- Do you have multiple jobs in your environment? If so, how do you distinguish them? Do you give them unique names in the metadata: section of the job spec, or something else?
- Do you use CronJobs or just plain old Jobs? If the latter, how are they triggered?
- In terms of an SLI: how do you currently determine the success of a job? Specifically, is there a measure beyond Kubernetes's measure of "pod successfully completed" that you use?
- When you want to make a change to the job's code, how does that work? Does it go through CI and end up putting an image into a container registry, or something else?
Thanks!
Neeran Gul
William Morgan:
- Yes, we have multiple jobs. They are distinguished by the metadata section, mainly the name and namespace fields; some teams use labels too.
- We have a mix of CronJobs and plain Jobs. Weirdly enough, the plain Jobs are triggered on demand via a Helm release.
- We measure the SLI by successful completion plus specific metrics derived from the job's output, mainly by looking at an S3 store or a DB. While it runs, the job talks to databases (k8s svcs) and microservices (k8s svcs); it would be good to drill down into how successful it was at talking to these, as that will affect the SLI.
- For CronJobs it's mainly a Helm release upgrade; for standard Jobs we delete the Helm release and re-install it.
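To make the SLI point concrete, here is a minimal Python sketch of how "pod successfully completed", an output check, and per-dependency call success might be combined into one number. It is only an illustration under stated assumptions: the `JobRun` fields, the function names, the 0.99 threshold, and the example service names are all hypothetical, and how the call counts would actually be collected is left open.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class JobRun:
    """One execution of a batch job. All fields are hypothetical."""
    name: str            # metadata.name of the Job
    namespace: str       # metadata.namespace of the Job
    completed: bool      # Kubernetes reported the pod as successfully completed
    output_ok: bool      # expected output was found (e.g. in the S3 store or DB)
    # Per dependency: (successful calls, total calls) made during the run.
    calls: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def dependency_success_rates(run: JobRun) -> Dict[str, float]:
    """Success rate per dependency; a dependency with no calls counts as 1.0."""
    return {
        dep: (ok / total if total else 1.0)
        for dep, (ok, total) in run.calls.items()
    }


def run_is_good(run: JobRun, min_rate: float = 0.99) -> bool:
    """A run is good if it completed, produced its output, and every
    dependency met the (assumed) minimum success rate."""
    rates = dependency_success_rates(run)
    return (
        run.completed
        and run.output_ok
        and all(rate >= min_rate for rate in rates.values())
    )


def sli(runs: List[JobRun], min_rate: float = 0.99) -> Optional[float]:
    """Fraction of good runs, or None when there are no runs to measure."""
    if not runs:
        return None
    good = sum(1 for run in runs if run_is_good(run, min_rate))
    return good / len(runs)
```

With something like this, a run that completed but had a 90% success rate against one database would count against the SLI, and `dependency_success_rates` would show which dependency was responsible.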