Lead and manage the RCA process for all SRE incidents, ensuring a thorough and timely investigation.
Facilitate RCA workshops, guiding teams through a structured analysis to identify the root cause of incidents.
Document RCA findings and recommendations in a clear and concise manner.
Work with SRE engineers and developers to implement corrective actions and preventative measures based on RCA findings.
Analyze trends in incident data to identify areas for improvement in system design, monitoring, and automation.
Develop and implement best practices for RCA within the SRE organization.
Stay up to date on the latest SRE practices and incident response methodologies.
Collaborate with other teams (e.g., security, product) to ensure a holistic approach to incident management.
Mentor and coach SRE engineers on effective RCA techniques.
Track and report on key metrics related to incident management and RCA effectiveness.
Key skills: RCA management, incident management, RCA workshops, incident response methodologies, structured analysis, RCA techniques
Cloud Kinetics is a premier provider of digital solutions. We enable enterprises, service providers, and ISVs to drive their business objectives with minimal dependence on infrastructure elements. We offer unique platform-driven services aimed towards accelerating customers’ business tran...