Proven expertise in SRE observability concepts and monitoring architecture design.
Extensive experience with monitoring and alerting tools such as Prometheus, Grafana, Kibana, Splunk, and Datadog.
Hands-on experience with alert noise reduction and advanced alerting techniques such as anomaly detection and burn rate alerting.
Strong proficiency in incident management, including analysis, root cause identification, and preventive measures.
Familiarity with payment monitoring systems and operational requirements.
Proficient in automation tools and in programming or scripting languages such as Python or Java.
Excellent collaboration and communication skills to interact with cross-functional teams.
Flexibility to work in rotational 24x7 shifts from the office.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time