Site Reliability Engineer (SRE) - Observability & Azure Infrastructure @ Keka Technologies

Job Description

About the Role

We are looking for a highly skilled Site Reliability Engineer (SRE) to lead the implementation and management of our observability stack across Azure-hosted infrastructure and .NET Core applications. This role focuses on configuring and managing OpenTelemetry, Prometheus, Loki, and Tempo, and on setting up robust alerting across all services, including Azure infrastructure and MSSQL databases.

You will work closely with developers, DevOps, and infrastructure teams to ensure the performance, reliability, and visibility of our .NET Core applications and cloud services.


Key Responsibilities

Observability Platform Implementation:

  • Design and maintain distributed tracing, metrics, and logging using OpenTelemetry, Prometheus, Loki, and Tempo.
  • Ensure complete instrumentation of .NET Core applications for end-to-end visibility.
  • Implement telemetry pipelines for application logs, performance metrics, and traces.

Monitoring & Alerting:

  • Develop and manage SLIs, SLOs, and error budgets.
  • Create actionable, noise-free alerts using Prometheus Alertmanager and Azure Monitor.
  • Monitor key infrastructure components, applications, and databases with a focus on reliability and performance.

Azure & Infrastructure Integration:

  • Integrate Azure services (App Services, VMs, Storage, etc.) with the observability stack.
  • Configure monitoring for MSSQL databases, including performance tuning metrics and health indicators.
  • Use Azure Monitor, Log Analytics, and custom exporters where necessary.

Automation & DevOps:

  • Automate observability configurations using Terraform, PowerShell, or other IaC tools.
  • Integrate telemetry validation and health checks into CI/CD pipelines.
  • Maintain observability as code for repeatable deployments and easy scaling.

Resilience & Reliability Engineering:

  • Conduct capacity planning to anticipate scaling needs based on usage patterns and growth.
  • Define and implement disaster recovery strategies for critical Azure-hosted services and databases.
  • Perform load and stress testing to identify performance bottlenecks and validate infrastructure limits.
  • Support release engineering by integrating observability checks and rollback strategies in CI/CD pipelines.
  • Apply chaos engineering practices in lower environments to uncover potential reliability risks proactively.

Collaboration & Documentation:

  • Partner with engineering teams to promote observability best practices in .NET Core development.
  • Create dashboards (Grafana preferred) and runbooks for system insights and incident response.
  • Document monitoring standards, troubleshooting guides, and onboarding materials.

Required Skills and Experience

  • 4+ years of experience in SRE, DevOps, or infrastructure-focused roles.
  • Deep experience with .NET Core application observability using OpenTelemetry.
  • Proficiency with Prometheus, Loki, Tempo, and related observability tools.
  • Strong background in Azure infrastructure monitoring, including App Services and VMs.
  • Hands-on experience monitoring MSSQL databases (deadlocks, query performance, etc.).
  • Familiarity with Infrastructure as Code (Terraform, Bicep) and scripting (PowerShell, Bash).
  • Experience building and tuning alerts, dashboards, and metrics for production systems.

Preferred Qualifications

  • Azure certifications (e.g., AZ-104, AZ-400).
  • Experience with Grafana, Azure Monitor, and Log Analytics integration.
  • Familiarity with distributed systems and microservice architectures.
  • Prior experience in high-availability, regulated, or customer-facing environments.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Keka Technologies
Location(s): Hyderabad

Key skills: Azure, Terraform, Tempo, AZ-400, Loki, Prometheus, .NET, OpenTelemetry

Salary: ₹ Not Disclosed

Keka Technologies

Keka Technologies Pvt Ltd

About Us: Keka has grown rapidly to become the leading HR Tech product, thanks to our people and customers. We are here to transform businesses in India by empowering HR and employees with the right tools, so they can focus on doing their best. We are unstoppable and ar...