
Site Reliability Engineer (SRE) @ Cloud Angles Digital

Job Description

Job Summary

Site Reliability Engineers (SREs) work at the intersection of software engineering and systems administration. In other words, they can both write code and manage the infrastructure on which that code runs. This is a very broad skill set, but an SRE's end goal is always the same: to ensure that all SLAs are met, but not exceeded, so as to balance performance and reliability against operational costs.

As a Site Reliability Engineer II, you will be learning our systems, improving your craft as an engineer, and taking on tasks that improve the overall reliability of the VP platform.


Key Responsibilities:

  • Design, implement, and maintain robust monitoring and alerting systems.
  • Lead observability initiatives by improving metrics, logging, and tracing across services and infrastructure.
  • Collaborate with development and infrastructure teams to instrument applications and ensure visibility into system health and performance.
  • Write Python scripts and tools for automation, infrastructure management, and incident response.
  • Participate in and improve the incident management and on-call process, driving down Mean Time to Resolution (MTTR).
  • Conduct root cause analysis and postmortems following incidents and champion efforts to prevent recurrence.
  • Optimize systems for scalability, performance, and cost-efficiency in cloud and containerized environments.
  • Advocate for and implement SRE best practices, including SLOs/SLIs, capacity planning, and reliability reviews.

Required Skills & Qualifications:

  • 1+ years of experience in a Site Reliability Engineer or similar role.
  • Excellent communication skills in English.
  • Proficiency in Python for automation and tooling.
  • Hands-on experience with monitoring and observability tools such as Prometheus, Grafana, Datadog, New Relic, or OpenTelemetry.
  • Experience with log aggregation and analysis tools like ELK Stack (Elasticsearch, Logstash, Kibana) or Fluentd.
  • Good understanding of cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
  • Familiarity with infrastructure-as-code (Terraform, Ansible, or similar).
  • Strong debugging and incident response skills.
  • Knowledge of CI/CD pipelines and release engineering practices.

Job Classification

Industry: Software Product
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Cloud Angles Digital
Location(s): Noida, Gurugram



Keyskills:   Prometheus Datadog Grafana New Relic Python OpenTelemetry ELK CI/CD Terraform Azure Cloud Fluentd Ansible GCP AWS Kubernetes


₹ Not Disclosed

Similar positions

Lead Site Reliability Engineer

  • Equifax Credit
  • 2 - 7 years
  • Pune
  • 8 hours ago
₹ Not Disclosed

DevOps Engineer

  • Think Future
  • 3 - 8 years
  • Noida, Gurugram
  • 22 hours ago
₹ Not Disclosed

Azure Cloud Devops Engineer

  • eSolutionsFirst
  • 10 - 18 years
  • Hyderabad
  • 2 days ago
₹ 15-30 Lacs P.A.

Devops Engineer

  • RWS Group
  • 4 - 5 years
  • Bengaluru
  • 2 days ago
₹ 18-22.5 Lacs P.A.

Cloud Angles Digital

CloudAngles.com