Lead Site Reliability Engineer @ Software Company

Job Description

We are hiring a Lead Site Reliability Engineer with the following skills and expertise.

What will this person do?

  • Provide leadership in designing and implementing reliable, scalable, and secure infrastructure solutions.
  • Develop and maintain observability solutions, ensuring visibility into system performance using native Azure Cloud solutions.
  • Define and track SLIs, ensuring compliance with SLOs and SLAs.
  • Lead incident response efforts, conduct root cause analysis, and implement preventive measures to minimize downtime.
  • Automate infrastructure provisioning, configuration and management using Terraform & Ansible.
  • Build and maintain robust Observability pipelines to support automated deployments and continuous monitoring practices.
  • Continuously analyze system health and optimize performance by identifying and resolving bottlenecks.
  • Work with our BCDR team to minimize business impact during failures and measure the quality of services.
  • Work with Cloud Governance team to monitor cloud infrastructure spending and implement cost-saving strategies.
  • Implement centralized logging, metric collection, and distributed tracing for troubleshooting and debugging.
  • Deploy, manage, and monitor containerized workloads.
  • Maintain configuration consistency and compliance across cloud environments using tools like Ansible.
  • Partner with software development teams to integrate reliability best practices into the application development lifecycle.
  • Conduct detailed post-mortems, document learnings, and drive improvements to reduce future incidents.
  • Develop automation scripts in Python, Bash, or other languages to reduce manual effort and improve efficiency.
  • Provide mentorship to junior engineers, fostering a culture of learning and continuous technical growth.
  • Research and evaluate new technologies, tools, and methodologies to improve system reliability and efficiency.
  • Maintain detailed documentation on infrastructure, monitoring setups, incident responses, and best practices.

Qualifications


  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 10+ years of experience in Observability, DevOps, and Site Reliability Engineering (SRE).
  • At least 2 years of experience in defining Observability KPIs for both on-premises and cloud environments.
  • Strong experience with cloud platforms (AWS, Azure, GCP) and cloud-native technologies.
  • Passion for automation, reducing toil and implementing reliability-focused best practices.
  • Deep knowledge of services/tools like Grafana, PowerBI, Prometheus, Azure Monitor, Application Insights & Azure Metrics.
  • Expertise in Terraform, Ansible, and Chef; CI/CD pipeline tools like GitHub Actions and Jenkins; and GitOps methodologies.
  • Working understanding of load balancing, authentication (AAA), encryption, and network parameter monitoring.
  • Strong troubleshooting skills and experience handling on-call incidents and post-mortem analysis.
  • Ability to work cross-functionally, drive technical discussions, and mentor junior engineers.
  • Ability to work in a dynamic team environment, with the time management skills to meet deadlines.
  • Sense of ownership and pride in your performance and its impact on the company's success.
  • Critical thinker with problem-solving skills.
  • Good interpersonal and communication skills.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Software Company
Location(s): Bengaluru


Keyskills: DevOps, Cloud, Site Reliability Engineering, Microsoft Azure, Observability, SRE, Prometheus, CI/CD, Load Balancing, Grafana, Terraform, PowerBI, GCP, On-premises, AWS

Salary: ₹ Not Disclosed

Software Company

The client works in cybersecurity, embedded systems, high-performance computing, and IoT. It works with the banking industry, defence, and governments to help them secure their digital identities and transactions.