Senior Site Reliability Engineer @ GreyOrange

Job Description

The SRE team at GreyOrange is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents for quicker resolution, and establishing business-as-usual (BAU) operations. The team also manages and maintains the internal tools and infrastructure that other development teams consume.

The experienced SRE will play a crucial role in ensuring the reliability, scalability, capacity planning, and performance of our infrastructure and applications. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.


Requirements

  • Should have 6 to 11 years of experience
  • Well-versed in scripting/programming languages (Python, Bash, PowerShell, etc.) for automating manual work, particularly within cloud environments
  • Well-versed in observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging, used to identify and address potential issues, especially in cloud infrastructure
  • Working experience with automation tools (Jenkins, GitLab, Ansible/Chef for configuration management) and processes to streamline deployment, monitoring, and management of systems and applications in the cloud
  • Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments
  • Solid understanding of SLI, SLO, SLA, and error budget concepts and their implementation; willingness to provide on-call support and participate in incident management and response activities as needed
  • Expertise in troubleshooting production issues and bugs
  • Good knowledge of Unix systems, networking, web technologies, and databases
  • Incident management experience, coupled with effective communication skills, for production workloads
  • Working knowledge of one of the cloud platforms (AWS or GCP)

What you'll do

  • Lead reliability engineering projects and drive them to closure
  • Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues
  • Design, build, and maintain efficient, reliable, and scalable cloud-based infrastructure and services
  • Automate processes to reduce toil, and find opportunities to improve the observability and availability of the platform
  • Implement and manage observability tools for comprehensive monitoring, alerting, and logging
  • Own the end-to-end availability and performance of different services and tools
  • Practice sustainable incident response and blameless postmortems
  • Provide on-call support for incident management and participate actively in response activities

Job Classification

Industry: Analytics / KPO / Research
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: GreyOrange
Location(s): Noida, Gurugram


Keyskills: DevOps, Jenkins, Terraform, Docker, SRE, Ansible, Kafka, Site Reliability Engineering, DevOps Engineer, CI/CD, Kubernetes

Salary: Not Disclosed

GreyOrange Pvt Ltd