Senior Member Technical Staff - ITOps @ GreyOrange

Job Description

We are seeking a talented and motivated Lead Site Reliability Engineer (SRE) to join our organisation. The SRE team at GreyOrange is responsible for monitoring the stability and availability of mission-critical production systems, managing incidents to speed up resolution, and establishing business-as-usual (BAU) operations. The team also manages and maintains internal tools and infrastructure used by other development teams.
The successful candidate will play a crucial role in ensuring the reliability, scalability, and performance of our infrastructure and applications, including capacity planning. The ideal candidate will have a strong background in software engineering, system administration, containerization, and cloud technologies.
Requirements
Should have 7+ years of experience.
Well-versed in scripting/programming languages (Python, Bash, PowerShell, etc.) for automating manual work, particularly within cloud environments.
Well-versed in observability tools (Grafana, Splunk, Dynatrace) for monitoring, alerting, and logging, used to identify and address potential issues, especially in cloud infrastructure.
Working experience with automation tools (Jenkins, GitLab, Ansible/Chef for configuration management) and processes that streamline the deployment, monitoring, and management of systems and applications in the cloud.
Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or similar, particularly in cloud-native environments.
Solid understanding of SLI, SLO, SLA, and error budget concepts and their implementation; able to provide on-call support and participate in incident management and response activities as needed.
Expert at troubleshooting production issues and bugs.
Good knowledge of Unix systems, networking, web technologies, and databases.
Incident management experience, coupled with effective communication skills, for production workloads.
Working knowledge of at least one cloud platform (AWS or GCP).
What you'll do:
Lead reliability engineering projects and drive them to closure.
Ensure system stability and high availability by proactively monitoring performance and troubleshooting issues.
Design, build, and maintain efficient, reliable, and scalable cloud-based infrastructure and services.
Automate processes and find opportunities to improve the observability and availability of the platform to reduce toil.
Implement and manage observability tools for comprehensive monitoring, alerting, and logging.
Own the end-to-end availability and performance of different services and tools.
Practice sustainable incident response and blameless postmortems.
Provide on-call support for incident management and participate actively in response activities.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: GreyOrange
Location(s): Noida, Gurugram


Key skills: Unix, Networking, PowerShell, Configuration management, Incident management, Troubleshooting, Monitoring, Python, System administration, Capacity planning

₹ Not Disclosed

GreyOrange

GreyOrange Pvt Ltd