
Observability Engineer @ Data Economy


 Observability Engineer

Job Description

Job Summary:

We are seeking an experienced Observability Engineer with a strong DevOps background to design, implement, and manage observability solutions across cloud and on-prem environments. The ideal candidate will have expertise in monitoring, logging, tracing, and alerting to ensure high system availability, performance, and reliability.
Key Responsibilities:

  • Design & Implement Observability Solutions: Develop and maintain monitoring, logging, and tracing solutions using industry-leading tools (Prometheus, Grafana, Datadog, New Relic, Splunk, etc.).
  • Performance Monitoring & Optimization: Proactively identify and resolve performance bottlenecks in distributed systems.
  • Logging & Tracing: Set up and manage centralized logging solutions (ELK/EFK stack, Fluentd, OpenTelemetry).
  • Alerting & Incident Management: Configure alerting mechanisms using tools such as PagerDuty, Opsgenie, or VictorOps for proactive issue detection.
  • SRE Practices: Apply Site Reliability Engineering (SRE) principles to improve system reliability and reduce MTTR (Mean Time to Resolution).
  • Automation & Infrastructure as Code (IaC): Automate observability setup and configuration using Terraform, Ansible, or similar tools.
  • Cloud & Kubernetes Monitoring: Implement observability best practices for cloud platforms (AWS, Azure, GCP) and containerized environments (Kubernetes, Docker).
  • Collaboration: Work closely with development, SRE, and operations teams to ensure end-to-end observability of applications and services.
  • Compliance & Security: Ensure logging and monitoring solutions adhere to security and compliance requirements.


Requirements

Required Skills & Qualifications:

  • 6-10 years of experience in DevOps, SRE, or observability engineering.
  • Strong hands-on experience with observability tools such as Prometheus, Grafana, New Relic, Datadog, Splunk, ELK/EFK, OpenTelemetry, AppDynamics, etc.
  • Experience setting up distributed tracing solutions (Jaeger, Zipkin, OpenTelemetry).
  • Expertise in Kubernetes monitoring using Prometheus, Thanos, Loki, or similar tools.
  • Strong proficiency in scripting (Python, Bash, or other shell languages) for automation.
  • Hands-on experience with Terraform, Ansible, Helm, or CloudFormation for infrastructure automation.
  • Proficiency with CI/CD pipelines and GitOps methodologies using Jenkins, GitLab CI, ArgoCD, or Flux.
  • Experience with public cloud environments (AWS, Azure, GCP) and with monitoring cloud-native services.
  • Strong troubleshooting and root cause analysis (RCA) skills.
  • Understanding of SLIs, SLOs, and error budgets as part of SRE best practices.
  • Familiarity with log management, anomaly detection, and AI-based observability solutions is a plus.


Benefits

As per company standards.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Data Economy
Location(s): Hyderabad



Key Skills: RCA (Root Cause Analysis), Automation, GCP, DevOps, Cloud, Troubleshooting, Distributed Systems, Monitoring, Python

This job posting appears to be old and may have expired.

₹ Not Disclosed

Similar positions

Site Reliability Engineer

  • Empowering Digital
  • 5 - 10 years
  • Hyderabad
  • 12 hours ago
₹ Not Disclosed

Devops Engineer

  • Teamware Solutions
  • 6 - 10 years
  • Chennai
  • 16 hours ago
₹ Not Disclosed

DevOps Engineer

  • Leading Client
  • 2 - 6 years
  • India
  • 17 hours ago
₹ Not Disclosed

DevOps Engineer

  • Leading Client
  • 2 - 6 years
  • Bengaluru
  • 17 hours ago
₹ Not Disclosed

Data Economy
