We are looking for a Senior Site Reliability Engineer (SRE) with a background in software engineering and database engineering to join our growing SRE team. This role is ideal for engineers who are passionate about building scalable systems, automating operational processes, and ensuring system availability and performance across complex distributed services.
As a Senior SRE, you will help design and implement database infrastructure solutions that support our production environments, improve deployment pipelines, and ensure seamless application delivery. Your blend of development expertise and database experience will be essential for leading projects across reliability, observability, and performance tuning. This role reports to the Sr. Manager, Reliability Engineering.
Design scalable and resilient database infrastructure for mission-critical systems.
Maintain CI/CD pipelines and automated operational processes using tools such as GitLab and Terraform.
Implement observability best practices including logging, monitoring, tracing, and alerting (e.g., Prometheus, Grafana, Loki).
Collaborate with development teams to ensure system designs are scalable, maintainable, and secure.
Manage and scale relational and non-relational data stores (e.g., PostgreSQL, MySQL, MongoDB, Snowflake, and Kafka) with a focus on high availability and performance tuning.
Lead root cause analysis and postmortems for major incidents; drive long-term reliability improvements and contribute to internal tooling, automation frameworks, and infrastructure as code.
Apply working knowledge of frontend (Angular, ReactJS) and backend (Python Flask) development, along with UI/UX design principles and version control.
Participate in database on-call rotations to respond to system incidents and ensure uptime service-level agreements (SLAs) are met; promote DB SRE best practices across teams.
8+ years of experience in software engineering, DevOps, or DB SRE roles.
Programming experience with Angular, ReactJS, and Python Flask.
Experience in database engineering: schema design, query optimization, replication, backup/restore strategies, and other database administration tasks.
Expertise with containerization (Docker) and orchestration platforms (Kubernetes).
Experience with distributed systems, networking, and cloud-native architectures (AWS and GCP).
Familiarity with security practices related to infrastructure and data handling.
Experience with infrastructure-as-code tools (Terraform, etc.).
Experience building scalable, resilient, and observable distributed systems, and the ability to work independently.
Key skills: Site Reliability Engineering, performance tuning, PostgreSQL, UI/UX Design Principles, Python Flask, Kafka, Bootstrap, Angular, DevOps, Terraform, ReactJS, MySQL, Snowflake, CI/CD, MongoDB, Python, Kubernetes
If you're thinking scale, think bigger and don't stop there. At Walmart Global Tech India, we don't just innovate; we enable transformations across stores and different channels for the Walmart experience.
A regular day at Walmart Global Tech India means using technology to deliver leadin...