Experience: 8+ yrs
Location: Bengaluru
Roles and Responsibilities
1. On-call responsibilities to help minimize MTTD and MTTR.
2. Experience with containerization and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere).
3. Ability to interpret debugging information, drain traffic away from a cluster, roll back a bad software push, block or rate-limit unwanted traffic, bring up additional serving capacity through autoscaling features, and use monitoring systems (for alerting and dashboards).
4. Engage with enterprise and business/infrastructure functions to establish, track, and optimize operational metrics and targets in line with SRE principles (SLOs/SLIs, latency percentiles, error budgets, tech debt, and alerting guidelines).
5. Work with observability tools and enterprise monitoring solutions such as Dynatrace, AppDynamics, New Relic, Prometheus, Graphite, Grafana, Nagios, Sensu, and Splunk. Should be able to write PromQL and Splunk queries.
6. Programming, tooling, and automation experience in one or more of the following: Golang, Java, Python, TypeScript, Node.js, and Shell.
7. Good understanding of Kafka internals, SQL/NoSQL data stores such as Cassandra, Elasticsearch, and PostgreSQL, and in-memory caching frameworks such as Memcached.
8. Influence, design and create new architectures, standards, and methods for large-scale enterprise systems.
9. Design, write, and build tools to improve the reliability, latency, availability, and scalability of Walmart e-commerce/Retail and Enterprise products.
10. Augment existing instrumentation to build a cohesive picture of our systems' characteristics, with special attention to points of failure.
11. Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
12. Develop a deep understanding of the numerous services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products.
13. Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.
14. Secure the system against issues, whether real, perceived, or notional.
Please share your updated resume along with the details below at as************i@co****e.com
Exp:
Notice Period:
Current CTC:
ECTC:
Location:
Key Skills: SRE, DevOps, Container, Site Reliability Engineering
Coforge is a leading global IT solutions organization, enabling its clients to transform at the intersection of unparalleled domain expertise and emerging technologies to achieve real-world business impact. A focus on very select industries, a detailed understanding of the underlyin...