Location: NCR, Mumbai, Bangalore, Chennai
Job Description: SRE Architect
The SRE Architect will play a critical role in designing and implementing observable, scalable, reliable, and resilient systems that ensure the highest levels of availability and performance for applications and services. This role requires a deep understanding of software engineering, system architecture, and operations, along with a passion for automating repetitive tasks with GenAI tools and scripts.
Key Responsibilities
System Design and Architecture: Lead the design and architecture of scalable and reliable systems that meet the needs of our growing user base and business requirements.
Automation and Tooling: Develop and maintain automation tools and frameworks that streamline operations and improve system reliability.
Monitoring and Observability: Implement and enhance monitoring, logging, and alerting systems to ensure proactive detection and resolution of issues.
Capacity Planning: Conduct capacity planning and performance tuning to ensure systems can handle current and future demands.
Incident Management: Lead incident response efforts, perform root cause analysis, and implement corrective actions to prevent recurrence.
Collaboration and Mentorship: Work closely with software engineers, DevOps, and other stakeholders to promote best practices in reliability engineering, and provide mentorship to junior team members.
Continuous Improvement: Identify areas for improvement in existing systems and processes, and drive initiatives to enhance system reliability and performance.
Skillset
Experience: 14 years of overall experience, including a minimum of 7 years in site reliability engineering, DevOps, or a related field, with a proven track record of designing and implementing reliable systems at scale.
Technical Skills
Strong programming skills in languages such as Python, Go, or Java/.NET
In-depth knowledge of cloud platforms (AWS, GCP, Azure) and container orchestration (Kubernetes, Docker)
Experience with infrastructure as code (Terraform, Ansible, Puppet)
Proficiency in monitoring and observability tools (Prometheus, Grafana, Splunk, ELK stack)
Solid understanding of networking, security, and system performance tuning
Soft Skills
Strong problem-solving and analytical skills
Excellent communication and collaboration abilities
Ability to work in a fast-paced environment and manage multiple priorities
Passion for continuous learning and staying up to date with industry trends and technologies
Preferred Skillset
Experience with chaos engineering and resilience testing
Familiarity with service mesh architectures (Istio, Linkerd)
Certifications in cloud platforms (Azure Certified Architect, AWS Certified Architect, Google Cloud Professional Architect, etc.)
Location - Chennai/Bangalore/Hyderabad/Mumbai/Pune/Kolkata/Delhi/Noida
Keyskills: SRE, Terraform, Ansible, Site Reliability Engineering, Kubernetes, ELK, Prometheus, Grafana, ELK Cluster, Azure Cloud, GCP, Splunk, AWS
LTIMindtree [NSE: LTIMindtree] is a global technology consulting and digital solutions company that enables enterprises across industries to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. As a digital transformation partner to more than 75...