Job Title: Site Reliability Engineer (SRE), AWS
Location: Chennai
Job Type: Full-time
Job Summary:
We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with deep expertise in Amazon Web Services (AWS) to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure and services. You will work closely with development, operations, and security teams to build and maintain robust systems that support our business goals.
Key Responsibilities:
Design, implement, and maintain scalable, resilient, and secure AWS infrastructure.
Develop automation tools and frameworks for deployment, monitoring, and operations.
Monitor system performance, availability, and reliability using tools such as CloudWatch, Prometheus, and Grafana.
Implement Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or AWS CDK.
Collaborate with development teams to improve system architecture and application reliability.
Manage incident response, root cause analysis, and post-mortem documentation.
Optimize cost and performance of AWS resources.
Ensure compliance with security and governance policies.
Participate in on-call rotations and proactively address system alerts and outages.
Required Qualifications:
Bachelor's degree in Computer Science, Engineering, or a related field.
7+ years of experience in SRE, DevOps, or Cloud Engineering roles.
Strong hands-on experience with AWS services (EC2, S3, RDS, Lambda, ECS/EKS, etc.).
Proficiency in scripting languages (Python, Bash, etc.).
Experience with CI/CD tools (Jenkins, GitLab CI, AWS CodePipeline).
Familiarity with containerization and orchestration (Docker, Kubernetes).
Solid understanding of networking, security, and system administration.
Excellent problem-solving and communication skills.
Preferred Qualifications:
AWS certifications (e.g., AWS Certified DevOps Engineer, Solutions Architect).
Experience with observability tools (Datadog, New Relic, ELK Stack).
Knowledge of chaos engineering and reliability testing practices.
Experience in a high-availability, mission-critical environment.
Key Skills: Site Reliability Engineering, AWS, SRE
We are a software development services company with thought leadership in engineering digital solutions. We enable your enterprise to be more engaging, insightful, predictive, and efficient by adopting the technology advancements of the digital revolution and by supporting you from ideati...