At Gojek's Engineering Platform - Site Reliability Engineering team, we are looking for passionate engineers to join us in improving and managing Gojek's engineering productivity, reliability, and observability across the board. The platform you'll work on is designed to power diverse applications across Gojek's many business lines. You will work with a highly driven and proud team of engineers who deliver fundamental functionality that enables multiple product groups at Gojek to handle scenarios at a genuinely interesting combination of scale and complexity. You will be directly responsible for improving the engineering quality, productivity, and experience of the engineers who drive the company's fundamental business KPIs. If you enjoy creating tools and automating processes, and are comfortable dealing with high-scale, complex distributed systems, this role will be a great fit.
What you will do
Cloud Administration
Hands-on administration of cloud-based infrastructure deployments, including resource provisioning, user administration, monitoring of compute resource utilization, network setup, backup/restore, and incident management.
Automation
Hands-on in designing and building SRE tooling that automates monitoring, incident response, and alerting, reducing the time spent on necessary but time-consuming manual work.
Proficient in building and improving CI/CD tooling to automate and streamline deployments.
Proficient in designing and building GitOps practices for infrastructure management.
DevOps
Proficient in CI/CD tools like GitLab CI/CD, Jenkins, and CircleCI, with experience in infrastructure automation using Terraform, Ansible, and CloudFormation.
K8s Administration
Hands-on experience in deploying and managing applications on Kubernetes, with knowledge of pod and container lifecycle management, service and ingress resource management, and persistent storage solutions.
Skilled in Kubernetes networking concepts, including services, ingress controllers, and network policies, with a focus on scalability, high availability, and security.
IaC on Cloud
Experience in infrastructure provisioning and management using Infrastructure as Code (IaC) tools like Terraform, Terragrunt, CloudFormation, and Azure Resource Manager (ARM).
Knowledgeable in IaC best practices, including version control, testing, and continuous integration/continuous deployment (CI/CD) pipelines for infrastructure code.
Networking
Proficient with cloud load balancers, cloud networking, and wireless networking (Aruba).
Build and manage cloud product features for enhanced networking, such as VPC, API Gateway, CloudFront, Route 53, Cloud WAN, Direct Connect, PrivateLink, Transit Gateway, and Elastic Load Balancing (ELB).
What you will need
10+ years of experience in the SRE or DevOps space (at least 8 of them in a large enterprise cloud environment).
Experience maintaining and operating large-scale applications in cloud platforms such as AWS or GCP is a must-have.
Strong hands-on experience in Kubernetes is a must-have.
Deep knowledge of Linux as a production environment and of container technologies, e.g. Docker.
Ability to automate repetitive tasks and familiarity with scripting languages.
Strong understanding of infrastructure-as-code principles, best practices, and tools such as Terraform.
Solid understanding of networking concepts and protocols.
Understanding of microservices architecture, event-driven architecture, Chef/Ansible, and CI/CD.
Strong technical aptitude, with excellent troubleshooting and communication skills.
Job Classification
Industry: Internet
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time