Define and enforce SLOs, SLIs, and error budgets across microservices
Architect an observability stack (metrics, logs, traces) and drive operational insights
Automate toil and manual ops with robust tooling and runbooks
Own incident response lifecycle: detection, triage, RCA, and postmortems
Collaborate with product teams to build fault-tolerant systems
Champion performance tuning, capacity planning, and scalability testing
Optimise costs while maintaining the reliability of cloud infrastructure
Must-have skills
6+ years in SRE, infrastructure, or backend roles using cloud-native technologies
2+ years in SRE-specific capacity
Strong experience with monitoring/observability tools (Datadog, Prometheus, Grafana, ELK, etc.)
Experience with infrastructure-as-code (Terraform/Ansible)
Proficiency in Kubernetes, service mesh (Istio/Linkerd), and container orchestration
Deep understanding of distributed systems, networking, and failure domains
Expertise in automation with Python, Bash, or Go
Proficiency in incident management, SLAs/SLOs, and system tuning
Hands-on experience with GCP (preferred), AWS, or Azure, and with cloud cost optimisation
Experience participating in on-call rotations and running large-scale production systems
Nice-to-have skills
Familiarity with chaos engineering practices and tools (Gremlin, Litmus)
Background in performance testing and load simulation (Gatling, Locust, k6, JMeter)
Why us?
You will be working with a lean team of passionate and talented individuals. We know that working with like-minded people is important.
We are on a mission to supercharge brick-and-mortar retail stores in the era of e-commerce. Our customers give us confidence in our journey, and you will have a huge impact with your work.
You will be free to experiment and can choose to do things differently.
Lastly, we deeply care about a culture of being a solver. Come, be one with us!
Equal opportunity employer
Grey Orange Inc. is an equal employment opportunity employer. The company's policy is not to discriminate against any applicant or employee based on race, color, religion, national origin, gender, age, sexual orientation, gender identity or expression, veteran status, marital status, mental or physical disability, genetic information, or any other basis protected by applicable law. Grey Orange also prohibits harassment of applicants or employees based on any of these protected categories.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Technical Architect
Employment Type: Full time