Job Description
F5 Inc. is actively seeking an exceptional Sr Principal Software Engineer (Individual Contributor) to play a pivotal role in our SRE Operations team for the groundbreaking F5XC Product.
We are looking for a top-tier SRE to drive Logs, Metrics, and Alerting, with a deep focus on Alerting automation at massive scale.
Why This Role is Unique:
Our SaaS is hybrid, running across public cloud and a global network of 50+ PoPs and delivering terabits of capacity. Our infrastructure spans cloud-native services and physical networking gear (routers, switches, firewalls), creating a uniquely challenging and exciting observability landscape. The Analytics & Observability platform will have deep reach across these layers, ensuring reliability, security, and performance at massive scale.
What You'll Do:
Be the Force Behind Observability & Stability
- Drive end-to-end Observability (Logs, Metrics, and Alerts) across our hybrid SaaS stack, spanning cloud, edge, and physical network devices.
- Take ownership of Alerting strategy, cutting through noise while ensuring actionable, high-fidelity alerts.
- Implement intelligent automation to reduce operational toil and enhance real-time visibility.
Own & Automate Operations
- Design, build, and manage automation for self-healing infrastructure across cloud + global PoPs.
- Develop automation for Kubernetes, ArgoCD, Helm charts, Golang-based services, AWS, GCP, and Terraform.
- Improve networking observability, ensuring our routers, switches, and firewalls are monitored at scale.
- Continuously eliminate manual ops work through automation and platform improvements.
Lead Incident Response & Operational Excellence
- Participate in on-call rotations, ensuring rapid incident response across our cloud + edge stack.
- Drive incident response automation, reducing MTTR and increasing system resilience.
- Ensure security, compliance, and best practices in observability & automation.
Collaborate & Mentor
- Work closely with application teams, network engineers, and SREs to improve reliability and performance.
- Mentor junior engineers, fostering a culture of automation-first thinking and deep observability.
What Makes You a Great Fit
- Deep expertise in Logs, Metrics, and Alerting, with a strong focus on Alerting automation.
- Experience in hybrid SaaS environments spanning cloud-native and global infrastructure.
- Strong background in Kubernetes, Infrastructure-as-Code (Terraform), Golang, AWS/GCP, and networking observability.
- Proven track record of eliminating toil and improving operational efficiency through automation.
- Passion for deep observability, networking-scale analytics, and automation at the edge.
Must-Have:
- Observability & Alerting Expertise - Strong experience with Logs, Metrics, and Alerts, with a focus on high-fidelity alerting and automation
- Automation & Infrastructure as Code - Deep knowledge of Terraform, ArgoCD, Helm, Kubernetes, and Golang for automation
- Cloud & Hybrid SaaS Experience - Hands-on experience managing cloud-native (AWS/GCP) and edge infrastructure
- Incident Response & Reliability Engineering - Strong on-call experience, with a track record of reducing MTTR through automation
- Kubernetes Mastery - Hands-on experience deploying, managing, and troubleshooting Kubernetes in production environments
Nice-to-Have:
- Networking & Edge Observability - Familiarity with monitoring routers, switches, and firewalls in a global PoP environment
- Data & Analytics in Observability - Experience with time-series databases and observability tooling (Prometheus, Grafana, OpenTelemetry, etc.)
- Security & Compliance Awareness - Understanding of secure-by-design principles for monitoring & alerting
- Mentorship & Collaboration - Ability to mentor junior engineers and work cross-functionally with SREs, application teams, and network engineers
- High Availability / Disaster Recovery - Experience with HA/DR and migration
Qualifications
- Typically requires at least 18 years of related experience with a bachelor's degree, 15 years with a master's degree, or 12 years with a PhD; or equivalent experience.
- Excellent organizational agility and the ability to communicate effectively across the organization.
Environment
- Empowered Work Culture: Experience an environment that values autonomy, fostering a culture where creativity and ownership are encouraged.
- Continuous Learning: Benefit from the mentorship of experienced professionals with solid backgrounds across diverse domains, supporting your professional growth.
- Team Cohesion: Join a collaborative and supportive team where you'll feel at home from day one, contributing to a positive and inspiring workplace.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time
Contact Details:
Company: F5 Networks, Inc
Location(s): Bengaluru
Key Skills:
Automation
Operational excellence
Networking
GCP
Disaster recovery
Cloud
Troubleshooting
Operations
Monitoring