Palo Alto Networks is looking for a talented Site Reliability Engineer for our ever-expanding Cloud Operations team. The ideal candidate enjoys working in a fast-paced environment with highly innovative technologies. Our team partners closely with IT and Engineering groups and requires individuals who bring a can-do, positive attitude and a focus on delivering exceptional customer support.
Your Impact
Implement and support the Linux infrastructure, managed as code, on which our globally distributed customer-facing platform runs.
Provision, configure, and support a resilient hybrid cloud deployment architecture using the automation framework, and make it more efficient.
Manage the Linux infrastructure CI/CD platform; work with other SREs to deploy and maintain the automation framework and plan capacity; create and review PKI operational runbooks.
Manage scalability, capacity planning, redundancy, and resiliency.
Maintain service availability and performance SLAs based on business and product requirements.
Contribute to documentation related to design, deployment, validation, operations and DR/BCP.
Design proactive service monitoring, alerting and trend analysis of underlying infrastructure, and support the operations team in implementation.
Build and operate the compute fabric for thousands of VMs and Kubernetes clusters. Develop scripts, build tools, and write code to automate routine tasks.
Provide technical support to platform users.
Support security implementations and respond to audits of the environment.
Plan maintenance windows, write up change requests, and present technical updates.
Participate in on-call support, including root cause analysis (RCA) as required.
Your Experience
Strong hands-on experience managing and supporting Linux server infrastructure on CentOS/RHEL/Ubuntu.
Bachelor's/Master's degree in Computer Science, Information Technology, or another technical stream, with an equivalent combination of work experience, required.
Experience with design and performance tuning of Linux infrastructure and APIs; in-depth knowledge of multi-tier web applications.
Experience in developing and managing APIs; understanding of API infrastructure optimization and security.
In-depth knowledge of Certificate Lifecycle Management.
Fluent in Linux security, system hardening, and the vulnerability management and patching process. Familiarity with CIS compliance levels.
Must be comfortable with Ansible, Chef, or a similar configuration management tool for managing infrastructure as code, and with source control systems such as Git or SVN.
Ability to work cross-functionally across multiple business units, such as product development and engineering.
Must be able to collaborate with a global team spread across multiple time zones.
Passion, drive, energy, a sense of humour and a great attitude!
6+ years of relevant experience; Bachelor's or Master's degree in Computer Science or a related technical field.
Experience with administration and orchestration of cloud computing (AWS, GCP, etc.) running virtual or container environments.
Good user and admin Linux skills (Ubuntu a plus). Experience with virtual networking.
Working experience with IaC tools like Terraform and Ansible. Knowledge of Python and shell scripting.
Experience with CI/CD development using platforms such as Jenkins, Harness, and Artifactory.
Solid problem-solving, troubleshooting, critical thinking, communication, and teamwork skills.
Passion for automation and monitoring instrumentation in the code.
Key skills: Performance tuning, Automation, Linux, Networking, Shell scripting, Windows, Microsoft, Troubleshooting, Technical support, Python
At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. We have the vision of a world where each day is safer and more secure than the one before. These are not easy goals to accomplish - but we are not her...