Flexera is looking for an experienced Site Reliability Engineer to join our SRE team.
We're a fast-growing, category-leading organization with ambitious objectives and a positive, inclusive culture.
We're looking for passionate professionals who want to grow their talents and achieve great things.
If that sounds like you, we want to talk to you about joining our team.
As a Site Reliability Engineer, you will be tasked with everything from helping with product design to diagnosing issues and writing automated scripts to remediate issues that occur in our production systems.
You will be driven to build fault-tolerant, scalable systems and automate away as much operational toil as you can.
You align with the goals of the DevOps movement in improving collaboration between the development and operations disciplines.
We are seeking someone with extensive experience working on a SaaS/Cloud product with a microservices architecture.
Responsibilities:
Help eliminate operational toil by automating repetitive operations work.
Establish and enhance CI/CD pipelines.
Create dashboards with Grafana/Prometheus that help communicate the metrics for a given product service.
Collaborate with other teams.
Investigate, debug, and resolve customer issues.
Mentor team members on cloud computing, infrastructure, and best practices.
Ensure the security and reliability of shared infrastructure in the Flexera cloud.
Make reliability a first-class citizen.
Design, develop and deploy new features for Flexera products/platforms, as defined by goals from the SRE organization.
Work with product owners and product engineering teams to perform capacity planning.
Work with product engineering teams to understand performance and behavior patterns.
Be part of an on-call rotation for alerts that require engineering expertise to diagnose.
Help carry out root cause analysis for incidents, and design solutions (both software and human processes) that will help to ensure the same problem doesn't happen in the same way again.
Minimum Qualifications
Computer Science degree, or related industry experience managing a mission-critical production system in AWS (or equivalent Azure/Google Cloud) for at least 4 years.
Critical Skills / Competencies
Required:
Agile software delivery methodologies
Experience managing cloud-based services like AWS or Azure at scale
Experience with DevOps
Infrastructure provisioning experience
Experience deploying to and orchestrating containers (Docker, Kubernetes, etc.)
Expertise in Linux and a good understanding of its commands
Good networking fundamentals
GitHub for collaboration and change management
Experience with AWS services such as EC2, ECS, EKS, S3
Database exposure, preferably MySQL, Amazon RDS, and MongoDB
Good to have:
Understanding of RESTful APIs and other web-based application concepts
Any scripting language experience (Ruby is the current language, but comparable experience in Java, Python, Perl, etc. would suffice)
Knowledge of Go (Golang)
Knowledge of Helm
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time