Principal Engineer - SaaS Operations Monitoring and Alerting @ Saviynt

Job Description

  • The team comes from diverse technical backgrounds, and the responsibilities provide the opportunity for a variety of challenges.
  • Ideal candidates will have a background in either software engineering or systems engineering, with a desire to learn the other, or previous experience building and managing monitoring and alerting systems.
  • We are looking for a systems-thinking Principal Engineer who has helped teams scale through production insights, operational automation, building an observability program, developer guidance, real-time metrics, and automation, automation, automation.

WHAT YOU WILL BE DOING

  • Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics.
  • Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
  • Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
  • Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
  • Align the platform with customer needs and business goals by working closely with cross-functional teams.
  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Build software and systems to monitor platform infrastructure and applications.
  • Monitor and improve the reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
  • Provide primary operational support and engineering for multiple large-scale distributed software applications.
  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.

WHAT YOU BRING

  • Bachelor's degree or higher in a technology-related field (e.g., Engineering, Computer Science) required; Master's degree a plus
  • 6+ years of professional experience in monitoring and alerting roles on major cloud platforms (AWS, Azure), preferably including project leadership roles.
  • 4+ years of experience in cloud development (AWS, Azure) and observability; experience building and operating highly resilient platforms in AWS cloud environments.
  • 3+ years of experience in software development with Python, NodeJS, or Java, with a focus on SDLC and automation
  • Hands-on experience with container orchestration, preferably with Kubernetes
  • Hands-on experience with building observability, monitoring and alerting on large scale distributed systems.
  • Leadership/design of application and/or infrastructure migration projects from on-prem to cloud
  • Cloud architecture design and implementation to solve key business needs and meet team goals.
  • Familiarity with current AWS solutions; Azure experience also considered.
  • Containerized workloads (Prefer: Helm; Related: AKS & EKS, other K8s distributions, Docker, JFrog)
  • Logging and monitoring tools (Prefer: Prometheus, Grafana, Datadog, AWS CloudWatch; Related: Azure Monitor, Log Analytics, Fluentd)
  • Network security (e.g., AWS Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints)
  • Proven experience in implementing advanced observability practices and techniques at scale.
  • Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
  • Experience in instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.
  • Demonstrated ability to use modern monitoring tools (Datadog, Prometheus, etc.)
  • Ability to build a monitoring ecosystem with high-fidelity alerting.
  • Ability to automate resolution of alerts.
  • Ability to automate with various scripting languages (Python, Golang, shell scripting, etc.)
  • Knowledge of managing systems using infrastructure as code tools (IAM, ARM, Terraform, Chef)
  • Solid understanding of Cloud Computing and DevOps concepts.
  • Hands-on Kubernetes skills and knowledge.
  • Proven experience in maintaining the scalability and resiliency of complex environments.
  • Ability to triage, execute root cause analysis, and be decisive under pressure
  • Experience managing and interpreting large datasets using query languages and visualization tools
  • Strong communication skills, with the ability to reach both technical and non-technical audiences
  • Ability to learn new software, methods, and practices and bring them to our developers
  • Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner and build and maintain effective relationships

The Value You Deliver

  • Help define and execute a comprehensive reliability and observability strategy, ensuring that Saviynt systems are always available when our customers need them.
  • You will build advanced observability practices and techniques at scale.
  • You will execute plans for technical standardization and process refinement within the engineering organization, especially for Site Reliability Engineers.
  • Troubleshoot stack-wide engineering issues related to hardware, software, network, applications, and cloud service providers

Job Classification

Industry: Software Product
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Saviynt
Location(s): Bengaluru

Key skills: Performance tuning, Cloud computing, Automation, VPN, Shell scripting, Active Directory, Network security, SDLC, Analytics, Python

Salary: ₹ Not Disclosed

Saviynt

Saviynt is an identity authority platform built to power and protect the world at work. In a world of digital transformation, where organizations face increasing cyber risk but cannot afford defensive measures that slow down progress, Saviynt's Enterprise Identity Cloud gives customers unp...