Sr. IT Monitoring Engineer/Site Reliability Engineer @ CrowdStrike

Job Description

About the Role
The CrowdStrike Information Technology team is looking for a skilled Sr. IT Monitoring Engineer/Site Reliability Engineer (SRE) to join our IT Operations team. In this role, you will be responsible for designing, implementing, and maintaining monitoring solutions that ensure the reliability, availability, and performance of our critical IT infrastructure and applications. You will work at the intersection of operations and development, applying software engineering principles to operations tasks while focusing on system reliability and automation. This position requires a proactive approach to identifying and resolving issues before they impact business operations, as well as participating in on-call rotations to address incidents when they occur.
What You'll Need
  • 5+ years of experience with enterprise monitoring tools (Prometheus, LogicMonitor, Datadog, ThousandEyes, Zscaler Digital Experience (ZDX))
  • Strong proficiency in scripting languages (Python, Bash, PowerShell) for automation
  • Experience with log management platforms (ELK stack, Splunk, LogScale)
  • Working knowledge of cloud services monitoring (AWS CloudWatch, GCP)
  • Experience with application performance monitoring (APM), digital experience monitoring (DEM) and infrastructure monitoring
  • Knowledge of SRE principles, SLOs, error budgets, and incident management
  • Experience with automated alerting, remediation workflows, and CI/CD pipeline monitoring
  • Familiarity with Infrastructure as Code (Terraform, Ansible) and containerization (Docker, Kubernetes)
  • Strong incident triage, root cause analysis, and documentation skills
  • Experience participating in on-call rotations and emergency response
What You'll Do
Monitoring and Reliability
  • Design and maintain comprehensive monitoring solutions across infrastructure and applications
  • Configure appropriate alerting thresholds to ensure timely response to potential issues
  • Define and track SLOs and error budgets for critical services
  • Create and maintain dashboards providing real-time visibility into system health
  • Conduct regular reviews of system reliability and recommend improvements
Incident Management and Operations
  • Participate in on-call rotation to respond to alerts and incidents
  • Lead incident response efforts and conduct thorough post-incident reviews
  • Document incidents, resolutions, and lessons learned
  • Develop and refine incident response procedures to improve MTTR
  • Implement proactive monitoring to detect potential issues before they impact users
Automation and Collaboration
  • Develop scripts and automation to streamline monitoring tasks and reduce manual effort
  • Create self-healing systems that can automatically remediate common issues
  • Integrate monitoring tools with other operational systems
  • Work closely with development, infrastructure, and security teams
  • Provide guidance on monitoring best practices and observability
  • Maintain comprehensive documentation for monitoring systems and procedures
Continuous Improvement
  • Stay current with industry trends in monitoring and site reliability engineering
  • Analyze monitoring data to identify patterns and improvement opportunities
  • Implement metrics to track the effectiveness of monitoring processes
  • Contribute to the evolution of the organization's monitoring strategy
Preferred Qualifications
  • SRE, cloud platform, or monitoring tool certifications
  • ITIL Foundation certification
  • Bachelor's degree in Computer Science, Information Technology, or related field
Shift timings: 12 PM to 9 PM IST
Benefits of Working at CrowdStrike:
  • Remote-friendly and flexible work culture
  • Market leader in compensation and equity awards
  • Comprehensive physical and mental wellness programs
  • Competitive vacation and holidays for recharge
  • Paid parental and adoption leaves
  • Professional development opportunities for all employees regardless of level or role
  • Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
  • Vibrant office culture with world class amenities
  • Great Place to Work Certified across the globe

Job Classification

Industry: Hardware & Networking
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: CrowdStrike
Location(s): Kolkata

Key skills: Computer science, Automation, PowerShell, Incident management, IT operations, Information technology, Monitoring, Python, Recruitment, Business operations

₹ Not Disclosed

CrowdStrike

CrowdStrike is the leader in next-generation endpoint protection, threat intelligence and response services. CrowdStrike's core technology, the Falcon platform, stops breaches by preventing and responding to all types of attacks, both malware and malware-free.