Sr. IT Monitoring Engineer/Site Reliability Engineer @ CrowdStrike

Sr. IT Monitoring Engineer/Site Reliability Engineer

CrowdStrike
5 - 10 years
Kolkata
7 days ago

Job Description

About the Role

The CrowdStrike Information Technology team is looking for a skilled Sr. IT Monitoring Engineer/Site Reliability Engineer (SRE) to join our IT Operations team. In this role, you will be responsible for designing, implementing, and maintaining monitoring solutions that ensure the reliability, availability, and performance of our critical IT infrastructure and applications. You will work at the intersection of operations and development, applying software engineering principles to operations tasks while focusing on system reliability and automation. This position requires a proactive approach to identifying and resolving issues before they impact business operations, as well as participating in on-call rotations to address incidents when they occur.

What You'll Need

5+ years of experience with enterprise monitoring tools (Prometheus, LogicMonitor, Datadog, ThousandEyes, Zscaler Digital Experience (ZDX))
Strong proficiency in scripting languages (Python, Bash, PowerShell) for automation
Experience with log management platforms (ELK stack, Splunk, LogScale)
Working knowledge of cloud services monitoring (AWS CloudWatch, GCP)
Experience with application performance monitoring (APM), digital experience monitoring (DEM) and infrastructure monitoring
Knowledge of SRE principles, SLOs, error budgets, and incident management
Experience with automated alerting, remediation workflows, and CI/CD pipeline monitoring
Familiarity with Infrastructure as Code (Terraform, Ansible) and containerization (Docker, Kubernetes)
Strong incident triage, root cause analysis, and documentation skills
Experience participating in on-call rotations and emergency response

What You'll Do

Monitoring and Reliability

Design and maintain comprehensive monitoring solutions across infrastructure and applications
Configure appropriate alerting thresholds to ensure timely response to potential issues
Define and track SLOs and error budgets for critical services
Create and maintain dashboards providing real-time visibility into system health
Conduct regular reviews of system reliability and recommend improvements

Incident Management and Operations

Participate in on-call rotation to respond to alerts and incidents
Lead incident response efforts and conduct thorough post-incident reviews
Document incidents, resolutions, and lessons learned
Develop and refine incident response procedures to improve MTTR
Implement proactive monitoring to detect potential issues before they impact users

Automation and Collaboration

Develop scripts and automation to streamline monitoring tasks and reduce manual effort
Create self-healing systems that can automatically remediate common issues
Integrate monitoring tools with other operational systems
Work closely with development, infrastructure, and security teams
Provide guidance on monitoring best practices and observability
Maintain comprehensive documentation for monitoring systems and procedures

Continuous Improvement

Stay current with industry trends in monitoring and site reliability engineering
Analyze monitoring data to identify patterns and improvement opportunities
Implement metrics to track the effectiveness of monitoring processes
Contribute to the evolution of the organization's monitoring strategy

Preferred Qualifications

SRE, cloud platform, or monitoring tool certifications
ITIL Foundation certification
Bachelor's degree in Computer Science, Information Technology, or related field

Shift timings: 12 PM to 9 PM IST

Benefits of Working at CrowdStrike:

Remote-friendly and flexible work culture
Market leader in compensation and equity awards
Comprehensive physical and mental wellness programs
Competitive vacation and holidays for recharge
Paid parental and adoption leave
Professional development opportunities for all employees regardless of level or role
Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
Vibrant office culture with world-class amenities
Great Place to Work Certified across the globe

Job Classification

Industry: Hardware & Networking
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: CrowdStrike
Location(s): Kolkata

Keyskills: Computer science, Automation, PowerShell, Incident management, IT operations, Information technology, Monitoring, Python, Recruitment, Business operations

₹ Not Disclosed

CrowdStrike

CrowdStrike is the leader in next-generation endpoint protection, threat intelligence and response services. CrowdStrike's core technology, the Falcon platform, stops breaches by preventing and responding to all types of attacks, both malware and malware-free.
