Job Description
The Incident Management professional is responsible for managing the lifecycle of all incidents (unplanned interruptions or reductions in quality of IT services). The role ensures prompt resolution or escalation, minimizes impact on business operations, and maintains high levels of service availability.
________________________________________
Key Responsibilities:
- Apply major incident management skills to manage P1/P2 incidents.
- Incident Detection & Logging:
  - Ensure all incidents are recorded accurately and comprehensively in the ITSM tool.
- Incident Classification & Prioritization:
  - Assess and categorize incidents based on impact and urgency.
  - Prioritize incidents to ensure timely resolution.
- Initial Diagnosis & Troubleshooting:
  - Perform basic troubleshooting or analysis.
  - Attempt first-time resolution where possible or escalate to appropriate resolver groups.
- Serve as the single point of contact for all major incidents.
- Communicate with stakeholders, including users and management, during ongoing incidents.
- Facilitate bridge calls or war rooms for major incidents (P1/P2).
- Monitor alerts and systems to detect and log incidents.
- Ensure timely escalation to the correct technical teams.
- Coordinate with cross-functional teams to expedite resolution.
- Post-Incident Activities:
  - Conduct post-incident reviews (PIRs) or root cause analysis (RCA).
  - Identify areas of improvement and work with Problem Management for preventive actions.
- Documentation & Reporting:
  - Maintain incident records and create reports for trend analysis.
  - Ensure accurate and timely incident updates in service management tools.
In addition to incident triage and resolution, they will also:
- Review and assess Change Requests (CRs).
- Perform Tier 1 remediation tasks such as:
  - Restarting servers
  - Removing or adding servers to load balancers
  - Conducting routine checks using monitoring tools
These additional duties ensure critical support continuity during non-business hours and support proactive remediation and change governance.
| Activity | Details |
| --- | --- |
| Major Incident Handling | Triage, diagnosis, escalation, resolution |
| Change Request Reviews | Review CRs in ServiceNow, assess impact, liaise with stakeholders |
| Tier 1 Remediation Tasks | Server restarts, LB adjustments, basic monitoring checks |
| Stand-by & Overflow Coverage | Fill in for planned/unplanned absences or high-severity spikes |
| Shift Handover & Documentation | 15-minute overlap at shift change; update runbooks and incident logs |
| Daily Standup | Quick sync, open issues, upcoming maintenance |
Tools & Reporting
- ServiceNow for ticketing & change management
- PagerDuty (or equivalent) for alert routing
- Datadog/Splunk/Nagios/Prometheus for monitoring
- Confluence/Jira for runbooks, knowledge base
- Weekly and monthly reports: incident trends and SLA adherence; change volume and success rate; resource utilization and overtime.
Required Skills:
- Strong understanding of ITIL (especially the Incident Management process).
- Excellent communication and interpersonal skills.
- Analytical and problem-solving skills.
- Ability to work in flexible shifts including weekends.
Please share your CV at an*******a@rs*****s.com.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: IT & Information Security
Role Category: IT Support
Role: IT Support - Other
Employment Type: Full time
Contact Details:
Company: R Systems
Location(s): Noida, Gurugram
Key Skills:
Datadog
Confluence
PagerDuty
ServiceNow
Incident Management
ITIL Certified
Splunk
Jira
ITIL