Issue acknowledgments on time and take full ownership of driving the key metrics MTTA (Mean Time To Acknowledge) and MTTE (Mean Time To Engage)
Ability to correlate alerts and build a meaningful picture to determine the impact
Ability to understand logs and correlate them with problems
Aggressively chase the relevant on-call teams to engage the final resolver for the Incident in the shortest possible time
Log all Incident/Service Request details, assigning categorization and prioritization codes
Record and classify received Incidents and make an immediate effort to restore a failed Service as quickly as possible
Keep users informed of the status of their Incidents at agreed intervals
Provide first-line investigation and diagnosis of all Incidents
Verify resolution with users and resolve Incidents in the ITSM tool
Escalate Major Incidents to the Incident Commander and others as per the Escalation Matrix
Escalate Incidents at risk of breaching the Service Level Agreement to the Incident Coordinator or others as required
Excellent communication and interpersonal skills
Qualification: B.Sc. or B.Tech (other degrees are also acceptable, provided the candidate has relevant experience)
0-6 months of experience in monitoring distributed systems
Knowledge of Nmon, Nagios, Grafana, SolarWinds Orion, Centreon, or any similar monitoring tool is mandatory
Knowledge of basic ITIL concepts of Alerting and Incident Management
Willingness to work in a 24/7 production operations support and Incident Management environment
Must be able to work in rotational shifts
Keyskills: nagios, centreon, matrix, sql, plsql, orion, linux, software engineering, mysql, html, communication skills, jira, python, solarwinds, vmware, engineering, nmon, javascript, angular, node.js, system, servicenow, grafana, incident management, mean, splunk, aws, itil