Incident Response Analyst II
from 🇸🇬 Singapore
Key Responsibilities
1. Real-Time Infrastructure Monitoring
- Perform 24x7 monitoring of critical facility systems across global data centers, including:
  - Electrical power systems
  - Mechanical systems
  - HVAC and cooling infrastructure
  - Fire detection and suppression systems
  - Water systems and supporting infrastructure
- Continuously monitor EPMS, BMS, DCIM, and centralized monitoring platforms.
- Detect abnormal operating conditions and alarms.
- Acknowledge and investigate alarms promptly.
- Track incidents and issues through to closure.
- Identify monitoring gaps and recommend improvements to monitoring coverage.
2. Incident Response and Coordination
- Provide first-level incident triage and technical assessment.
- Respond to facility alarms and operational events in real time.
- Execute escalation procedures according to defined protocols.
- Coordinate with internal teams, site personnel, vendors, and regional stakeholders to ensure timely issue resolution.
- Support major incident management activities for events such as:
  - Utility power failures
  - UPS and generator events
  - Cooling/HVAC failures
  - Fire alarm activations
  - Water leakage events
  - Security and environmental alerts
- Maintain end-to-end ownership of incidents until resolution.
3. Ticket Management and Change Coordination
- Create, update, and manage event tickets within established SLA targets.
- Process work orders and monitor completion quality.
- Track maintenance activities and change requests.
- Support change management processes and ensure operational compliance.
- Maintain accurate records of facility maintenance activities and change windows.
4. Compliance and Operational Governance
- Monitor and follow up on preventive maintenance activities and routine operational changes.
- Review technical documentation submitted by vendors and service providers, including:
  - Method of Procedure (MOP)
  - Risk Assessment (RA)
  - Standard Operating Procedure (SOP)
- Ensure maintenance activities comply with operational standards and freeze-period requirements.
- Support risk management and operational audit activities.
5. Monitoring Platform and Data Administration
- Maintain monitoring platform master data and infrastructure records.
- Ensure the accuracy, completeness, and timeliness of asset and alarm information.
- Support platform optimization and continuous improvement initiatives.
- Maintain facility logs, event records, and operational documentation.
6. Reporting and Data Analysis
- Analyze facility operational data and identify trends or recurring issues.
- Prepare operational reports and performance summaries.
- Provide recommendations to improve reliability and operational efficiency.
- Maintain records required for audit, compliance, and management reporting.
7. Operational Support and Continuous Improvement
- Participate in after-hours support and emergency escalations.
- Provide remote support for overseas data center operations when required.
- Support centralized cross-regional operations and collaboration.
- Contribute to process improvements and monitoring platform enhancements.
- Perform other duties as assigned to support business continuity and operational excellence.
Minimum Qualifications
- Associate Degree, Diploma, or higher in Engineering, Information Technology, Facilities Management, or related disciplines.
- Minimum 2 years of experience in data center operations, facility monitoring, NOC, command center, or mission-critical environments.
- Working knowledge of:
  - Electrical systems
  - Mechanical systems
  - HVAC and cooling infrastructure
  - Fire detection and suppression systems
  - Building Management Systems (BMS)
  - Electrical Power Monitoring Systems (EPMS)
  - DCIM or centralized monitoring platforms
- Experience working with incident management and escalation procedures.
- Strong communication and coordination skills.
- Ability to work in a 24x7 rotating shift environment.
- Ability to manage multiple priorities in high-pressure situations.
- Fluent in English.
- Chinese language proficiency (reading, writing, and speaking) is preferred, as the role involves Chinese-language alarm messages, documentation, and communications.
Preferred Qualifications
- Experience in:
  - Network Operations Center (NOC)
  - Facility Operations Center (FOC)
  - Data Center Operations
  - Critical Environment Operations
  - Mission Critical Facilities
- Experience supporting global or cross-regional operations.
- Familiarity with structured incident, change, and problem management processes.
- Understanding of data center capacity management (space, power, cooling).
- Experience working with CMMS, DCIM, EPMS, BMS, or ticketing platforms.
- Ability to perform root cause analysis and drive issue resolution.
Desired Competencies
- Strong sense of ownership and urgency.
- Excellent communication and stakeholder management skills.
- Detail-oriented with strong documentation practices.
- Analytical and problem-solving mindset.
- Ability to learn quickly and adapt to changing operational environments.
- Team-oriented with a proactive and customer-focused attitude.
Preferred Certifications
Candidates with the following certifications will have an advantage:
- CDCP – Certified Data Centre Professional
- CDCS – Certified Data Centre Specialist
- FSM – Facilities Systems Management
- ATD – Uptime Institute Accredited Tier Designer
- ITIL Foundation
- DCCA or DCT certifications
- Electrical or Mechanical engineering certifications
Shift Requirements
- Must be willing to work a 24x7 rotating shift schedule.
- Work weekends and public holidays, and participate in on-call duty rotations, when required.
- Support emergency response activities and major incidents.
Key Performance Indicators (KPIs)
The successful candidate is expected to consistently achieve:
- 100% shift attendance and handover compliance.
- 24x7 continuous monitoring coverage.
- Alarm acknowledgement within 1 minute.
- Notification generation within 2 minutes.
- Event ticket creation within 10 minutes.
- Compliance with escalation and incident management SLAs.
- Zero service-impacting human errors.
- Accurate documentation and reporting.
- Continuous improvement contributions to operational processes and monitoring platforms.
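For illustration only, the three timed targets above (acknowledgement within 1 minute, notification within 2 minutes, ticket creation within 10 minutes) could be checked against an event's timestamps roughly as follows. This is a minimal sketch, not a description of any actual tooling: the names `KPI_TARGETS` and `check_kpis` are hypothetical, and it assumes every target is measured from the moment the alarm is raised, which the posting does not specify.

```python
from datetime import datetime, timedelta

# Targets copied from the KPI list; measuring each one from the
# alarm-raised time is an assumption made for this sketch.
KPI_TARGETS = {
    "acknowledged": timedelta(minutes=1),
    "notified": timedelta(minutes=2),
    "ticketed": timedelta(minutes=10),
}


def check_kpis(raised_at, timestamps):
    """Return {step: bool} for each timed KPI.

    A step with no recorded timestamp counts as a miss.
    """
    results = {}
    for step, limit in KPI_TARGETS.items():
        done_at = timestamps.get(step)
        results[step] = done_at is not None and (done_at - raised_at) <= limit
    return results


# Hypothetical event: acknowledged in 45 s, notified in 150 s, ticketed in 8 min.
raised = datetime(2024, 1, 1, 3, 0, 0)
example = check_kpis(raised, {
    "acknowledged": raised + timedelta(seconds=45),
    "notified": raised + timedelta(seconds=150),
    "ticketed": raised + timedelta(minutes=8),
})
print(example)  # {'acknowledged': True, 'notified': False, 'ticketed': True}
```

In this hypothetical event the notification took 150 seconds, so it misses the 2-minute target even though the other two steps are within limits.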