Principal Lead, Observability & AI Ops
Role specific responsibilities:
• Define and implement the enterprise observability blueprint.
• Define SLI/SLO frameworks in collaboration with engineering teams.
• AIOps Enablement
• Implement AI-driven event correlation and incident prioritisation.
• Lead reduction of P1/P2 incidents through improved detection and prevention.
• Design real-time operational dashboards for executive reporting.
• Ensure seamless integration with ITSM platforms (e.g., ServiceNow).
• Embed AI-driven insights into Major Incident Management processes.
General functional responsibilities:
• Lead and develop a high-performing Observability & AIOps team.
• Manage budgets, vendor contracts, and technology roadmaps.
• Collaborate with Enterprise Architecture, Cloud, Security, and Application Engineering functions.
• Ensure alignment with ITIL, SRE, and enterprise governance frameworks.
• Report on operational health metrics to senior leadership and risk committees.
• Champion a culture of automation, reliability engineering, and data-driven operations.
• Support audit, regulatory, and resilience testing requirements.
Core competencies required:
- Technical Leadership
- Deep expertise in observability frameworks (metrics, logs, traces, events, topology).
- Strong knowledge of OpenTelemetry standards and distributed tracing models.
- Experience with enterprise monitoring stacks (e.g., Splunk).
- Practical implementation of AIOps platforms.
- AI & Data Analytics
- Event correlation, anomaly detection, noise reduction, and root cause analytics.
- Familiarity with machine learning models in operations.
- Data engineering fundamentals for telemetry pipelines.
- Experience building automation workflows using orchestration tools and scripting.
- Governance & Architecture
- Tool rationalisation and vendor management experience.
Requirements
- Deep expertise in observability frameworks (metrics, logs, traces, events, topology)
- Strong knowledge of OpenTelemetry standards and distributed tracing models
- Experience with enterprise monitoring stacks (e.g., Splunk)
- Practical implementation of AIOps platforms
- Familiarity with machine learning models in operations
- Data engineering fundamentals for telemetry pipelines
- Experience building automation workflows
- Governance, architecture, tool rationalisation, and vendor management
Nice to Have
- Collaboration with Enterprise Architecture, Cloud, Security, and Application Engineering functions
- Report on operational health metrics
- Champion a culture of automation, reliability engineering, and data-driven operations
- Support audit, regulatory, and resilience testing requirements
Responsibilities
- Define and implement the enterprise observability blueprint
- Implement AI-driven event correlation and incident prioritisation
- Lead reduction of P1/P2 incidents
- Design real-time operational dashboards
- Ensure seamless integration with ITSM platforms
- Embed AI-driven insights into Major Incident Management processes
- Lead and develop a high-performing Observability & AIOps team
- Manage budgets, vendor contracts, and technology roadmaps
FAB Bank (First Abu Dhabi Bank) is the UAE's largest bank and one of the world's largest and safest financial institutions. It offers a wide array of financial services.