menajobs
Salla

Senior Site Reliability Engineer (SRE)

🇸🇦 Makkah, Saudi Arabia · 🏢 On-site
Site Reliability Engineering, Kubernetes, AWS, GCP, Azure, Terraform, Prometheus, Grafana

As a Senior SRE at Salla, you will lead reliability initiatives, handle complex incidents, improve platform performance, and guide engineering teams toward building resilient systems. You will also participate in the on-call rotation as part of our commitment to platform reliability.

Reliability & Incident Management

• Lead high-severity incident response and drive post-incident reviews.
• Troubleshoot complex issues across applications, infrastructure, and networks.
• Improve MTTR through better monitoring, alerts, and diagnostic tooling.
• Participate in the on-call rotation supporting production systems.

Performance & Scalability

• Identify and resolve performance bottlenecks and scaling challenges.
• Conduct load testing and capacity planning for high-traffic scenarios.

Infrastructure & Operations

• Enhance cloud-native infrastructure, deployment processes, and automation.
• Improve resilience, fault-tolerance, and recovery mechanisms across systems.

Observability

• Build and refine dashboards, alerts, metrics, logs, and traces.
• Define SLIs/SLOs and improve visibility into system behavior.

Tooling & Automation

• Develop tools that reduce operational toil and increase reliability.
• Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows.

Collaboration

• Work closely with engineering teams to ensure services are robust and production-ready.
• Mentor engineers on reliability, debugging, and operational best practices.

Bonus Skills

• Background in large-scale, high-traffic systems.
• Experience with fault-tolerant design, DR, and HA patterns.
• Familiarity with SLOs, SLIs, and error budgets.

Location Preference

• Candidates located within GMT 0 to +6 time zones are preferred to align with team collaboration and on-call coverage.

Requirements

• Strong experience with Kubernetes, service mesh technologies, and cloud platforms (AWS, GCP, or Azure).
• Deep understanding of Linux, networking, distributed systems, and load balancing.
• Hands-on experience with Terraform or similar Infrastructure-as-Code tools.
• Experience with observability platforms such as Prometheus, Grafana, Loki, Mimir, Elastic, or equivalent.
• Proficiency in scripting or programming languages such as Bash, Python, or Go.
• Experience with CI/CD pipelines and GitOps practices.
• Strong debugging, incident response, and performance analysis skills.

© 2026 MenaJobs. All rights reserved.