menajobs
Lucidya

Site Reliability Engineer

🇸🇦 Riyadh, Saudi Arabia · 🏢 On-site
SRE · Site Reliability Engineering · AWS · GCP · Azure · Terraform · Kubernetes · Prometheus

About Lucidya

Lucidya is an AI-native platform for customer experience (CX) intelligence that manages entire customer lifecycles autonomously, from initial engagement through retention and growth.

Unlike platforms that only surface insights and leave the action to you, Lucidya closes the loop with proprietary NLU technology built in-house and trained on millions of multilingual conversations. This enables marketing, support, CX, and research teams to deliver personalized experiences that drive measurable improvements in customer satisfaction, retention, and lifetime value.

As we continue scaling globally, the reliability, performance, and resilience of our infrastructure become mission-critical to everything we do.

Why this role matters

At Lucidya, our platform processes massive volumes of real-time customer data. Any downtime, latency, or instability directly impacts our customers’ ability to make decisions and serve their own users.

This role exists to make sure that doesn’t happen.

As a Site Reliability Engineer, you’ll sit at the heart of our platform’s stability, owning the reliability of our cloud infrastructure and ensuring it scales seamlessly as we grow. You won’t just react to issues; you’ll anticipate them, design systems that prevent them, and build automation that removes them entirely.

If you enjoy solving complex infrastructure challenges, eliminating inefficiencies, and building systems that “just work”, this is where you’ll thrive.

What You’ll Do

You’ll be responsible for outcomes, not just tasks. Here’s what success looks like in this role:

You’ll make reliability the default

• You’ll design and maintain infrastructure that is highly available, fault-tolerant, and scalable
• You’ll proactively identify and eliminate single points of failure before they become incidents
• You’ll ensure our production systems remain stable, even under increasing scale and load

You’ll own and optimize our cloud environments

• You’ll manage and continuously improve workloads across AWS, GCP, or Azure
• You’ll use Infrastructure as Code (Terraform) to standardize and scale infrastructure
• You’ll optimize resource usage to balance performance and cost

You’ll run and improve Kubernetes in production

• You’ll operate and scale Kubernetes clusters (EKS, GKE, etc.) with confidence
• You’ll troubleshoot issues quickly and ensure smooth deployments and upgrades
• You’ll ensure our containerized workloads perform reliably at scale

You’ll build strong observability and respond to incidents

• You’ll implement and refine monitoring systems using tools like Prometheus, Grafana, Datadog, or ELK
• You’ll define alerting that is meaningful, not noisy
• You’ll respond to incidents, lead root cause analysis, and ensure we learn from every failure

You’ll automate everything that shouldn’t be manual

• You’ll write scripts and build tooling to eliminate repetitive operational work
• You’ll continuously improve infrastructure efficiency through automation
• You’ll promote a culture where manual work is a temporary state, not the norm

You’ll collaborate to improve the entire system

• You’ll work closely with DevOps and engineering teams to solve performance bottlenecks
• You’ll contribute to CI/CD improvements and deployment reliability
• You’ll help shape reliability best practices across the organization

What success looks like (First 90 Days)

First 30 days:

• You’ve built a strong understanding of our infrastructure, systems, and workflows
• You’re contributing to day-to-day operations with support from the team
• You’ve started identifying areas for improvement in automation and reliability

By 90 days:

• You’re independently managing infrastructure tasks and troubleshooting issues
• You’re actively contributing to reliability and scalability improvements
• You’ve taken ownership of parts of our infrastructure and are improving them

Requirements

Who You Are

This is what will make you successful in this role:

• You’ve spent ~3 years working in SRE, DevOps, or infrastructure engineering, and you’ve seen what breaks at scale
• You’re comfortable working in cloud environments like AWS, GCP, or Azure, and you understand how distributed systems behave
• You’ve worked hands-on with Kubernetes in production and know how to troubleshoot it when things go wrong
• You don’t just fix issues - you ask why they happened and make sure they don’t happen again

Technically, you likely:

• Use Terraform (or similar IaC tools) to manage infrastructure
• Work confidently with Docker and Kubernetes
• Write scripts in Python, Bash, or similar to automate workflows
• Understand CI/CD pipelines (Jenkins, GitHub Actions, Bitbucket, etc.)
• Have a solid grasp of networking, load balancing, and high-availability design

When it comes to monitoring:

• You’ve implemented tools like Prometheus, Grafana, Datadog, or ELK
• You know the difference between useful alerts and noise
• You focus on signals that actually drive action

What sets you apart:

• You take ownership - you don’t wait to be told something is broken
• You’re calm under pressure and methodical during incidents
• You simplify complexity instead of adding to it
• You communicate clearly, even when explaining deeply technical issues
• You care about building systems that make other engineers more effective

Nice to Have (but not required)

• Experience with RabbitMQ or Redis in production
• Familiarity with Ansible or AWX
• Exposure to multi-cloud or hybrid environments
• Cloud certifications (AWS, GCP) or Linux certifications
• Background from ITI (Information Technology Institute)

What the hiring process will look like

• Screening Interview – Talent Acquisition
• Technical Interview – SRE Lead
• Technical Task
• Final Interview – SRE Lead & Cloud DevOps Director

Requirements

  • Experience managing and optimizing cloud environments (AWS, GCP, or Azure)
  • Experience with Infrastructure as Code (Terraform)
  • Experience running and scaling Kubernetes in production (EKS, GKE, etc.)
  • Experience with monitoring systems (Prometheus, Grafana, Datadog, or ELK)
  • Scripting and automation skills
  • Incident response and root cause analysis experience
  • Strong understanding of system design for high availability, fault tolerance, and scalability

Responsibilities

  • Design and maintain infrastructure that is highly available, fault-tolerant, and scalable
  • Proactively identify and eliminate single points of failure
  • Ensure production systems remain stable under increasing scale and load
  • Manage and continuously improve workloads across AWS, GCP, or Azure
  • Use Infrastructure as Code (Terraform) to standardize and scale infrastructure
  • Optimize resource usage to balance performance and cost
  • Operate and scale Kubernetes clusters (EKS, GKE, etc.) with confidence
  • Implement and refine monitoring systems using tools like Prometheus, Grafana, Datadog, or ELK

© 2026 MenaJobs. All rights reserved.