Senior AI Infrastructure & Platform Engineer - Riyadh, KSA

🇸🇦 Riyadh, Saudi Arabia · 🏢 On-site

AI Infrastructure, GPU, Kubernetes, Slurm, Nvidia AI Enterprise, Linux, Python, Automation

DeepSource Technologies

50-250 employees

Role Overview

We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.

Key Responsibilities

• Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
• Manage and operate GPU orchestration tools and platforms such as:
  • Nvidia Base Command Manager (critical)
  • Nvidia AI Enterprise Suite
  • Nvidia GPU and Network Operators
  • Nvidia NIMs and Blueprints
• Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
  • Slurm (critical)
  • Vanilla Kubernetes
• Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
• Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
• Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
• Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
• Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.

Required Skills & Experience

• Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
• Hands-on experience with:
  • Nvidia Base Command Manager
  • Nvidia AI Enterprise Suite
  • Nvidia GPU/Network Operators, NIMs, Blueprints
• Strong experience with Slurm and/or Kubernetes orchestration.
• Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
• Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
• Excellent troubleshooting and performance-tuning skills.
• Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
• Strong understanding of networking, security, resource allocation, and cluster management best practices.

Preferred Qualifications

• Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
• Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
• Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
• Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.

Company

DeepSource provides an AI-powered platform for automated code review, helping development teams improve code quality and reduce bugs. It serves software engineering teams of all sizes.
