AI Infrastructure Engineer
Eram Talent is looking for a talented AI Infrastructure Engineer to join our innovative team. The ideal candidate will be responsible for designing, building, and maintaining scalable and robust infrastructure solutions that support AI and machine learning workloads. This role involves working closely with data scientists, machine learning engineers, and software developers to optimize infrastructure performance and facilitate efficient AI model development and deployment.
Key Responsibilities:
• Design, implement, and manage high-performance computing environments tailored for AI and machine learning applications.
• Deploy and maintain GPU-accelerated clusters, cloud-based AI platforms, and parallel processing systems.
• Collaborate with data scientists and ML engineers to understand infrastructure requirements for various AI projects.
• Optimize resource allocation and scalability of AI infrastructure to support large datasets and complex models.
• Automate infrastructure provisioning and deployment using Infrastructure as Code (IaC) tools.
• Ensure security, compliance, and reliability of AI infrastructure.
• Monitor system performance and troubleshoot issues to minimize downtime and maximize productivity.
• Stay updated on emerging technologies and best practices in AI infrastructure and propose continuous improvements.
Requirements:
• Bachelor’s or higher degree in Computer Science, Engineering, or related technical field.
• 6+ years of experience in infrastructure engineering, preferably with a focus on AI, machine learning, or high-performance computing environments.
• Cloud skills: GCP/OpenShift, Kubernetes (k8s), Docker containers/images
• AI skills: model training, testing/evaluation, deployment
• MLOps/LLMOps
• LLM and GenAI core skills: how LLMs work under the hood, inference mechanics of LLMs/GenAI
• Inference scaling, distributed computing, inference benchmarking, inference planning to meet SLAs/SLOs
• Working with GPUs, handling distributed workloads, autoscaling
• NVIDIA NIM, Hugging Face
• NVIDIA SuperPODs (HPC, Slurm, k8s)
• Monitoring and dashboards for LLM/ML workloads and applications
• AI application architecture know-how, end-to-end flows
• DevOps (CI/CD, Argo CD, Git, Jenkins, etc.)
• Languages: Python, SQL
Requirements
- Bachelor’s or higher degree in Computer Science, Engineering, or related technical field
- 6+ years of experience in infrastructure engineering, preferably with a focus on AI, ML, or HPC
- Cloud skills: GCP/OpenShift, Kubernetes (k8s), Docker
- AI skills: model training, testing/evaluation, deployment
- MLOps/LLMOps
- LLM and GenAI core skills
- GPUs and distributed workload handling
- DevOps (CI/CD, Argo CD, Git, Jenkins, etc.)
Nice to Have
- Inference scaling, distributed computing, inference benchmarking
- NVIDIA NIM, Hugging Face
- NVIDIA SuperPODs (HPC, Slurm, k8s)
- Monitoring and dashboards for LLM/ML workloads
- AI application architecture
- Python, SQL
Responsibilities
- Design, implement, and manage high-performance computing environments for AI/ML
- Deploy and maintain GPU-accelerated clusters and cloud-based AI platforms
- Collaborate with data scientists and ML engineers on infrastructure needs
- Optimize resource allocation and scalability for AI infrastructure
- Automate infrastructure provisioning using IaC tools
- Ensure security, compliance, and reliability of AI infrastructure
- Monitor system performance and troubleshoot issues
- Stay updated on emerging AI infrastructure technologies
Eram Talent provides recruitment and talent solutions across various industries, connecting employers with skilled professionals to meet their workforce needs.