menajobs
Mindrift

Freelance Agent Evaluation Engineer

🇸🇦 Saudi Arabia · 🏠 Remote
Python · AI · Machine Learning · Software Development · Test Automation · React · CI/CD · Docker

Salary: up to $40/hour

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

You’ll create challenging coding test cases that push AI coding systems to their limits:

• Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources
• Write comprehensive functional tests that validate actual end-to-end behavior and edge cases, not just superficial checks
• Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required)
• Analyze AI failures to understand what the model struggles with vs. what it masters
• Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria

What we look for

This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have:

• Degree in Computer Science, Software Engineering or related fields
• 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
• Background in Full-Stack development, with an equal focus on building React-based interfaces and robust Back-end systems
• Experience writing tests (functional, integration – not just running them)
• Docker containers (running evaluations locally in containers)
• CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
• English proficiency - B2

How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Effort estimate

Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.

Payment

• Paid contributions, with rates up to $40/hour*
• Fixed project rate or individual rates, depending on the project
• Some projects include incentive payments

*Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.

Requirements

• Degree in Computer Science, Software Engineering or related fields
• 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
• Background in Full-Stack development (React-based interfaces and robust Back-end systems)
• Experience writing tests (functional, integration)
• Docker containers experience
• CI/CD understanding (GitHub Actions)
• English proficiency - B2

Responsibilities

• Create challenging coding test cases for AI systems
• Review and refine realistic coding tasks
• Write comprehensive functional tests
• Craft challenging scenarios for AI
• Analyze AI failures
• Iterate based on feedback
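
To illustrate the difference between a superficial check and a functional test that validates end-to-end behavior and edge cases, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the word-count script, the file names, and the helper functions are hypothetical and are not taken from any Mindrift project or codebase.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical program under test: prints the number of
# whitespace-separated words in the file given as its first argument.
SCRIPT = """\
import sys
from pathlib import Path
print(len(Path(sys.argv[1]).read_text(encoding="utf-8").split()))
"""


def run_wordcount(workdir: Path, text: str) -> subprocess.CompletedProcess:
    """Write the script and an input file into workdir, then run the script."""
    script = workdir / "wordcount.py"
    script.write_text(SCRIPT, encoding="utf-8")
    data = workdir / "input.txt"
    data.write_text(text, encoding="utf-8")
    return subprocess.run(
        [sys.executable, str(script), str(data)],
        capture_output=True,
        text=True,
        timeout=30,
    )


def superficial_check(workdir: Path) -> bool:
    # Only confirms a clean exit; a program printing a wrong count still passes.
    return run_wordcount(workdir, "a b c").returncode == 0


def functional_check(workdir: Path) -> bool:
    # Compares real output against expected values, including edge cases
    # (an empty file and irregular whitespace).
    cases = {"a b c": "3", "": "0", "  spaced \n out\ttext ": "3"}
    for text, expected in cases.items():
        result = run_wordcount(workdir, text)
        if result.returncode != 0 or result.stdout.strip() != expected:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("superficial:", superficial_check(Path(tmp)))
        print("functional:", functional_check(Path(tmp)))
```

The functional check runs the program as a user would, through a subprocess against real files, so a regression in the output would fail it even when the exit code stays zero.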

© 2026 MenaJobs. All rights reserved.