
Freelance Agent Evaluation Engineer

Please submit your CV in English and indicate your level of English proficiency.
Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.
What this opportunity involves
You’ll create challenging coding test cases that push AI coding systems to their limits:
• Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources
• Write comprehensive functional tests that validate actual end-to-end behavior and edge cases, not just superficial checks
• Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required)
• Analyze AI failures to understand what the model struggles with vs. what it masters
• Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria
What we look for
This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-permanent projects. Ideally, contributors will have:
• Degree in Computer Science, Software Engineering or related fields
• 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
• Background in Full-Stack development, with an equal focus on building React-based interfaces and robust Back-end systems
• Experience writing tests (functional, integration – not just running them)
• Docker containers (running evaluations locally in containers)
• CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
• English proficiency - B2
How it works
Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid
Effort estimate
Tasks for this project are estimated to take about 20 hours to complete, depending on complexity. This is an estimate, not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.
Payment
• Paid contributions, with rates up to $40/hour*
• Fixed project rate or individual rates, depending on the project
• Some projects include incentive payments
*Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.
Requirements
• Degree in Computer Science, Software Engineering or related fields
• 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
• Background in Full-Stack development (React-based interfaces and robust Back-end systems)
• Experience writing tests (functional, integration)
• Docker containers experience
• CI/CD understanding (GitHub Actions)
• English proficiency - B2
Responsibilities
• Create challenging coding test cases for AI systems
• Review and refine realistic coding tasks
• Write comprehensive functional tests
• Craft challenging scenarios for AI
• Analyze AI failures
• Iterate based on feedback



