Quality Assurance Engineer
Role Summary
AI71 is seeking a Senior QA Automation Engineer to lead the validation and verification strategies for EDGE Group’s AI transformation. You will be responsible for defining "what good looks like" for non-deterministic AI systems, ensuring that Large Language Models (LLMs) and predictive engines meet the strict reliability standards of the defense sector.
You will act as the bridge between Agile development and formal Systems Engineering. Your mandate is to build automated testing frameworks that not only verify software functionality but also validate AI behaviors against "Ground Truth" datasets. Working within a structured "Stage Gate" delivery model, you will ensure our AI agents pass the rigorous Test Readiness Reviews (TRR) and Functional Configuration Audits (FCA) required for deployment.
Key Responsibilities
• AI & LLM Validation (LeverEDGE)
• Non-Deterministic Testing: Architect automated frameworks to evaluate Generative AI outputs (e.g., drafted CONOPS, technical requirements) for hallucination, consistency, and factual accuracy against "Gold Standard" datasets.
• RAG Evaluation: Implement automated metrics (e.g., RAGAS scores such as faithfulness and answer relevance) to verify that the Retrieval-Augmented Generation pipeline is accurately citing internal technical documentation and regulatory texts.
• Prompt Regression: Design regression suites that monitor "prompt drift," ensuring that changes to the underlying model or system instructions do not degrade the quality of AI-generated engineering documents.
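The gold-standard evaluation described above can be sketched as a small deterministic scorer. This is a minimal illustration, not AI71's actual harness: `score_against_gold` is a hypothetical helper that checks an AI-drafted document for required facts and known hallucination triggers from a curated case; a real suite would layer semantic-similarity metrics on top of this keyword form.

```python
def score_against_gold(output: str, gold: dict) -> dict:
    """Score a generated document against one "Gold Standard" case.

    `gold` holds two curated lists: facts the output must state, and
    claims whose presence signals a hallucination. Keyword matching keeps
    the pass/fail criterion deterministic and regression-friendly.
    """
    text = output.lower()
    missing = [f for f in gold["required_facts"] if f.lower() not in text]
    hallucinated = [f for f in gold["forbidden_claims"] if f.lower() in text]
    return {
        "missing": missing,
        "hallucinated": hallucinated,
        "passed": not missing and not hallucinated,
    }
```

Running the same cases against every model or system-prompt revision turns this into the prompt-drift regression suite the role calls for.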
• Integration & System Verification (Supply Chain)
• ERP Integration Testing: Build robust integration tests to validate data consistency between AI agents and critical enterprise systems (e.g., SAP S/4HANA, Ariba, etc.), ensuring no corruption of Bill of Materials (BOM) or financial data.
• Performance Benchmarking: Design performance tests to validate the latency and throughput of forecasting models and risk scoring engines, ensuring they meet the real-time requirements of supply chain dashboards.
• API Validation: Automate the testing of secure API gateways, verifying that Role-Based Access Control (RBAC) and PII redaction logic are functioning correctly before data reaches the AI models.
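A PII-redaction check like the one above can be automated as a simple gateway-response validator. This sketch is illustrative only: the regex patterns are assumptions, and a production test would run them against live gateway payloads via the API test suite.

```python
import re

# Hypothetical PII patterns; a real deployment would use the
# organisation's approved detection rules, not these two regexes.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uae_phone": re.compile(r"\+971[\s-]?\d{1,2}[\s-]?\d{3}[\s-]?\d{4}"),
}

def find_unredacted_pii(payload: str) -> list[str]:
    """Return the names of any PII patterns still present in the payload.

    An empty list means the redaction logic upstream of the AI model
    did its job; a non-empty list fails the gateway test.
    """
    return [name for name, pat in PII_PATTERNS.items() if pat.search(payload)]
```

In a pytest suite, an assertion that the list is empty becomes the pass/fail gate for the redaction stage.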
• Governance & Traceability
• V-Model Alignment: Map automated test cases directly to "System Requirements" and "User Needs," creating the digital evidence required for formal Verification and Validation (V&V) reports described in the Systems Engineering Handbook.
• Stage Gate Compliance: Prepare "Test Readiness" packages for formal Stage Gate reviews, providing quantitative evidence that the system is stable enough to move from MVP to Production.
• Defect Lifecycle Management: Manage the feedback loop between the "Requirements Quality Assistant" and the development teams, ensuring that defects found in AI logic are traced back to specific model versions or data sets.
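The requirement-to-test mapping above is, at its core, a traceability matrix. A minimal sketch, assuming test results arrive as `(requirement_id, test_name, passed)` records exported from the runner (the `SYS-REQ-*` IDs below are invented for illustration):

```python
from collections import defaultdict

def build_traceability_matrix(results):
    """Fold (requirement_id, test_name, passed) records into per-requirement
    verification evidence. A requirement counts as verified only if at
    least one linked test passed; unverified entries flag V&V gaps."""
    matrix = defaultdict(lambda: {"tests": [], "verified": False})
    for req_id, test_name, passed in results:
        matrix[req_id]["tests"].append((test_name, passed))
        if passed:
            matrix[req_id]["verified"] = True
    return dict(matrix)
```

The resulting map is the kind of digital evidence a Stage Gate "Test Readiness" package or formal V&V report can cite per requirement ID.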
Technical Requirements
• Core Automation: Expert proficiency in Python for building custom test harnesses (Pytest) and standard automation libraries (Selenium/Playwright for UI, Requests for API).
• Core Performance Testing: Expert proficiency in crafting and implementing performance test plans (e.g., Locust, JMeter, k6).
• AI Evaluation: Experience utilizing frameworks for evaluating LLMs (e.g., DeepEval, TruLens, or custom Python evaluators). Understanding of "Ground Truth" dataset creation and management.
• Data Validation: Proficiency with SQL and data validation tools (e.g., Great Expectations) to verify data quality within Data Lakehouses and Vector Databases.
• CI/CD Integration: Strong experience integrating automated tests into GitLab CI/CD pipelines, enforcing "Quality Gates" that prevent non-compliant code or models from merging.
• Traceability Tools: Familiarity with requirements management tools (e.g., Jira, Linear, Jama, Polarion, etc.) and how to link automated test results to specific requirement IDs.
• Test Management/Reporting Tools: Strong hands-on experience managing test reports and artifacts (e.g., TestRail, Allure).
• Version Control: Strong knowledge of maintaining code-based test frameworks under version control (e.g., Git, GitLab).
• Quality Engineering Practices: Strong hands-on knowledge of modern quality engineering best practices designed for fast-paced development environments (e.g., shift-left approaches, the test pyramid, mono-repo architecture for automation projects).
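The "Quality Gates" requirement above can be sketched as a small threshold check a CI job runs after the evaluation suite. This is a hypothetical example: the metric names and floor values are placeholders, not AI71's actual release criteria, and the real gate would read metrics from the pipeline's report artifact.

```python
# Assumed per-metric minimums; a real gate would load these from
# pipeline configuration rather than hard-coding them.
THRESHOLDS = {"faithfulness": 0.90, "answer_relevance": 0.85, "ui_pass_rate": 0.98}

def quality_gate(metrics: dict, thresholds: dict = THRESHOLDS) -> list[str]:
    """Return a human-readable violation for every metric below its floor.

    An empty list means the gate passes; the CI job can then
    translate a non-empty list into a failing exit code.
    """
    return [
        f"{name}: {metrics.get(name, 0.0):.2f} < {floor:.2f}"
        for name, floor in thresholds.items()
        if metrics.get(name, 0.0) < floor
    ]
```

Wired into a GitLab CI stage (e.g., `sys.exit(1 if failures else 0)`), this is what blocks non-compliant code or models from merging.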
Professional Qualifications
• Experience: 5+ years of experience in QA Automation, with at least 2 years focused on testing complex data-driven applications, ML models, or AI agents.
• Domain Knowledge: Experience in Defense, Aerospace, or highly regulated industries is a strong plus. Understanding of IV&V (Independent Verification and Validation) processes is highly desirable.
• Analytical Mindset: Ability to define pass/fail criteria for probabilistic systems (where the output is not always identical across runs) and communicate "Confidence Levels" to engineering leadership.
• Collaboration: Proven ability to work with Data Scientists to understand model limitations and with Systems Engineers to understand formal acceptance criteria.
Why This Role?
You are the final line of defense. In this role, you define whether an AI agent is "trusted" to negotiate a contract or design a critical system component. You will pioneer new methodologies for testing Generative AI within a safety-critical environment, setting the standard for how defense organizations validate intelligent systems.
Requirements
- Architect automated frameworks to evaluate Generative AI outputs
- Implement automated metrics to verify RAG pipeline accuracy
- Design regression suites to monitor prompt drift
- Build integration tests to validate data consistency with enterprise systems
- Design performance tests for forecasting models and risk scoring engines
- Automate the testing of secure API gateways
- Map automated test cases to System Requirements and User Needs
- Ensure AI agents pass Test Readiness Reviews (TRR) and Functional Configuration Audits (FCA)
AI71 offers a platform for creating and deploying advanced AI models. It serves businesses and developers seeking to integrate sophisticated artificial intelligence into their products.