AI Agent Testing Specialist

Skills

Computer Science, Data Analytics, Data Annotation, Data Science, Machine Learning, Natural Language Processing, QA, Software Engineering

At Mindrift, we are seeking an AI Agent Testing Specialist to design realistic evaluation scenarios for LLM-based agents. You will create test cases, define gold-standard behavior, and work with developers to ensure clarity and accuracy in agent actions.

Key Responsibilities
  • Design structured test scenarios based on real-world tasks
  • Define the gold path and acceptable agent behavior
  • Annotate task steps, expected outputs, and edge cases
  • Work with developers to test scenarios and improve clarity
  • Review agent outputs and adapt tests accordingly

Required Skills & Qualifications
  • Bachelor's and/or Master's degree in a relevant field
  • Background in QA, software testing, data analysis, or NLP annotation
  • Good understanding of test design principles
  • Strong written communication skills in English
  • Comfortable with structured formats like JSON/YAML
  • Basic experience with Python and JavaScript
  • Curious and open to working with AI-generated content
  • Ready to learn new methods and work remotely

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: 12 Months
