Model Evaluation Engineer

Skills

Benchmarking, Cloud Infrastructure, Data Pipelines, Documentation, Experiment Design, Machine Learning, Python, SQL, Statistical Analysis, Voice Agent Technologies

We are seeking a Research Engineer specializing in Evaluations to lead the end-to-end evaluation of models. This role requires a strong foundation in machine learning fundamentals and a knack for translating complex data into actionable insights.

Key Responsibilities
  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
  • Build and maintain competitive benchmarking pipelines.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets.
  • Create evaluation subsets to stress-test specific capabilities and edge cases.
  • Define evaluation metrics for real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Collaborate with customer-facing teams to understand pain points and convert them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation.
  • Identify evaluation gaps proactively and propose solutions.

Required Skills & Qualifications
  • Strong understanding of ML fundamentals: Ability to interpret results and debug issues without retraining from scratch.
  • Proficient in Python: Capable of writing clean evaluation scripts and working with data pipelines.
  • Comfortable with SQL and cloud infrastructure.
  • Good metric intuition: Understanding of what makes an evaluation metric effective and how to ensure statistical rigor.
  • Familiarity with the voice agent stack, including how VAD, ASR, turn detection, LLM, and TTS systems interact.
  • Tinkerer mentality: Preference for shipping and iterating quickly.
  • Excellent communication skills: Ability to explain technical results and summarize findings effectively.
  • Ownership mindset: Proactively fill evaluation gaps.
  • Ability to work with at least 3-4 hours of overlap with the US Eastern time zone.
  • Experience in maintaining evaluation datasets and documentation.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: Months
