Model Evaluation Engineer

Skills

Benchmarking, Cloud Infrastructure, Data Pipelines, Documentation, Machine Learning, Model Evaluation, Python, SQL, Statistical Analysis, Voice Assistant Technology

We are seeking a Research Engineer specializing in Evaluations to lead end-to-end and integration-level model evaluation. You will be responsible for measuring our models across accuracy, latency, and feature-specific metrics, and for building and maintaining competitive benchmarking pipelines.

Key Responsibilities
  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
  • Build and maintain competitive benchmarking pipelines.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets.
  • Create evaluation subsets to stress-test specific capabilities and edge cases.
  • Define evaluation metrics for real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Collaborate with customer-facing teams to understand pain points and convert them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation.
  • Identify evaluation gaps proactively and propose solutions.

Required Skills & Qualifications
  • Strong understanding of ML fundamentals: Ability to interpret results and debug issues without retraining from scratch.
  • Proficiency in Python: Write clean evaluation scripts and work with data pipelines.
  • Comfortable with SQL and cloud infrastructure.
  • Metric intuition: Understanding of what makes a good evaluation metric and how to ensure statistical rigor.
  • Familiarity with the voice agent stack: Understanding of how VAD, ASR, turn detection, LLM, and TTS systems interact.
  • Tinkerer mentality: Preference for shipping and iterating quickly.
  • Strong communication skills: Ability to explain technical results and summarize findings.
  • Ownership mindset: Proactively fill evaluation gaps.
  • Ability to work with at least 3-4 hours of overlap with the US Eastern time zone.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: Months
