Model Evaluation Engineer

Skills

Benchmarking, Cloud Infrastructure, Data Pipelines, Documentation, Machine Learning, Model Evaluation, Python, SQL, Statistical Analysis, Voice Assistant Technology

We are seeking a Research Engineer specializing in evaluations to lead end-to-end and integration-level model evaluation. You will be responsible for measuring our models' accuracy, latency, and feature-specific metrics, and for building and maintaining competitive benchmarking pipelines.

Key Responsibilities
  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
  • Build and maintain competitive benchmarking pipelines.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets.
  • Create evaluation subsets to stress-test specific capabilities and edge cases.
  • Define evaluation metrics for real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Collaborate with customer-facing teams to understand pain points and convert them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation.
  • Identify evaluation gaps proactively and propose solutions.

Required Skills & Qualifications
  • Strong understanding of ML fundamentals: Ability to interpret results and debug issues without retraining from scratch.
  • Proficiency in Python: Ability to write clean evaluation scripts and work with data pipelines.
  • Comfortable with SQL and cloud infrastructure.
  • Metric intuition: Understanding of what makes a good evaluation metric and how to ensure statistical rigor.
  • Familiarity with the voice agent stack: Understanding of how VAD, ASR, turn detection, LLM, and TTS systems interact.
  • Tinkerer mentality: Preference for shipping and iterating quickly.
  • Strong communication skills: Ability to explain technical results and summarize findings.
  • Ownership mindset: Willingness to fill evaluation gaps proactively.
  • Availability for at least 3-4 hours of daily overlap with the Eastern US time zone.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: Months
