Model Evaluation Engineer

New

Skills

Automatic Speech Recognition (ASR), Cloud Infrastructure, Large Language Models (LLM), Machine Learning Fundamentals, Python, SQL, Statistical Analysis, Text-to-Speech (TTS), Turn Detection, Voice Activity Detection (VAD)

The Research Engineer, Evaluations role focuses on comprehensive evaluation of models, ensuring they meet accuracy, latency, and feature-specific benchmarks. This position entails building and maintaining competitive benchmarking pipelines and designing systematic experiments to assess the impact of model changes.

Key Responsibilities
  • Own end-to-end and integration-level model evaluation.
  • Build and maintain competitive benchmarking pipelines.
  • Design and run systematic experiments for model change impact assessment.
  • Onboard, curate, and maintain evaluation datasets.
  • Create evaluation subsets for stress-testing capabilities and edge cases.
  • Define evaluation metrics for real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Collaborate with customer-facing teams to understand pain points and convert them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation.
  • Proactively identify evaluation gaps and propose solutions.

Required Skills & Qualifications
  • Strong understanding of ML fundamentals, with the ability to interpret results and debug issues.
  • Proficient in Python, SQL, and cloud infrastructure.
  • Good metric intuition with an understanding of evaluation metrics and statistical rigor.
  • Familiarity with the voice agent stack, including VAD, ASR, turn detection, LLM, and TTS systems.
  • Tinkerer mentality with a preference for shipping and iterating quickly.
  • Strong communication skills to explain technical results and summarize findings.
  • Ownership mindset to proactively fill evaluation gaps.
  • Willingness to work with at least 3-4 hours of daily overlap with US Eastern Time.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: Months

Similar Jobs

Model Evaluation Engineer

Posted 21 days ago

Evaluate models across accuracy and latency.

Build benchmarking pipelines for competitive analysis.

Automatic Speech Recognition (ASR) Cloud Infrastructure Data Pipelines Large Language Models (LLM)

Strategic Partner Development

Posted 23 days ago

Architect alliances with hardware partners.

Identify decision-makers within partner organizations.

Cloud Infrastructure Cross-Functional Leadership Market Analysis Mentoring and Coaching

AI-Enabled DevOps Engineer

Posted 14 days ago

Implement and maintain cloud infrastructure with IaC.

Improve CI/CD pipelines for applications and ML workloads.

Bash CI/CD Pipelines Cloud Infrastructure DevOps

Junior Technical Program Manager

Posted 21 days ago

Support delivery of data center programs.

Manage timelines and project scope.

AI Infrastructure Cloud Infrastructure Cross-functional Coordination Data Center Infrastructure

Model Evaluation Engineer

Posted 21 days ago

Conduct comprehensive model evaluations.

Establish and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Data Pipelines Evaluation Metrics

Model Evaluation Engineer

Posted 14 days ago

Oversee model evaluation across various metrics.

Build and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Customer Feedback Analysis Evaluation Datasets

Strategic Sourcing Manager

Posted 20 days ago

Partner with engineering leaders for sourcing plans.

Lead sourcing across infrastructure and AI technology.

AI Technologies Cloud Infrastructure Data Analysis Developer Platforms

Engineering Program Manager

Posted 20 days ago

Unify technology strategy and enhance decision-making.

Oversee cross-functional initiatives from start to finish.

CI/CD Pipelines Cloud Infrastructure Cross-Functional Leadership Data Analysis

Customer Success Engineer

Posted 13 days ago

Provide hands-on support for databases.

Diagnose and resolve production issues.

Clickhouse Cloud Infrastructure Linux MongoDB

Senior ML Engineer

Posted 17 days ago

Develop and maintain ML platform infrastructure.

Provide shared components for deployment and API design.

Algorithms API Design Cloud Infrastructure Collaboration Tools

Senior DevOps Engineer

Posted 17 days ago

Build automation tools for resource delivery.

Collaborate with engineering teams for quality product delivery.

Automation Tools Cloud Infrastructure Containerization DevOps

Director of Strategic Alliances

Posted 17 days ago

Lead strategic partnerships with key industry players.

Develop go-to-market strategies for AI and GPU deployments.

AI/ML Workloads Cloud Infrastructure Data Centers GPU Technologies

Privacy Engineer Role

Posted 16 days ago

Ensure user privacy across data handling.

Develop tools for privacy enhancement.

Cloud Infrastructure Code Review Data Mapping Go

Security & Infrastructure Lead

Posted 16 days ago

Lead security and infrastructure strategy.

Manage and develop security teams.

AWS CI/CD Cloud Infrastructure Container Orchestration

Model Evaluation Engineer

Posted 16 days ago

Conduct comprehensive model evaluations.

Develop and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Data Pipelines Documentation

Model Evaluation Engineer

Posted 16 days ago

Lead end-to-end model evaluation processes.

Develop and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Dataset Curation Documentation

Model Evaluation Engineer

Posted 15 days ago

Conduct comprehensive model evaluations.

Build and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Customer Feedback Analysis Data Pipeline Management

Model Evaluation Engineer

Posted 15 days ago

Lead end-to-end model evaluation.

Build competitive benchmarking pipelines.

Benchmarking Cloud Infrastructure Data Pipelines Documentation

Field Engineering Manager

Posted 12 days ago

Build and lead a team of Solutions Architects.

Align skills and engagement models to customer needs.

Account Management Big Data Cloud Infrastructure Customer Engagement

Model Evaluation Engineer

Posted 10 days ago

Lead end-to-end model evaluation.

Develop benchmarking pipelines.

Benchmarking Cloud Infrastructure Data Pipelines Documentation

Model Evaluation Engineer

Posted 3 days ago

Conduct comprehensive model evaluations.

Develop benchmarking pipelines for competitive analysis.

Cloud Infrastructure Competitive Benchmarking Data Pipelines Documentation

Model Evaluation Engineer

Posted 10 days ago

Conduct comprehensive model evaluations.

Build and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Data Pipelines Evaluation Metrics

Strategic Account Executive

Posted 10 days ago

Own and manage a named FS account list.

Drive net new ARR and expansion revenue.

Cloud Infrastructure Complex Deal Closing Consultative Selling Database Technologies

Principal Engineer Role

Posted 10 days ago

Lead the technical direction for identity and engagement services.

Oversee mission-critical infrastructure related to identity and engagement.

Analytics APIs Authentication Authorization

Senior Sales Engineer Role

New

Understand and assess enterprise customer needs.

Collaborate with various teams to manage accounts.

Business Intelligence Tools Cloud Infrastructure Data Analytics Database Management

Backend Software Engineer

Posted 9 days ago

Develop a high-performance search and indexing ecosystem.

Contribute to open-source libraries for data processing.

APIs Design C++ Cloud Infrastructure Data Processing Systems

Model Evaluation Engineer

Posted 9 days ago

Oversee model evaluation processes.

Build and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Customer Feedback Analysis Data Pipelines

IT Director Role

Posted 7 days ago

Lead IT strategy to support company growth.

Build and manage an effective IT team.

AWS (EC2 S3 RDS VPC)

Backend Software Engineer

Posted 7 days ago

Lead backend system improvements.

Make architectural decisions for reliability.

API Design Backend Engineering Cloud Infrastructure Coding Fundamentals

Enterprise Account Executive

Posted 7 days ago

Drive revenue growth for GSI accounts.

Develop strategic insights for each firm.

Account Management APIs Business Case Development Cloud Infrastructure

Privacy-Focused Software Engineer

Posted 6 days ago

Design and implement privacy-focused software.

Translate privacy policies into technical safeguards.

AWS Cloud Infrastructure Data Security Golang

Model Evaluation Engineer

Posted 4 days ago

Own end-to-end model evaluation.

Build and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Data Curation Documentation Practices

Model Evaluation Engineer

New

Conduct end-to-end model evaluations.

Build competitive benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Data Pipelines Experimental Design

Model Evaluation Engineer

Posted 3 days ago

Conduct end-to-end model evaluations.

Build and maintain benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Data Pipelines Documentation

Senior Sales Engineer Role

New

Understand customer business and data needs.

Collaborate effectively with various teams.

Analytical Skills Business Intelligence Tools Cloud Infrastructure Cross-Functional Collaboration

Senior Sales Engineer

New

Understand Enterprise customer needs.

Collaborate with internal teams for account planning.

Business Intelligence Tools Cloud Data Warehouses Cloud Infrastructure Cross-functional Collaboration

Senior Sales Engineer Role

New

Understand enterprise customer needs and use cases.

Collaborate with multiple teams for solution development.

Business Intelligence Cloud Data Warehouses Cloud Infrastructure Cross-Functional Collaboration

Model Evaluation Engineer

New

Conduct end-to-end model evaluations.

Build and maintain benchmarking pipelines.

Cloud Infrastructure Data Pipelines Documentation Experiment Design

Model Evaluation Engineer

New

Conduct end-to-end model evaluations.

Design and implement benchmarking pipelines.

Benchmarking Pipelines Cloud Infrastructure Data Curation Documentation Practices

Senior Software Engineer

New

Design and develop scalable platform infrastructure.

Anticipate and mitigate bottlenecks in workflows.

Agile Methodologies AWS Cloud Infrastructure Docker

Solutions Engineer Role

Posted 10 days ago

Develop and deliver technical presentations.

Communicate solution value to diverse audiences.

Application GRC AWS Cloud Security DevOps tools (Terraform

Solutions Engineer Role

Posted 6 days ago

Develop and deliver technical presentations.

Communicate solution value to audiences.

Application GRC AWS Azure Cloud Security

Software Engineer III, Google Cloud

Posted 35 days ago

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another.

Python

LLVM Compiler Developer

Posted 32 days ago

Develop and enhance LLVM and Clang based toolchain components

Collaborate with LLVM community for continuous integration

Back-end Bash C C++

Compiler Engineer (LLVM, C++)

Posted 32 days ago

Hiring experienced Compiler Engineers for LLVM and Clang toolchain development

Responsibilities include analyzing requirements, designing, and collaborating with the LLVM community

Back-end Bash Communication LLVM