Model Evaluation Engineer

Skills

Benchmarking Pipelines, Cloud Infrastructure, Customer Feedback Analysis, Data Pipeline Management, Model Evaluation, Python Programming, SQL, Statistical Rigor, Systematic Experimentation, Voice Agent Technologies

As a Research Engineer specializing in Evaluations, you will conduct comprehensive model evaluations covering accuracy, latency, and feature-specific metrics. You will build competitive benchmarking pipelines and design systematic experiments to measure the impact of model changes.

Key Responsibilities

Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
Build and maintain competitive benchmarking pipelines.
Design and run systematic experiments to measure the impact of model changes.
Onboard, curate, and maintain evaluation datasets.
Create evaluation subsets to stress-test specific capabilities and edge cases.
Define evaluation metrics for real-world performance.
Translate qualitative customer feedback into quantifiable evaluation criteria.
Collaborate with customer-facing teams to understand pain points and convert them into research priorities.
Maintain clean evaluation pipelines and clear documentation.
Identify evaluation gaps proactively and propose solutions.

Required Skills & Qualifications

Strong understanding of ML fundamentals and ability to debug issues without retraining from scratch.
Proficiency in Python, with experience writing clean evaluation scripts and working with data pipelines.
Comfortable with SQL and cloud infrastructure.
Strong metric intuition: an understanding of what makes a good evaluation metric and how to ensure statistical rigor.
Familiarity with voice agent stacks, including how VAD, ASR, turn detection, LLM, and TTS systems interact.
Tinkerer mentality with a preference for shipping and iterating quickly.
Excellent communication skills to explain technical results and summarize findings.
Ownership mindset with a proactive approach to fill evaluation gaps.
Ability to work with at least 3-4 hours of overlap with the US Eastern time zone.
Salary range: $210K to $260K.

Job Type: Remote

Salary: $210K to $260K

Experience: Entry

Duration: Months

