Model Evaluation Engineer

Skills

Benchmarking Pipelines, Cloud Infrastructure, Customer Feedback Analysis, Data Pipeline Management, Model Evaluation, Python Programming, SQL, Statistical Rigor, Systematic Experimentation, Voice Agent Technologies

As a Research Engineer specializing in Evaluations, you will conduct comprehensive model evaluations covering accuracy, latency, and feature-specific metrics. You will build competitive benchmarking pipelines and design systematic experiments to measure the impact of model changes.

Key Responsibilities
  • Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics.
  • Build and maintain competitive benchmarking pipelines.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets.
  • Create evaluation subsets to stress-test specific capabilities and edge cases.
  • Define evaluation metrics that reflect real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Collaborate with customer-facing teams to understand pain points and convert them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation.
  • Identify evaluation gaps proactively and propose solutions.

Required Skills & Qualifications
  • Strong understanding of ML fundamentals and the ability to debug issues without retraining from scratch.
  • Proficiency in Python, with experience writing clean evaluation scripts and working with data pipelines.
  • Comfort with SQL and cloud infrastructure.
  • Strong metric intuition: an understanding of what makes a good evaluation metric and how to apply it with statistical rigor.
  • Familiarity with voice agent stacks and how their components interact, including VAD, ASR, turn detection, LLM, and TTS.
  • Tinkerer mentality with a preference for shipping and iterating quickly.
  • Excellent communication skills for explaining technical results and summarizing findings.
  • Ownership mindset with a proactive approach to filling evaluation gaps.
  • Ability to work at least 3-4 hours overlapping with US Eastern Time.
  • Salary range: $210K to $260K.

Job Type: Remote

Salary: $210K to $260K

Experience: Entry

Duration: Months
