Model Evaluation Engineer

Skills

Benchmarking Pipelines, Cloud Infrastructure, Dataset Curation, Documentation, Experiment Design, Model Evaluation, Python Programming, SQL, Statistical Analysis, Voice Agent Technology

As a Research Engineer focused on Evaluations, you will own end-to-end model evaluation, covering both accuracy and performance metrics. You will build competitive benchmarking pipelines and design systematic experiments to quantify the impact of model changes.

Key Responsibilities
  • Own end-to-end and integration-level model evaluation across various metrics.
  • Build and maintain competitive benchmarking pipelines.
  • Design and run systematic experiments to measure the impact of model changes.
  • Onboard, curate, and maintain evaluation datasets.
  • Create evaluation subsets to stress-test specific capabilities and edge cases.
  • Define evaluation metrics for real-world performance.
  • Translate qualitative customer feedback into quantifiable evaluation criteria.
  • Collaborate with customer-facing teams to understand pain points and convert them into research priorities.
  • Maintain clean evaluation pipelines and clear documentation.
  • Identify evaluation gaps proactively and propose solutions.

Required Skills & Qualifications
  • Strong understanding of ML fundamentals.
  • Proficiency in Python and SQL.
  • Experience with cloud infrastructure.
  • Understanding of good evaluation metrics and statistical rigor.
  • Familiarity with the voice agent stack, including VAD, ASR, LLM, and TTS systems.
  • Tinkerer mentality with a preference for shipping and iterating quickly.
  • Excellent communication skills for explaining technical results.
  • Ownership mindset to proactively fill evaluation gaps.
  • Ability to work with at least 3-4 hours of overlap with the US Eastern time zone.
  • Experience in creating and maintaining evaluation datasets.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: Months
