Research Engineer, Evaluations

Skills

Own end-to-end and integration-level model evaluation across accuracy, latency, and feature-specific metrics
Build and maintain competitive benchmarking pipelines
Design and run systematic experiments to measure the impact of model changes
Onboard, curate, and maintain evaluation datasets
Create evaluation subsets to stress-test specific capabilities and edge cases
Define evaluation metrics for real-world performance
Translate qualitative customer feedback into quantifiable evaluation criteria
Work with customer-facing teams to understand pain points and convert them into research priorities
Maintain clean evaluation pipelines and clear documentation
Identify evaluation gaps proactively and propose solutions

ML fundamentals: Interpret results and debug issues without training from scratch
Strong Python skills: Write clean evaluation scripts, work with data pipelines, and be comfortable with SQL and cloud infrastructure
Metric intuition: Understand what makes a good evaluation metric and ensure statistical rigor
Voice agent stack familiarity: Understand how VAD, ASR, turn detection, LLM, and TTS systems interact
Tinkerer mentality: Prefer shipping and iterating quickly
Communication skills: Explain technical results, summarize findings, and translate customer feedback
Ownership mindset: Proactively fill evaluation gaps

Work hours that overlap with the Eastern US time zone by at least 3-4 hours
Pay range: $210K - $260K

Job Type: Remote

Salary: $210K - $260K

Experience: Entry
