Staff AI Engineer, Model Post-Training and Alignment

Responsibilities

- Lead and execute post-training pipelines for LLMs (supervised fine-tuning, RL).
- Design advanced training paradigms such as DPO and GRPO.
- Develop domain-specific data recipes, curation, and augmentation.
- Train specialized small models from scratch: architecture, data, optimization.
- Build and refine reward models to support alignment and downstream optimization.
- Improve inference efficiency with low-latency serving (vLLM, SGLang).

Requirements

- Bachelor's degree in CS, AI, ML, or a related field; 8+ years of industry experience.
- Strong hands-on experience with post-training pipelines for large models.
- Deep familiarity with DPO, GRPO, and RL-based post-training methods.
- Experience training specialized small models from scratch.
- Solid understanding of RL fundamentals and their application to alignment.
- Experience deploying models in low-latency production environments (vLLM, SGLang).

Benefits

- Competitive total compensation package
- L&D programs and education subsidy
- Team-building programs and company events
- Wellness and meal allowances
- Healthcare schemes for employees and dependants
- Additional benefits disclosed during the process
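For context on the DPO work mentioned above, here is a minimal sketch of the per-pair DPO loss (this is illustrative only, not the team's actual implementation; the function name and scalar log-probability inputs are assumptions):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)),
    where each margin is the policy's log-prob minus the reference model's."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # Numerically plain logistic loss on the scaled margin difference
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy matches the reference model exactly, both margins are zero and the loss is log 2; it decreases as the policy raises the chosen response's likelihood relative to the rejected one.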


Job Type: Remote

Salary: Not Disclosed

Experience: 8+ years

Duration: Months