Responsibilities

- Lead and execute post-training pipelines for LLMs, including supervised fine-tuning and reinforcement learning.
- Design advanced training paradigms such as DPO and GRPO.
- Develop domain-specific data recipes, curation, and augmentation.
- Post-train specialized small models from scratch, covering architecture, data, and optimization.
- Build and refine reward models to support alignment and downstream optimization.
- Improve inference efficiency with low-latency serving (vLLM, SGLang).

Requirements

- Bachelor's degree in CS/AI/ML or a related field; 8+ years of industry experience.
- Strong hands-on experience with post-training pipelines for large models.
- Deep familiarity with DPO, GRPO, and RL-based post-training methods.
- Experience training specialized small models from scratch.
- Solid understanding of RL fundamentals and their application to alignment.
- Experience deploying models in low-latency production environments (vLLM, SGLang).

Benefits

- Competitive total compensation package
- L&D programs and education subsidy
- Team building programs and company events
- Wellness and meal allowances
- Healthcare schemes for employees and dependants
- Additional benefits disclosed during the process