GenAI Infrastructure Architect

Skills

Distributed Systems, Docker, GenAI Infrastructure, Go, GPU Virtualization, Kubernetes, LLM Optimization, Machine Learning, MLOps, Python

As a Staff Machine Learning Engineer focused on the GenAI Platform, you will drive the architecture and strategy for our next-generation LLM platform, enhancing our capabilities to support large-scale foundation models for millions of users.

Key Responsibilities
  • Propose, design, and lead the architecture of a scalable LLM platform.
  • Architect fault-tolerant training infrastructure for distributed workloads across GPU clusters.
  • Develop self-serve LLM workflows for fine-tuning and model lifecycle management.
  • Build comprehensive evaluation and benchmarking infrastructure to ensure model quality and safety.
  • Extend data ingestion pipelines to handle multimodal datasets efficiently.
  • Provide technical leadership and mentorship to senior engineers and collaborate with cross-functional teams.
Required Skills & Qualifications
  • 10+ years of experience in software development and distributed data systems.
  • Expertise in GenAI/LLM infrastructure and large-scale ML systems.
  • Hands-on experience with distributed training frameworks and LLM optimization.
  • Proven ability to operate fault-tolerant, petabyte-scale distributed systems.
  • Deep knowledge of MLOps, ML orchestration, and model evaluation methodologies.
  • Proficiency with Kubernetes and Docker, and experience writing production-quality code in Python or Go.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: Months
