Staff Software Engineer
MLI Compute
Posted on 3/30/2023
San Francisco, CA, USA
Desired Skills
Data Analysis
Google Cloud Platform
Microsoft Azure
Requirements
  • 7+ years of industry experience
  • 4+ years of experience in distributed systems and large-scale compute infrastructure
  • Experience working with relational and NoSQL databases
  • BS, MS or Ph.D. in Computer Science, Electrical Engineering, Mathematics, Physics, or another relevant field; or equivalent real-world experience
  • Passionate about self-driving technology and its potential impact on the world
  • Attention to detail and a passion for truth
  • A track record of efficiently solving complex problems
  • Startup mentality - openness to dealing with unknown unknowns and wearing many hats
Responsibilities
  • Use the latest cloud technologies to own, design, implement and test scalable distributed compute and data processing in the cloud
  • Build high-performance scheduling/deployment solutions for ML workloads, efficiently handling both long and short-running jobs
  • Champion engineering excellence and focus on observability of our infrastructure by continuously improving systems and processes
  • Own technical projects from start to finish, contribute to the team's product roadmap, and be responsible for major technical decisions and tradeoffs. Participate effectively in the team's planning, code reviews, and design discussions
  • Consider the effects of projects across multiple teams and proactively manage conflicts. Work together with peers, partner teams and orgs to achieve cross-departmental goals and satisfy broad requirements. Provide technology leadership beyond your immediate team
  • Conduct technical interviews with well-calibrated standards and play an essential role in recruiting activities. Effectively onboard and mentor junior engineers and/or interns
  • Help up-level and coach a team of talented distributed systems engineers
Desired Qualifications
  • Experience with Google Cloud Platform, Microsoft Azure, or Amazon Web Services
  • Experience building production services in Go (Golang), with strong proficiency in the language
  • Experience using Kubernetes, particularly building Kubernetes-native solutions
  • Experience in the distributed job scheduling problem space. Familiarity with scheduling systems such as Volcano, YuniKorn, YARN, or Mesos is a plus
  • Experience with PyTorch distributed training and the Ray framework. Understanding of the typical machine learning (ML) development life cycle

1,001-5,000 employees

Self-driving car service
Company Overview
Cruise is building self-driving vehicles to improve life in our cities. The company makes autonomous, sustainable electric vehicles (EVs).
Benefits
  • Flexible vacation
  • Paid holidays
  • Paid parental leave
  • Fertility & family expansion benefits
  • 401k matching program
  • Monthly social events
  • Community volunteering programs
  • Healthy meals & snacks for onsite employees
  • Quarterly offsites & working retreats
  • Monthly wellness stipend
  • Mental health support
  • Professional development programs
  • On-site gym in SF HQ
  • Commuter benefits for onsite employees
  • Medical, dental & vision coverage
Company Core Values
  • Stay safe
  • Stay focused
  • Own it
  • Seek truth
  • Work together
  • Be humble