San Francisco, CA, USA
- Experience managing large clusters, containers, and GPUs in cloud environments, as well as high-performance filesystems (e.g., Lustre)
- Willingness to manage and monitor infrastructure deployments
- Exposure to high-performance networking technologies, including MPI, NCCL, RDMA, InfiniBand, and RoCE
- Proficiency with bash and Python
- You will be responsible for maintaining our cluster, ensuring that it is easy for researchers to launch jobs, collect results, analyze experiments, and create datasets
- Your responsibilities will also include ensuring that jobs run reliably and that hardware failures are proactively diagnosed
- You will work closely with our compute partners to promptly resolve any problems that occur and maintain overall cluster uptime
- You may be called on to debug diverse problems with unclear root causes that could involve networking, version mismatches, container issues, and other performance problems
- Machine learning experience
- Slurm and/or Kubernetes experience
General AI Systems software
Adept AI Labs’ mission is for machines to work together with people in the driver’s seat: discovering new solutions, enabling more informed decisions, and giving people more time for the work they love. The company is committed to advancing ML research through its product lab, which is building general intelligence that enables humans and computers to work together creatively.
- Comprehensive health insurance coverage
- Unlimited vacation time
- Competitive salary
- Stock options
- Daily meals + comfortable SF office
- Dog friendly
- First principles thinkers: You like challenging the status quo and thinking from the ground up to innovate and develop new ideas.
- Problem Solvers: You are excited about solving challenging problems. You thrive in ambiguity and push hard to find solutions.
- Voracious learners: You have demonstrated moments of brilliance in the past, and keep striving to learn every day.
- Go-getters: You care a lot about getting things done.
- Strong collaborators and communicators: You believe people can solve the hardest problems when they work together.