Full-Time

Distributed ML Systems Engineer-Inference

Confirmed live in the last 24 hours

Together AI

51-200 employees

Decentralized cloud services for AI development

Enterprise Software
AI & Machine Learning

Compensation Overview

$160k - $230k Annually

+ Equity + Benefits

Mid

San Francisco, CA, USA

Category
Backend Engineering
Software Engineering
Required Skills
Kubernetes
Rust
Microsoft Azure
Python
PyTorch
Operating Systems
AWS
Go
C/C++
Google Cloud Platform
Requirements
  • 3+ years of experience in building large-scale, fault-tolerant, high-performance distributed systems.
  • Strong programming skills in one or more of Python, Go, Rust, or C/C++.
  • Excellent understanding of low-level operating systems concepts, including multi-threading, memory management, networking, storage, performance, and scale.
  • Experience with cloud computing platforms (AWS, GCP, Azure, etc.) and large-scale infrastructure.
  • Strong problem-solving skills and ability to work in a fast-paced environment.
  • Preferred: Experience with Kubernetes
  • Preferred: Experience with PyTorch
Responsibilities
  • Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
  • Develop and optimize distributed processing frameworks and storage systems.
  • Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.
  • Conduct architecture and design reviews to ensure best practices in system design.
  • Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.

Together AI focuses on advancing artificial intelligence through open-source contributions. The company offers decentralized cloud services that allow developers and researchers from various organizations to train, fine-tune, and deploy generative AI models. Its services cater to a wide range of clients, including small startups, large enterprises, and academic institutions. Together AI's business model is based on providing cloud-based solutions that support the development and deployment of AI models, generating revenue through service subscriptions and usage fees. The company differentiates itself from competitors by emphasizing open and transparent AI systems, an approach intended to foster innovation and achieve beneficial outcomes for society.

Company Stage

Series A

Total Funding

$222.3M

Headquarters

Menlo Park, California

Founded

N/A

Growth & Insights
Headcount

6 month growth

58%

1 year growth

128%

2 year growth

612%

Simplify's Take

What believers are saying

  • The $106M funding round led by Salesforce Ventures provides significant capital for growth and innovation.
  • Hiring top talent, such as the head of sales operations from Coinbase, strengthens the company's leadership team.
  • The release of the biological foundation model Evo opens new avenues in biotech, potentially revolutionizing DNA, RNA, and protein sequence analysis.

What critics are saying

  • The competitive landscape in AI hardware optimization is intense, with new entrants like Groq posing potential threats.
  • Dependence on Nvidia's GPUs could be a vulnerability if supply chain issues or technological shifts occur.

What makes Together AI unique

  • Together AI's collaboration with top-tier institutions like Meta, Nvidia, and Princeton University on FlashAttention-3 showcases its cutting-edge research capabilities.
  • The company's focus on optimizing LLMs for Nvidia Hopper GPUs positions it uniquely in the AI hardware optimization space.
  • Together AI's valuation of $1.25 billion and backing from industry giants like Salesforce Ventures and Nvidia highlight its strong market position and investor confidence.
