Full-Time

Distributed ML Systems Engineer - Inference

Together AI

51-200 employees

Decentralized cloud services for AI development

Enterprise Software
AI & Machine Learning

Compensation Overview

$160k - $230k Annually

+ Equity + Benefits

Mid, Senior

San Francisco, CA, USA

Category
Applied Machine Learning
AI Research
AI & Machine Learning
Required Skills
Kubernetes
Rust
Microsoft Azure
Python
PyTorch
Operating Systems
AWS
Go
C/C++
Google Cloud Platform
Requirements
  • 3+ years of experience in building large-scale, fault-tolerant, high-performance distributed systems.
  • Strong programming skills in one or more of Python, Go, Rust, or C/C++.
  • Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, and storage, as well as performance and scale.
  • Experience with cloud computing platforms (AWS, GCP, Azure, etc.) and large-scale infrastructure.
  • Strong problem-solving skills and ability to work in a fast-paced environment.
  • Preferred: Experience with Kubernetes.
  • Preferred: Experience with PyTorch.
Responsibilities
  • Design and build large-scale, distributed machine learning systems that are fault-tolerant and high-performance.
  • Develop and optimize distributed processing frameworks and storage systems.
  • Collaborate with researchers, engineers, and product managers to integrate ML systems into our infrastructure.
  • Conduct architecture and design reviews to ensure best practices in system design.
  • Implement robust monitoring and logging systems to ensure the health and performance of our ML systems.

Together AI focuses on enhancing artificial intelligence through open-source contributions and decentralized cloud services. The company enables developers and researchers to train, fine-tune, and deploy generative AI models, catering to a diverse clientele that includes startups, large enterprises, and academic institutions. Its cloud-based solutions allow users to develop and implement AI models efficiently, with revenue generated from service subscriptions and usage fees. Together AI distinguishes itself by prioritizing open and transparent AI systems, aiming to foster innovation and achieve beneficial outcomes for society.

Company Stage

Series A

Total Funding

$222.3M

Headquarters

Menlo Park, California

Founded

2022

Growth & Insights
Headcount

6 month growth

29%

1 year growth

134%

2 year growth

617%

Simplify's Take

What believers are saying

  • Leveraging Meta's Llama 3.2 Vision model democratizes access to advanced AI capabilities.
  • FlashAttention-3 development optimizes AI model efficiency, reducing operational costs for Together AI.
  • Strategic investments in AI startups could lead to collaborative opportunities and technological advancements.

What critics are saying

  • Increased competition from AI startups like jhana.ai could divert potential clients.
  • Dependency on Meta's technology may influence Together AI's strategic direction.
  • Stricter AI regulations could impose additional compliance costs and operational challenges.

What makes Together AI unique

  • Together AI focuses on open-source contributions, setting it apart in the AI industry.
  • The acquisition of CodeSandbox enhances Together AI's platform with code interpretation capabilities.
  • Together AI's decentralized cloud services empower diverse organizations to deploy AI models efficiently.
