Full-Time

Senior DevOps Engineer

Posted on 7/23/2024

Together AI

51-200 employees

Decentralized cloud services for AI development

Enterprise Software
AI & Machine Learning

Compensation Overview

$160k - $230k Annually

+ Equity + Benefits

Senior

San Francisco, CA, USA

Category
DevOps & Infrastructure
DevOps Engineering
Required Skills
Packer
Chef
Kubernetes
Python
Puppet
TensorFlow
CUDA
PyTorch
Java
Blockchain
Go
Terraform
Ansible
C/C++
Development Operations (DevOps)
Linux/Unix
Requirements
  • Minimum of 5 years of prior relevant experience in DevOps, cloud computing, data center operations, SRE, and Linux systems administration
  • Experience in programming in at least one of the following languages: Java, Python, Go, C++
  • Experience designing and building advanced CI/CD pipeline frameworks
  • Experience with cloud computing toolsets like Terraform, Vault, and Packer
  • Experience with configuration management tools like Ansible, Pulumi, Chef and Puppet
  • Experience with Kubernetes, containerization and VPNs
  • Strong sense of ownership and desire to build great tools for others
  • Self-driven and motivated, with a strong work ethic and a passion for problem solving
  • Experience with AI workloads and blockchain-based protocols a plus
  • GPU programming, NCCL, CUDA knowledge a plus
  • Experience with PyTorch or TensorFlow a plus
Responsibilities
  • Create a highly automated infrastructure pipeline for deploying and scaling distributed and multi-tenant GPU-resident compute to new cloud and data center environments
  • Create infrastructure to auto-scale AI models, create training clusters, and wrestle with CUDA dependencies
  • Introduce tools to facilitate greater automation and operability of services
  • Design, build, and maintain CI/CD infrastructure
  • Architect, deploy, and scale observability infrastructure
  • Participate in on-call rotation and ensure uptime of services
  • Investigate production issues and help prevent their recurrence
  • Create runtime tools/processes that optimize cloud triaging and limit downtime
  • Define best practices to make our systems and services measurable
  • Work closely with internal teams to ensure best practices are appropriately applied
  • Build tools to help engineering and research teams measure and improve their velocity
  • Analyze and decompose complex software systems
  • Collaborate with and influence others to improve the overall design

Together AI focuses on enhancing artificial intelligence through open-source contributions. The company offers decentralized cloud services that allow developers and researchers from various organizations to train, fine-tune, and deploy generative AI models. Its services cater to a wide range of clients, including small startups, large enterprises, and academic institutions. Together AI's business model is based on providing cloud-based solutions that support the development and deployment of AI models, generating revenue through service subscriptions and usage fees. The company stands out from its competitors by emphasizing open and transparent AI systems, an approach intended to foster innovation and achieve beneficial outcomes for society.

Company Stage

Series A

Total Funding

$222.3M

Headquarters

Menlo Park, California

Founded

2022

Growth & Insights
Headcount

6 month growth

69%

1 year growth

134%

2 year growth

617%

Simplify's Take

What believers are saying

  • Recent $106M funding round boosts Together AI's R&D and growth potential.
  • Partnership with Meta enhances Together AI's access to cutting-edge AI models.
  • FlashAttention-3 development improves AI model efficiency on Nvidia GPUs.

What critics are saying

  • Free access to Meta's Llama 3.2 may impact Together AI's revenue model.
  • Advancements in edge AI could reduce demand for cloud-based solutions.

What makes Together AI unique

  • Together AI focuses on open-source contributions, fostering innovation and collaboration.
  • The company offers decentralized cloud services for AI model training and deployment.
  • Together AI's commitment to transparency sets it apart in the AI industry.

INACTIVE