Full-Time

Distributed LLM Inference Engineer

Anyscale
201-500 employees

Platform for scaling AI workloads

Enterprise Software
AI & Machine Learning

Compensation Overview

$170.1k - $247k Annually

Mid, Senior

San Francisco, CA, USA

Onsite position in San Francisco, CA.

Category
Applied Machine Learning
Deep Learning
Natural Language Processing (NLP)
AI & Machine Learning
Required Skills
TensorFlow
CUDA
PyTorch
Requirements
  • Familiarity with running ML inference at large scale with high throughput
  • Familiarity with deep learning and deep learning frameworks (e.g. PyTorch)
  • Solid understanding of distributed systems, ML inference challenges
  • ML Systems knowledge
  • Experience using Ray Data
  • Experience working closely with the community on LLM engines like vLLM and TensorRT-LLM
  • Contributions to deep learning frameworks (PyTorch, TensorFlow)
  • Contributions to deep learning compilers (Triton, TVM, MLIR)
  • Prior experience working on GPUs / CUDA
Responsibilities
  • Iterate quickly with product teams to ship end-to-end solutions for batch and online inference at high scale, used by Anyscale customers
  • Work across the stack, integrating Ray Data with LLM engines and optimizing each layer to deliver low-cost solutions for large-scale ML inference
  • Integrate with open-source software like vLLM, work closely with the community to adopt these techniques in Anyscale solutions, and contribute improvements back to open source
  • Follow the latest state of the art in the open-source and research communities, implementing and extending best practices
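The batch-inference workflow described above, pairing Ray Data-style batching with an LLM engine, can be sketched conceptually in plain Python. This is a minimal illustration of the pattern, not Anyscale's implementation: `FakeLLM`, `batched`, and `run_batch_inference` are hypothetical names, and the model is a stand-in for a real engine such as vLLM.

```python
# Conceptual sketch of batched offline inference, in the spirit of
# Ray Data's map_batches feeding an LLM engine. FakeLLM is a
# hypothetical stand-in for a real engine such as vLLM.

def batched(items, batch_size):
    """Group a stream of records into fixed-size batches."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


class FakeLLM:
    """Stateful 'model' loaded once per worker and reused across batches."""

    def generate(self, prompts):
        # A real engine would run GPU inference here.
        return [f"echo: {p}" for p in prompts]


def run_batch_inference(prompts, batch_size=2):
    model = FakeLLM()  # load the model once, amortized over many batches
    results = []
    for batch in batched(prompts, batch_size):
        results.extend(model.generate(batch))
    return results


if __name__ == "__main__":
    print(run_batch_inference(["a", "b", "c"]))
```

The key idea, which frameworks like Ray Data operationalize across a cluster, is that the expensive model load happens once per worker while batching keeps the accelerator saturated.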

Anyscale provides a platform designed to scale and productionize artificial intelligence (AI) and machine learning (ML) workloads. Its core technology, Ray, is an open-source framework that helps users efficiently scale AI applications across various fields, including generative AI, large language models (LLMs), and computer vision. Companies like OpenAI and Ant Group use Ray to train their largest models, improving the performance and reliability of their ML platforms. Anyscale's platform improves scalability, latency, and cost-efficiency, with some clients reporting over 90% improvements in these areas. The company operates on a software-as-a-service (SaaS) model, with clients subscribing to access Ray and its features, which generates a consistent revenue stream. Anyscale aims to be a key player in the AI and ML market by providing essential tools that help organizations optimize their AI workloads.

Company Stage

Series C

Total Funding

$252.5M

Headquarters

San Francisco, California

Founded

2019

Growth & Insights
Headcount

6 month growth

165%

1 year growth

67%

2 year growth

242%
Simplify's Take

What believers are saying

  • Anyscale's $100M Series C funding indicates strong investor confidence and growth potential.
  • Partnership with Nvidia enhances performance and cost-efficiency for AI deployments.
  • Anyscale Endpoints offers 10X cost-efficiency for popular open-source LLMs.

What critics are saying

  • ShadowRay vulnerability in Ray framework poses significant security risk with no patch.
  • OctoML's OctoAI service increases competition in AI infrastructure market.
  • Dependency on Nvidia's technology could be risky if Nvidia faces issues.

What makes Anyscale unique

  • Anyscale's Ray framework scales AI applications from laptops to cloud seamlessly.
  • Ray is widely used in Generative AI, LLMs, and computer vision fields.
  • Anyscale's SaaS model provides recurring revenue through subscription fees for Ray platform.

Benefits

Medical, Dental, and Vision insurance

401K retirement savings

Flexible time off

FSA and Commuter benefits

Parental and family leave

Office & phone plan reimbursement