Full-Time

Staff Software Engineer-Reliability

Luma AI

11-50 employees

Advances 3D capture technology for mixed-reality

Data & Analytics
Hardware
AI & Machine Learning

Compensation Overview

$200k - $250k annually

Expert

Palo Alto, CA, USA

Category
IT & Support
Security Engineering
Software Engineering
Required Skills
Datadog
Kubernetes
Terraform
Splunk
Requirements
  • 10+ years of proven experience as a reliability engineer, production engineer, infrastructure software engineer, or in a similar role at a fast-paced, rapidly scaling company.
  • Strong proficiency in GPU cloud infrastructure, including the underlying concepts of scheduling, scaling, cloud storage, networking and security.
  • Proficiency in programming and scripting languages.
  • Experience with containerization technologies and container orchestration platforms such as Kubernetes or an equivalent.
  • Knowledge of infrastructure-as-code (IaC) tools such as Terraform, CloudFormation, or equivalents.
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.
  • Experience with observability tools such as Datadog, Prometheus, Grafana, Splunk, the ELK stack, or similar.
  • Knowledge of security best practices in cloud environments.
Responsibilities
  • Collaborate with researchers and engineers to specify the availability, performance, correctness, and efficiency requirements of the current and future versions of our GPU infrastructure.
  • Work with multiple GPU cloud providers to scale up, scale down, maintain, and monitor our thousands of GPUs across many clusters.
  • Design and implement solutions to ensure the scalability of our infrastructure to meet rapidly increasing demands.
  • Implement and manage monitoring systems to proactively identify issues and anomalies in our production environment.
  • Implement fault-tolerant and resilient design patterns to minimize service disruptions.
  • Build and maintain automation tools to streamline repetitive tasks and improve system reliability.
  • Participate in an on-call rotation alongside other infrastructure engineers to respond to critical incidents and ensure 24/7 system availability.
  • Develop and maintain service level objectives (SLOs) and service level indicators (SLIs) to measure and ensure system reliability.

Luma specializes in advancing 3D capture and exploration technology, utilizing neural rendering, deep learning, and high-performance computing to enhance the realism of photos and videos for mixed-reality experiences.

Company Stage

Series B

Total Funding

$72.8M

Headquarters

San Francisco, California

Founded

2021

Growth & Insights
Headcount

6 month growth

27%

1 year growth

109%

2 year growth

666%
Simplify's Take

What believers are saying

  • The public beta release of Dream Machine has generated significant user interest, indicating strong market demand and potential for rapid adoption.
  • Luma AI's successful Series B funding round, led by Andreessen Horowitz, provides financial stability and resources for further innovation and growth.
  • The high praise for Dream Machine's video quality and realism suggests a competitive advantage in the AI video generation market.

What critics are saying

  • The high demand for Dream Machine has led to long wait times, which could frustrate users and hinder adoption.
  • Competition from established players like OpenAI and emerging competitors like Runway and Kling could impact Luma AI's market share.

What makes Luma AI unique

  • Luma AI's Dream Machine offers a unique blend of high-quality, realistic, and fantastical video generation from text and images, setting it apart from competitors like OpenAI's Sora and Runway.
  • Luma AI's collaboration with AWS, which provides H100 training infrastructure and SageMaker HyperPod, strengthens its technological edge.
  • Luma AI's focus on both 3D model generation and video creation provides a versatile platform that appeals to a broad range of creative and professional users.