Full-Time

Distributed Systems Engineer

AI Inference Platform

Cerebras

201-500 employees

Develops AI acceleration hardware and software

No salary listed

Senior

Toronto, ON, Canada; Sunnyvale, CA, USA

Category
Applied Machine Learning
Robotics & Autonomous Systems
AI & Machine Learning
Required Skills
Kubernetes
Python
Microservices
C/C++
Requirements
  • Bachelor's or master's degree in computer science or a related field, or equivalent practical experience.
  • 5+ years of software engineering experience, with a strong focus on distributed systems architecture and optimization.
  • Deep understanding of distributed systems principles.
  • Proven experience with container orchestration technologies, particularly Kubernetes (K8s).
  • Strong programming skills in Python. C++ experience is a plus.
  • Experience with distributed messaging systems or RPC frameworks.
  • Experience designing for high availability, fault tolerance, and scalability.
  • Strong debugging and performance analysis skills in distributed environments.
  • Familiarity with cloud-native technologies and microservices architectures.
Responsibilities
  • Design, build, and operate foundational distributed systems components that power the Inference Platform with high availability, scalability, and performance.
  • Architect and implement the core logic for distributed request routing, dynamic load balancing, replica synchronization, and distributed metadata management.
  • Develop and enhance the fault tolerance and auto-recovery mechanisms for platform services and inference replicas.
  • Optimize communication patterns and data flow between microservices to ensure minimal latency and maximal throughput at scale.
  • Contribute to the design and implementation of the distributed orchestration and scheduling system for managing inference workloads and resources.
  • Implement and refine monitoring, tracing, and alerting for distributed system components to ensure operational excellence.
  • Collaborate closely with hardware, ML, and other software teams to ensure seamless integration and end-to-end system performance.
  • Debug complex issues spanning multiple services and systems in a distributed environment.

Cerebras Systems builds hardware and software for accelerating artificial intelligence (AI). Its CS-2 system, billed as the fastest AI accelerator available, is designed to replace the clusters of graphics processing units (GPUs) traditionally used for AI computation, removing the need for complex parallel programming and cluster management. Cerebras serves a variety of clients, including major pharmaceutical companies and government research labs, and gives them faster results for critical applications such as cancer drug response prediction. The company operates in the high-performance computing and AI markets and generates revenue by selling its proprietary hardware and software, including CS-2 systems and related cloud services. Cerebras aims to make AI research and development more efficient so that clients get results sooner and at lower cost.

Company Size

201-500

Company Stage

Series F

Total Funding

$720M

Headquarters

Sunnyvale, California

Founded

2016

Simplify's Take

What believers are saying

  • Growing AI model efficiency demand aligns with Cerebras' energy-efficient accelerators.
  • AI democratization increases need for user-friendly systems like Cerebras' CS-2.
  • Pharmaceutical industry's push for faster drug discovery boosts demand for Cerebras' technology.

What critics are saying

  • Competition from NVIDIA and Graphcore could erode Cerebras' market share.
  • Rapid AI model evolution may necessitate frequent hardware updates, increasing R&D costs.
  • Supply chain vulnerabilities could delay production of Cerebras' hardware.

What makes Cerebras unique

  • Cerebras' Wafer-Scale Engine is the largest chip ever built for AI.
  • The CS-2 system replaces traditional GPU clusters, simplifying AI computations.
  • Cerebras serves diverse industries, including pharmaceuticals and government research labs.

Benefits

Professional Development Budget

Flexible Work Hours

Remote Work Options

401(k) Company Match

401(k) Retirement Plan

Mental Health Support

Wellness Program

Paid Sick Leave

Paid Holidays

Paid Vacation

Parental Leave

Family Planning Benefits

Fertility Treatment Support

Adoption Assistance

Childcare Support

Elder Care Support

Pet Insurance

Bereavement Leave

Employee Discounts

Company Social Events