Full-Time

Cluster Deployment Operations Engineer

Cerebras

201-500 employees

Develops AI acceleration hardware and software

No salary listed

Senior

Toronto, ON, Canada + 1 more

More locations: Sunnyvale, CA, USA

Category
DevOps & Infrastructure
DevOps Engineering
Required Skills
Python
Go
Linux/Unix
Requirements
  • Proficiency in scripting and practical coding, particularly in Shell and Python (Go is a plus)
  • Strong experience troubleshooting, analyzing, and administering large-scale, distributed systems
  • 5+ years of experience in data center operations and Linux system administration
  • Knowledge and hands-on experience with network configuration and operations
  • Expertise in hardware operations including networking components (e.g., cabling, switches, routers)
Responsibilities
  • Plan and execute cluster deployments, from small-scale to massive distributed systems
  • Manage hands-on aspects of the deployments, coordinating with data center staff for hardware configurations and necessary maintenance
  • Troubleshoot issues related to networking (e.g., BGP, cluster creation hurdles, or cabling errors) and hardware (e.g., units that are dead on arrival (DOA))
  • Monitor and maintain systems to ensure uptime, performance, and reliability
  • Collaborate with cross-functional teams including hardware vendors, data center operations, and network engineers to manage the entire lifecycle of deployment
  • Ensure comprehensive documentation is created and maintained for deployments, configurations, and operational processes
  • Develop tools, scripts, or playbooks to automate routine tasks and deployment processes
Desired Qualifications
  • Experience with Kubernetes and the Prometheus monitoring stack
  • Experience with CI/CD tools (e.g., Git, Jenkins)
  • Familiarity with BGP and other networking protocols, including troubleshooting at Layers 1, 2, and 3
  • Experience with automation tools for deployments, monitoring, and operational efficiency (such as creating playbooks or automated scripts)

Cerebras Systems specializes in accelerating artificial intelligence (AI) processes with its CS-2 system, which is designed to replace traditional clusters of graphics processing units (GPUs) used in AI computations. The CS-2 system simplifies the complexities of parallel programming, distributed training, and cluster management, making AI tasks more efficient. Clients from various sectors, including pharmaceuticals, government research labs, healthcare, finance, and energy, benefit from the system's ability to deliver faster results, which is essential for critical applications like cancer drug response predictions. Cerebras generates revenue by selling its proprietary hardware and software solutions, including the CS-2 systems and related cloud services. The company's goal is to provide a comprehensive solution that enables clients to achieve quicker AI training and lower latency in AI inference, ultimately reducing the costs associated with AI research and development.

Company Size

201-500

Company Stage

Series F

Total Funding

$700.4M

Headquarters

Sunnyvale, California

Founded

2016

Simplify's Take

What believers are saying

  • Growing AI model efficiency demand aligns with Cerebras' energy-efficient accelerators.
  • AI democratization increases need for user-friendly systems like Cerebras' CS-2.
  • Pharmaceutical industry's push for faster drug discovery boosts demand for Cerebras' technology.

What critics are saying

  • Competition from NVIDIA and Graphcore could impact Cerebras' market share.
  • Rapid AI model evolution may necessitate frequent hardware updates, increasing R&D costs.
  • Supply chain vulnerabilities could delay production of Cerebras' hardware.

What makes Cerebras unique

  • Cerebras' Wafer-Scale Engine is the largest chip ever built for AI.
  • The CS-2 system replaces traditional GPU clusters, simplifying AI computations.
  • Cerebras serves diverse industries, including pharmaceuticals and government research labs.

Benefits

Professional Development Budget

Flexible Work Hours

Remote Work Options

401(k) Company Match

401(k) Retirement Plan

Mental Health Support

Wellness Program

Paid Sick Leave

Paid Holidays

Paid Vacation

Parental Leave

Family Planning Benefits

Fertility Treatment Support

Adoption Assistance

Childcare Support

Elder Care Support

Pet Insurance

Bereavement Leave

Employee Discounts

Company Social Events