Distributed Software Engineer

Posted on 6/18/2024



201-500 employees

AI computing hardware with the largest chip

AI & Machine Learning

Senior, Expert

Toronto, ON, Canada + 2 more

Backend Engineering
Security Engineering
Software Engineering
Required Skills
  • Strong track record of 6 or more years in software architecture, system design, and development
  • Strong track record of development in distributed cluster environments
  • Strong understanding of the Kubernetes (K8s) software ecosystem, Prometheus, and Grafana
  • Strong development skills in Go, Python, and Bash
  • Strong debugging skills with distributed systems
  • Strong skills in developing tests for new features and regression tests for existing features
Responsibilities
  • Automate bare-metal configuration of networking, OS, and application software in large clusters of Cerebras WSE, servers, and switches
  • Additional push-button workflows for cluster upgrades, downgrades, and security patching, with key metrics to minimize downtime on clusters
  • An orchestration and scheduler system for resource allocation, job submission & placements for a multi-user environment on a cluster
  • Seamless support for both on-premises and cloud deployment and operations
  • A robust system for monitoring, detecting and handling failures for a variety of resources on the clusters (including High Availability of clusters)
  • Broad cluster and job monitoring and visualization capabilities, along with alerting systems
  • User-facing tools to monitor the status of jobs and collect metrics
  • Administrator-facing tools to manage and operate large clusters

Cerebras Systems builds large-scale artificial intelligence computers, most notably the CS-2, which is powered by the Cerebras Wafer Scale Engine. The chip features a record-setting 2.6 trillion transistors, which can cut neural network training time from months to minutes. A career at Cerebras Systems is an opportunity to join a team pushing the boundaries of AI hardware, in a culture built on technical excellence and rapid innovation in AI computation.

Company Stage

Series F

Sunnyvale, California


