Full-Time

Senior SDET

AI Cluster Networking and Security

Cerebras

201-500 employees

Develops AI acceleration hardware and software

No salary listed

Expert

Toronto, ON, Canada + 2 more

More locations: Bengaluru, Karnataka, India | Sunnyvale, CA, USA

Category
Applied Machine Learning
Robotics & Autonomous Systems
AI & Machine Learning
Required Skills
Kubernetes
Python
Grafana
Machine Learning
Go
Prometheus
C/C++
Requirements
  • Bachelor's or master's degree in engineering in computer science, electrical engineering, AI, data science, or a related field.
  • 10+ years of experience testing in at least one area such as enterprise software, distributed systems, or datacenter hardware and software.
  • Strong coding skills in at least one programming language such as Python, Go, or C/C++.
  • Strong skills in debugging issues in large distributed systems, hardware, and software. Experience with debugging tools such as gdb, strace, and network monitors.
  • Strong understanding of operating system internals such as memory management, file systems, security basics, and performance.
  • Strong understanding of datacenter layout and of device performance characteristics across PCIe, networking, and storage.
Responsibilities
  • Innovate and execute tests on cutting-edge AI infrastructure. Be a thinker: define optimized test strategies and methodologies.
  • Adapt to new technologies and bring expertise to a rapidly growing, fast-innovating ML community and to evolving AI models.
  • Build a strong understanding of how to break large distributed-systems challenges into smaller components that can be unit tested.
  • Aim for 100% test automation covering all cluster features in the areas of high availability, failure scenarios, performance, stress, and security.
  • Champion cluster security, reliability targeting 99.9999% uptime, and ease of use through observability.
  • Test all components of the AI cluster, including but not limited to cluster software such as Kubernetes, Prometheus, and Grafana, and cluster hardware components such as ML wafer-scale accelerators, CPU runtime nodes, the high-speed SwarmX interconnect, and high-speed transfer of weights through the MemoryX interconnect.
  • Qualify cluster networking solutions that consist of high-speed switches, routers, and optics from various vendors.
  • Qualify cluster security features, including OS security, network security, cloud compliance, user access, and security certifications.
Desired Qualifications
  • Experience with cloud technologies such as AWS, Kubernetes, and Docker. Experience with monitoring tools such as Grafana and Prometheus is a huge plus.
  • Understanding of and experience with ML model training and inference is a huge plus.
  • Understanding of ML hardware accelerators such as GPUs and custom accelerator ASICs is a huge plus.

Cerebras Systems specializes in accelerating artificial intelligence (AI) through its CS-2 system, which is recognized as the fastest AI accelerator available. This system is designed to replace traditional clusters of graphics processing units (GPUs) used in AI computations, simplifying the process by eliminating the need for complex parallel programming and cluster management. Cerebras serves a variety of clients, including major pharmaceutical companies and government research labs, providing them with faster results for critical applications like cancer drug response predictions. The company operates in the high-performance computing and AI markets, generating revenue by selling its proprietary hardware and software solutions, including the CS-2 systems and related cloud services. Cerebras aims to enhance the efficiency of AI research and development, enabling clients to achieve quicker results and lower costs.

Company Size

201-500

Company Stage

Series F

Total Funding

$720M

Headquarters

Sunnyvale, California

Founded

2016

Simplify's Take

What believers are saying

  • Growing AI model efficiency demand aligns with Cerebras' energy-efficient accelerators.
  • AI democratization increases need for user-friendly systems like Cerebras' CS-2.
  • Pharmaceutical industry's push for faster drug discovery boosts demand for Cerebras' technology.

What critics are saying

  • Competition from NVIDIA and Graphcore could impact Cerebras' market share.
  • Rapid AI model evolution may necessitate frequent hardware updates, increasing R&D costs.
  • Supply chain vulnerabilities could delay production of Cerebras' hardware.

What makes Cerebras unique

  • Cerebras' Wafer-Scale Engine is the largest chip ever built for AI.
  • The CS-2 system replaces traditional GPU clusters, simplifying AI computations.
  • Cerebras serves diverse industries, including pharmaceuticals and government research labs.

Benefits

Professional Development Budget

Flexible Work Hours

Remote Work Options

401(k) Company Match

401(k) Retirement Plan

Mental Health Support

Wellness Program

Paid Sick Leave

Paid Holidays

Paid Vacation

Parental Leave

Family Planning Benefits

Fertility Treatment Support

Adoption Assistance

Childcare Support

Elder Care Support

Pet Insurance

Bereavement Leave

Employee Discounts

Company Social Events