Full-Time

Staff Engineer

Fleet Performance

Updated on 12/20/2024

DigitalOcean

1,001-5,000 employees

Cloud computing platform for developers

Data & Analytics
Enterprise Software

Compensation Overview

$230k - $270k Annually

+ Bonus + Equity Compensation

Senior, Expert

Remote in USA

This is a remote role.

Category
Applied Machine Learning
AI Research
AI & Machine Learning
Required Skills
Python
Ruby
Go
Linux/Unix
Requirements
  • Bachelor's or Master's degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering, or equivalent work experience
  • Extensive knowledge of Linux kernel, hypervisors, and open-source operating systems
  • 7+ years of experience with performance measurement tools such as profilers, eBPF, XDP, fio, TPCC, MLPerf, and NCCL
  • 5+ years developing strategies for managing, monitoring, and analyzing infrastructure, applications and services
  • Strong proficiency in Go, Python, and/or Ruby
  • Deep understanding of kernel performance aspects, including scheduling, context switching, and hardware acceleration
  • Expertise in distributed systems performance, including tracing and debugging methodologies
  • Knowledge of GPU technology, GPU fabrics, and programming for multi-GPU workloads
  • Demonstrated ability to solve complex problems at scale
  • Strong security mindset with proactive approach to implementing best practices
  • Excellent cross-team collaboration and communication skills
  • Leadership experience in skills development and mentorship
  • Professional-level written and spoken English with strong presentation abilities
Responsibilities
  • Develop and implement comprehensive performance metrics, analysis tools, and reporting systems
  • Lead initiatives to enhance shared infrastructure, balancing performance optimization with rigorous security standards
  • Collaborate with hardware engineering teams and vendors to continuously validate GPU fabric performance
  • Engage with the open-source Linux community to advance virtualization technologies and integrate them into our fleet
  • Conduct in-depth performance analysis of the Linux kernel, virtualization layer, storage, and network stack to devise optimization strategies
  • Identify system bottlenecks proactively and drive optimizations across the hypervisor software stack
  • Work cross-functionally to harness new performance capabilities from evolving hardware architectures
  • Enhance test frameworks, harnesses, and pipelines to ensure robust performance validation
  • Investigate and resolve virtual machine downtime and performance issues in our production environment
  • Participate in on-call rotations as needed to support system reliability

DigitalOcean provides cloud computing services that enable developers and businesses to build, deploy, and scale applications efficiently. Its platform offers a range of fully managed services that simplify the process of managing infrastructure, allowing users to focus on software development. DigitalOcean stands out from competitors by emphasizing simplicity, a strong community, and open-source support, which helps users quickly get started and find solutions. The company's goal is to empower developers and small to medium-sized businesses to innovate and grow by reducing the time spent on infrastructure management.

Company Stage

IPO

Total Funding

$168.5M

Headquarters

New York City, New York

Founded

2012

Growth & Insights
Headcount

6 month growth

8%

1 year growth

14%

2 year growth

29%

Simplify's Take

What believers are saying

  • Growing demand for cloud-native tools aligns with DigitalOcean's offerings.
  • Rising Kubernetes adoption benefits DigitalOcean's Kubernetes services.
  • Expansion of the global developer community increases DigitalOcean's customer base.

What critics are saying

  • Increased competition from Vultr with a matching $3.5 billion valuation.
  • Potential over-reliance on Ceph for storage solutions poses risks.
  • Operational complexity may rise with new features like Droplet Autoscale Pools.

What makes DigitalOcean unique

  • DigitalOcean offers a no-DevOps-required experience for developers.
  • The company focuses on simplicity and open source to attract developers.
  • DigitalOcean's mission-critical infrastructure supports rapid application deployment and scaling.

Benefits

Remote-first

Full health coverage

Wellness coverage

Flexible vacation time

Team-building & social events

401(k) plans

ESPP

Education support

Partner support

Employee giving