Full-Time

Staff Engineer

Fleet Performance

Updated on 12/21/2024

DigitalOcean

1,001-5,000 employees

Cloud computing platform for developers

Data & Analytics
Enterprise Software

Compensation Overview

$230k - $270k Annually

+ Bonus + Equity Compensation

Senior, Expert

Remote in USA

This is a remote role.

Category
Applied Machine Learning
AI Research
AI & Machine Learning
Required Skills
Python
Ruby
Go
Linux/Unix
Requirements
  • Bachelor's or Master's degree in Computer Science, Mathematics, Statistics, or Computer/Electrical Engineering, or equivalent work experience
  • Extensive knowledge of the Linux kernel, hypervisors, and open-source operating systems
  • 7+ years of experience with performance measurement tools such as profilers, eBPF, XDP, fio, TPCC, MLPerf, and NCCL
  • 5+ years developing strategies for managing, monitoring, and analyzing infrastructure, applications and services
  • Strong proficiency in Go, Python, and/or Ruby
  • Deep understanding of kernel performance aspects, including scheduling, context switching, and hardware acceleration
  • Expertise in distributed systems performance, including tracing and debugging methodologies
  • Knowledge of GPU technology, GPU fabrics, and programming for multi-GPU workloads
  • Demonstrated ability to solve complex problems at scale
  • Strong security mindset with a proactive approach to implementing best practices
  • Excellent cross-team collaboration and communication skills
  • Leadership experience in skills development and mentorship
  • Professional-level written and spoken English with strong presentation abilities
Responsibilities
  • Develop and implement comprehensive performance metrics, analysis tools, and reporting systems
  • Lead initiatives to enhance shared infrastructure, balancing performance optimization with rigorous security standards
  • Collaborate with hardware engineering teams and vendors to continuously validate GPU fabric performance
  • Engage with the open-source Linux community to advance virtualization technologies and integrate them into our fleet
  • Conduct in-depth performance analysis of the Linux kernel, virtualization layer, storage, and network stack to devise optimization strategies
  • Identify system bottlenecks proactively and drive optimizations across the hypervisor software stack
  • Work cross-functionally to harness new performance capabilities from evolving hardware architectures
  • Enhance test frameworks, harnesses, and pipelines to ensure robust performance validation
  • Investigate and resolve virtual machine downtime and performance issues in our production environment
  • Participate in on-call rotations as needed to support system reliability

DigitalOcean provides cloud computing services that enable developers and businesses to build, deploy, and scale applications efficiently. Its platform offers a range of fully managed services, allowing users to focus on software development rather than infrastructure management. DigitalOcean stands out from competitors by emphasizing simplicity, a strong community, and open-source support, making it accessible for startups and small to medium-sized businesses. The company's goal is to empower developers and businesses to innovate and grow by providing the tools and resources they need to succeed in the cloud.

Company Stage

IPO

Total Funding

$168.5M

Headquarters

New York City, New York

Founded

2012

Growth & Insights
Headcount

6 month growth

8%

1 year growth

14%

2 year growth

29%

Simplify's Take

What believers are saying

  • DigitalOcean's new Droplet Autoscale Pools enhance workload scaling capabilities.
  • The introduction of Bare Metal GPUs supports demanding AI/ML workloads.
  • Joining the Ceph Foundation strengthens DigitalOcean's open source storage solutions.

What critics are saying

  • Increased competition from Vultr, which now matches DigitalOcean's $3.5 billion valuation.
  • Operational costs may rise with the NYC2 Data Center expansion.
  • Integration challenges may arise from the partnership with Hugging Face.

What makes DigitalOcean unique

  • DigitalOcean offers a "no DevOps required" experience for developers.
  • The company focuses on simplicity and open source to attract developers.
  • DigitalOcean's customer service is a key differentiator in the cloud market.

Benefits

Remote-first

Full health coverage

Wellness coverage

Flexible vacation time

Team-building & social events

401(k) plans

ESPP

Education support

Partner support

Employee giving