Full-Time

Senior Site Reliability Engineer

ML System

Updated on 3/14/2025

ByteDance

10,001+ employees

Operates global content platforms and apps

No salary listed

Senior

Company Does Not Provide H1B Sponsorship

San Jose, CA, USA

Category
DevOps & Infrastructure
Site Reliability Engineering
Required Skills
Kubernetes
Python
Machine Learning
Go
Linux/Unix
Requirements
  • Bachelor's degree or above in computer science, computer engineering, or a related field
  • Strong proficiency in at least one programming language, such as Go, Python, or Shell, in a Linux environment
  • Strong hands-on experience with Kubernetes and containers, with more than 2 years of relevant operations and maintenance experience
  • Excellent logical analysis skills, with the ability to abstract and decompose business logic sensibly
  • Good documentation habits, with the ability to write and update workflow and technical documentation on time as required
  • Strong sense of responsibility, self-drive, and team spirit, with good learning and communication skills
Responsibilities
  • Responsible for ensuring our ML systems run efficiently for large model development, training, evaluation, and inference
  • Responsible for the stability of offline tasks and services in multi-data-center, multi-region, and multi-cloud scenarios
  • Responsible for resource management and planning, including cost and budget, across computing and storage resources
  • Responsible for global system disaster recovery, cluster machine governance, and the stability of business services, as well as improving resource utilisation and operational efficiency
  • Build software tools, products, and systems to monitor and manage the ML infrastructure and services efficiently
  • Join the global team roster that provides on-call support for systems and business services
Desired Qualifications
  • Experience operating and maintaining large-scale distributed ML systems
  • Experience operating and maintaining GPU servers

ByteDance operates various content platforms, including Toutiao for news aggregation and TikTok for short video sharing, catering to a global audience. The company uses advanced algorithms to personalize user experiences, which keeps users engaged and returning for more. ByteDance differentiates itself from competitors like Facebook and Google by focusing on user-generated content and effective targeting for advertising. Its goal is to connect users with relevant content while providing businesses with effective advertising solutions.

Company Size

10,001+

Company Stage

Private

Total Funding

$5.6B

Headquarters

Beijing, China

Founded

2012

Simplify's Take

What believers are saying

  • Increased AI focus boosts user retention and advertising revenue on TikTok and Douyin.
  • ByteDance's $400 billion valuation indicates strong investor confidence for future expansion.
  • OmniHuman-1 opens new revenue streams in digital content creation and entertainment.

What critics are saying

  • Ole Obermann's departure may disrupt TikTok's music licensing and partnerships.
  • Cancellation of Broadcom chip project could affect ByteDance's AI hardware innovation.
  • Geopolitical tensions over TikTok may lead to regulatory challenges or ownership changes.

What makes ByteDance unique

  • ByteDance's AI-driven content personalization enhances user engagement on platforms like TikTok and Douyin.
  • AIBrix positions ByteDance as a leader in AI research, attracting partnerships and collaborations.
  • PhotoDoodle AI diversifies ByteDance's offerings, appealing to digital art enthusiasts.

Benefits

Hybrid Work Options

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

0%

2 year growth

0%
Asian Financial
Mar 12th, 2025
Shengshu Technology Appoints Former ByteDance AI Executive as CEO

AsianFin - Shengshu Technology, an AI video company, has appointed Luo Yihang, a former AI executive at ByteDance and head of the AI unit at Volcano Engine, as its new CEO.

Investing.com
Mar 4th, 2025
ByteDance launches new share repurchase program at higher valuation - Reuters

TweakTown
Mar 2nd, 2025
ByteDance's custom chip made by Broadcom has been canceled, Broadcom to lose $2B to $3B

In a different post on X, @Jukanlosreve explained: "In June last year, ByteDance reportedly partnered with Broadcom to develop a 5nm AI accelerator, a type of ASIC.

U.S. News & World Report
Feb 28th, 2025
Bytedance's TikTok to Invest $8.8 Billion in Thailand Data Centres, Official Says

Aibase
Feb 28th, 2025
ByteDance Launches AIBrix: A New Open-Source Inference System Designed for Large Language Models
