Full-Time

Lead Cluster Operations Support Engineer

Posted on 2/21/2025

Thoughtworks

10,001+ employees

Technology consultancy for digital transformation

Data & Analytics
Consulting
Enterprise Software

Senior, Expert

Remote in UK

Category
Customer Success
Customer Support
Customer Success & Support
Required Skills
Kubernetes
Microsoft Azure
AWS
Terraform
Linux/Unix
Helm
Google Cloud Platform

Requirements
  • Deep expertise in Kubernetes administration and debugging at scale.
  • Deep knowledge of managing large Kubernetes clusters with thousands of nodes.
  • Knowledge of running training workloads on thousands of GPUs.
  • Knowledge of working with the Lustre filesystem is a plus.
  • Knowledge of working with the NVIDIA NeMo Framework (Docker image for model training).
  • Knowledge of working with NVIDIA NeMo NIMs (Docker images for inference).
  • Underlying cloud platforms: GCP, AWS, Azure.
  • Terraform / Pulumi, Helm charts, Linux, and other infrastructure-as-code tools.
Responsibilities
  • You will help shape and iterate on this new white-glove model training support service for large GPU clusters.
  • You will work in a collaborative team with Machine Learning Engineers and Infrastructure Engineers.
  • You will contribute to accelerator development: identify gaps in tooling, opportunities for automation, or patterns we could turn into accelerators to make the next round of this work faster and more efficient.
  • You will help assess model training readiness and data preparation.
  • You will provide model training support for any issues clients encounter, including a rotation of daytime weekend shifts with pagers.
  • You will facilitate collaborative problem solving within the team by actively listening, communicating effectively and mentoring other engineers.
  • You will proactively identify and address challenges related to the white-glove service for continued pre-training, proposing solutions and implementing improvements.
Desired Qualifications
  • Nice to have: Run:ai, TrueFoundry, the Hugging Face platform, etc. (training can be provided).
  • Knowledge of working with HPC technologies such as Slurm is a bonus.

Thoughtworks helps businesses modernize and innovate by providing consultancy services that combine strategy, design, and software engineering. Their approach involves working closely with clients to understand their specific challenges and goals, allowing them to create tailored solutions that often include custom software development and system modernization. Thoughtworks stands out from competitors by focusing on a diverse range of industries and leveraging data and artificial intelligence to unlock new value for clients. The company's goal is to enable organizations to thrive in the digital age by delivering impactful solutions that drive innovation and growth.

Company Size

10,001+

Company Stage

IPO

Total Funding

$727.6M

Headquarters

Chicago, Illinois

Founded

1993

Simplify's Take

What believers are saying

  • Recent privatization allows focus on long-term strategy and AI-enabled services.
  • Collaboration with AI Singapore enhances GenAI reliability and innovation.
  • Growing demand for custom software boosts Thoughtworks' market opportunities.

What critics are saying

  • Leadership changes may disrupt company culture and client relationships.
  • Privatization could lead to shifts in priorities affecting employee morale.
  • AI ethics and reliability issues could impact Thoughtworks' reputation.

What makes Thoughtworks unique

  • Thoughtworks combines strategy, design, and engineering for comprehensive digital transformation solutions.
  • Recognized as a Visionary in custom software development by Gartner in 2024.
  • Global presence with diverse industry expertise, including finance, healthcare, and retail.

Benefits

Hybrid Work Options

Professional Development Budget

Flexible Work Hours

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

0%

2 year growth

-3%
IT Brief Asia
Feb 19th, 2025
Thoughtworks appoints new leaders to drive growth strategy

In addition to these leadership positions, Thoughtworks has appointed Ami Kaplan and Michael Carajohn to the Board of Directors of its parent entity, Tasmania Parent.

HR Tech Wire
Feb 11th, 2025
Thoughtworks Welcomes New Leadership to Continue Strategic Growth and Client Success

With over 25 years of experience in digital transformation across diverse industries, Steven joins Thoughtworks from Merkle, where he served as CEO for Australia and New Zealand, and previously held senior leadership roles at Accenture across Asia and Europe.

Stock Titan
Nov 13th, 2024
Thoughtworks Goes Private in $1.75B Deal

Thoughtworks has gone private following a $1.75 billion acquisition by Apax Funds. Shareholders will receive $4.40 per share in cash, a 48% premium over the 30-day volume-weighted average price before the announcement. As a result, Thoughtworks' shares will no longer trade on NASDAQ. The company plans to focus on long-term strategy, enhancing its digital solutions, and expanding its leadership in AI-enabled software and data engineering services.

Orissa Diary
Oct 30th, 2024
Thoughtworks Stake Acquired by AP Funds

The Competition Commission of India has approved the acquisition of additional shares in Thoughtworks Holding, Inc. by AP Funds and Temasek. AP Funds, advised by Apax Partners LLP, will wholly own Thoughtworks, while Temasek, through Nevado Investments, will hold about 10% as a minority non-controlling investor.

CE Pro
Oct 23rd, 2024
Home Security Manufacturer Works With Consultancy to Develop Home Security AI Assistant

Thoughtworks, a global tech consultancy, helped produce the AI assistant, called SwannShield, by leveraging data and AI.