Full-Time

Lead NOC & Incident Management

Posted on 2/16/2026

FluidStack

201-500 employees

High-performance GPU cloud for AI workloads

Compensation Overview

$200k - $300k/yr

+ Equity

Austin, TX, USA

In Person

Category
Engineering Management (1)
Required Skills
Grafana
Prometheus
Requirements
  • 5+ years in network operations, infrastructure operations, or site reliability roles, with significant experience building and running a NOC, operations center, or equivalent 24/7 monitoring function; has built shift models, managed MSP relationships, and knows how to turn a collection of monitors into a high-performing operational team, ideally at global scale
  • Deep experience with structured incident response processes — severity classification, escalation matrices, incident bridges, post-incident reviews, and RCA workflows; has been an Incident Manager or Incident Commander for major incidents and understands that incident management requires training, practice, and continuous refinement
  • Enough technical breadth to triage alerts intelligently and to earn the trust of engineering teams; experience with data center infrastructure (network, power, cooling) and modern monitoring stacks (Prometheus/VictoriaMetrics, Grafana, AlertManager) is strongly preferred
  • Experience building operational processes from scratch; ability to design runbooks that operators can execute reliably, escalation criteria that are crisp and actionable, and training programs that get new team members productive quickly; iterates based on real-world feedback
  • Exceptional at building partnerships across functional teams without direct authority; leads through credibility, follow-through, and consistent operational excellence rather than organizational hierarchy
  • Understands that operational metrics matter for customer trust; has worked in environments with stringent SLAs and can design processes with 2 AM occurrences in mind
Responsibilities
  • Stand up the cross-functional operations center from scratch and assist in selecting and onboarding a managed service provider partner for Tier 1 coverage; build staffing models, handoff processes, KPIs, and quality standards; ensure a qualified person is watching every alert 24/7
  • Create, deploy and operationalize Fluidstack’s incident management framework; manage the Incident Manager on-call rotation; train engineers on incident roles; run incident bridges during SEV0/SEV1 events; ensure post-incident reviews happen on schedule and action items close; partner with the Program Manager to continuously improve the framework based on real-world execution
  • Own the operational readiness for every new domain onboarded to the NOC; drive runbook quality assurance with functional teams; plan and execute tabletop exercises; coordinate with the Platform team on incident.io tooling workflows; onboard new infrastructure domains into NOC coverage on a phased schedule aligned with datacenter launches
  • Build tight operational partnerships with Network Operations, Data Center Operations, Systems/Platform, and Security teams; define clear Tier 1 to Tier 2 escalation criteria for each domain; ensure the NOC acts as a force multiplier for engineering teams by absorbing monitoring, triage, vendor ticket management, and incident coordination
  • Establish processes for the NOC to manage the full lifecycle of carrier and vendor tickets — creation, tracking, SLA enforcement, escalation; work with Network Operations and Data Center Operations to define ticket templates, escalation triggers, and vendor communication standards; ensure no ticket falls through the cracks and every carrier/vendor interaction is documented
  • Establish operational metrics (MTTA, MTTR, escalation rate, false positive rate, runbook coverage) and reporting cadence; use data to identify patterns, reduce alert noise, improve runbook quality, and drive down incident response times; produce monthly operational reports for leadership and customer-facing stakeholders
Desired Qualifications
  • Hyperscale or Large-Scale Infrastructure Background: experience operating NOC/operations centers at hyperscale companies (Meta, Google, Microsoft, AWS), large telecommunication companies, or major AI infrastructure providers
  • Incident Management Tooling: hands-on experience with incident management platforms (incident.io, PagerDuty, Opsgenie, ServiceNow), including configuration of escalation policies, on-call schedules, and alert routing; bonus if led a platform migration or stood up a new instance from scratch
  • MSP/Vendor Management: experience selecting, onboarding, and managing managed service providers for NOC or operations functions; has written SOWs, negotiated SLAs, and managed the transition from outsourced to internal operations
  • Facilities & Building Management System Familiarity: exposure to datacenter facilities operations — power distribution, cooling systems, CDUs, BMS/SCADA alerting
  • Carrier & ISP Operations: experience managing carrier relationships, circuit troubleshooting, and vendor ticket workflows; familiarity with carrier NOC processes, circuit ID management, and SLA enforcement
  • Startup Experience: built something from scratch — ideally in a high-growth infrastructure or cloud company; comfortable with rapid context switching and evolving requirements

FluidStack provides GPU-based cloud infrastructure for artificial intelligence workloads, delivering large-scale Nvidia GPU clusters through a neocloud model. The platform offers automated provisioning and a centralized orchestration layer that hides hardware complexity, with native support for Kubernetes and Slurm and proprietary monitoring to track power usage and hardware health. It targets AI labs, research institutions, and enterprise tech teams that need scalable, pay-as-you-go access to high-performance compute without owning data centers. The company's goal is to make it easy for organizations to train, develop, and deploy complex machine learning models by providing reliable, scalable GPU resources on demand.

Company Size

201-500

Company Stage

Late Stage VC

Total Funding

$11B

Headquarters

New York City, New York

Founded

2017

Simplify Jobs

Simplify's Take

What believers are saying

  • Anthropic's $50 billion deal builds custom data centers in New York and Texas.
  • Coatue's Next Frontier JV funds 430MW Indiana campus online by December 2026.
  • $750 million raise at $7 billion valuation accelerates US expansion creating 1,000 jobs.

What critics are saying

  • CoreWeave undercuts Fluidstack's pricing, capturing ex-OpenAI researchers within 6-12 months.
  • $1 billion round at $18 billion valuation fails by July 2026, causing liquidity crunch.
  • Google terminates Indiana lease if Fluidstack defaults on $5.7 billion bonds by 2028.

What makes FluidStack unique

  • Fluidstack delivers zero-setup multi-thousand GPU clusters for AI researchers from OpenAI and DeepMind.
  • Lighthouse platform enables proactive monitoring and automated remediation without customer intervention.
  • HIPAA, GDPR, ISO27001, and SOC 2 TYPE 2 compliance secures regulated AI labs and enterprises.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Retirement Plan

Company Equity

Unlimited Paid Time Off

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

-9%

2 year growth

-9%

Bloomberg L.P.
Apr 14th, 2026
Fluidstack Seeks $1 Billion in New Funding at $18 Billion Valuation

The cloud-computing startup Fluidstack Ltd. is holding funding talks with investors to bring in about $1 billion at a target valuation of $18 billion, according to people briefed on the matter.

Yahoo Finance
Apr 6th, 2026
UK data centre startup Fluidstack raises $750M, hits $7B valuation for US AI expansion

Fluidstack, a London-founded data centre startup, has been valued at $7 billion after raising over $750 million in funding. The company, established in 2017 by Gary Wu, Cesar Maklary and James Cox, is building AI infrastructure across America. The startup relocated its headquarters from London to New York in December to focus on US customers, creating over 1,000 jobs. New investors include Situational Awareness, an AI hedge fund founded by former OpenAI employee Leopold Aschenbrenner. Fluidstack is backed by Google, which has provided a $1.8 billion backstop to the company's data centre lease obligations and is reportedly in talks for an equity stake. The company is also working with Anthropic to build up to $50 billion of AI data facilities across New York and Texas.

Telegraph Media Group
Apr 5th, 2026
UK data centre giant raises $750m for US expansion

City sources say Fluidstack could still secure additional funding as start-up hits $7bn valuation

Yahoo Finance
Mar 20th, 2026
Fluidstack scraps $11.5B French data center for US expansion backed by $50B Anthropic deal

Fluidstack has abandoned an $11.5 billion data centre project in northern France to focus on US expansion, according to Bloomberg. The operator is relocating its global headquarters from the UK to New York and exited a secondary facility near Paris used by Mistral. The move could prove beneficial for Bitcoin miners partnering with Fluidstack. Hut 8, TeraWulf and Cipher Mining have signed deals with the firm over the past six months. Hut 8's 15-year agreement to build a 245-megawatt Louisiana site with Fluidstack and Anthropic generates $7 billion in revenue, potentially rising to $17.7 billion with expansion clauses. Fluidstack's US expansion includes a $50 billion master agreement with Anthropic to operate compute clusters across New York, Texas and other states.

Yahoo Finance
Mar 15th, 2026
Google-backed Fluidstack signs $7B, 15-year AI lease with Hut 8 as miner pivots to data centres

Hut 8 Corp has signed a 15-year, $7 billion IT capacity lease with Google-backed Fluidstack at its River Bend campus, marking a strategic shift from pure Bitcoin mining towards AI and data centre infrastructure. The company also sold a 310MW natural gas power plant portfolio to refocus capital. The deal is part of Hut 8's broader push to build 245MW to 2,295MW of AI data centre capacity with blue-chip clients. The company is carving out legacy mining operations into American Bitcoin whilst developing an 8,500MW infrastructure pipeline. Hut 8's narrative projects $767.3 million revenue and $140.6 million earnings by 2028, requiring 76.9% yearly revenue growth. Some analysts expect the company to reach $1.1 billion in revenue by 2028, though execution risks and potential dilution from capital-intensive expansion remain key concerns.

INACTIVE