Full-Time

AI Research Scientist

ML for Physical Systems

Posted on 9/11/2025

Phaidra

Phaidra

51-200 employees

AI-driven control systems for industrial facilities

Compensation Overview

$160k - $180.2k/yr

+ Equity

Remote in USA

More locations: Remote in Canada | Remote in UK

Remote

In the United States, applicants must be located in California, Colorado, Connecticut, Georgia, Florida, Indiana, Maryland, Minnesota, Missouri, Nebraska, New York, North Carolina, Pennsylvania, South Carolina, Tennessee, Texas, Virginia, Washington. In Canada, applicants must be located in Ontario, British Columbia, or Alberta.

Category
AI & Machine Learning (3)
Required Skills
Scikit-learn
Python
TensorFlow
Keras
PyTorch
Machine Learning
Pandas
NumPy
Requirements
  • PhD in mechanical engineering, chemical engineering, control systems, applied physics, or a related technical field, with demonstrated expertise in one or more of the following areas: thermodynamics, fluid mechanics, or dynamic system modeling and control.
  • Strong foundation in machine learning and software engineering, with proficiency in Python and open-source ML libraries such as Keras, TensorFlow, PyTorch, SciPy, scikit-learn, NumPy, pandas, and Ray.
  • 1+ years of applied research experience.
  • Prior experience with research projects and contributions to open-source software.
  • Alignment with our company values: curiosity, ownership, transparency & directness, outcome-based performance, and customer empathy.
Responsibilities
  • Collaborate with other AI researchers on applied real-world problems to demonstrate algorithmic feasibility and enhance algorithmic capabilities.
  • Design and implement prediction and control algorithms for complex, nonlinear, and dynamic physical systems governed by principles of thermodynamics and fluid dynamics.
  • Develop and maintain a benchmarking platform for algorithmic performance evaluation and experimental design.
  • Clearly and efficiently report and present research findings and developments, both internally and externally, verbally and in writing.
  • Participate in and organize ambitious collaborative research projects.
  • Work with external collaborators and maintain relationships with relevant research labs and key individuals.
  • Mentor and guide Research Engineers to apply research findings and developments to industrial domains.
Desired Qualifications
  • Experience relevant to the position.
  • A proven track record of publications.
  • Experience applying AI to real-world scenarios.

Phaidra provides AI-powered virtual plant operators that help operations teams run mission-critical facilities more reliably and efficiently. Its self-learning control systems monitor and optimize cooling, heating, and energy infrastructure by automatically adapting to changing conditions. The company combines advanced AI/ML expertise with practical knowledge of cooling and heating systems; its founders previously pioneered AI controls at DeepMind and Google that delivered 40% cooling energy savings in Google's data centers. Its goal is to reduce energy use, boost operational resiliency, and deliver ongoing value for clients in sectors such as data centers, pharmaceuticals, and district energy.

Company Size

51-200

Company Stage

Series B

Total Funding

$92.5M

Headquarters

Seattle, Washington

Founded

2019

Simplify Jobs

Simplify's Take

What believers are saying

  • NVIDIA Omniverse DSX Blueprint collaboration standardizes protocols, accelerating gigawatt-scale deployments.
  • $50M Series B from Collaborative Fund and NVIDIA funds power-cooling-workload orchestration expansion.
  • CoreWeave fleet-wide scaling and Salute partnership validate 70% thermal stability gains in AI factories.

What critics are saying

  • CoreWeave replicates Phaidra's RL agents in-house within 9-15 months to eliminate vendor costs.
  • NVIDIA internalizes RL technology post-DSX Blueprint, launching proprietary controls in 12-18 months.
  • Emerald AI captures market with superior simulation agents in Omniverse ecosystems within 6-12 months.

What makes Phaidra unique

  • Phaidra's RL AI agents predict thermal spikes using rack power data, reducing overshoot 75-80% versus PID controls.
  • Phaidra Prism LLM enables root cause analysis and automated responses via natural language for data centers.
  • Leaders from DeepMind and Google, who founded Phaidra in 2019, previously pioneered AI controls that delivered 40% cooling energy savings in Google's data centers.


Benefits

Health Insurance

Dental Insurance

Vision Insurance

Unlimited Paid Time Off

Parental Leave

Home Office Stipend

Company Equity

Growth & Insights and Company News

Headcount

6 month growth

-2%

1 year growth

-1%

2 year growth

1%

Angel Business Communications Limited
Mar 31st, 2026
Salute partners with Phaidra for scalable AI operations.

Salute teams up with Phaidra to support AI operations in high-density data centres with operational solutions. Tuesday, 31st March 2026, by Sophie Milburn.

Salute, a data centre services company, has formed a strategic partnership with Phaidra to address the challenges of AI computing in high-density data environments. The collaboration aims to speed up AI deployment by removing operational barriers in these settings.

Traditional technologies and strategies may not fully meet the requirements of high-density computing environments. They can struggle to manage liquid cooling and continuous high-load computing, which creates obstacles for organisations deploying AI infrastructure at scale.

The partnership introduces an integrated approach designed to support performance, reduce energy use, and improve efficiency: Phaidra's AI-driven data centre controls are combined with Salute's Direct-to-Chip Liquid Cooling Operations, enabling data centres to scale AI operations. Salute's AI-as-a-service offering is positioned as a comprehensive answer to the challenges of liquid cooling as AI operations scale, and it has been adopted by a growing number of operators.

Phaidra's AI agents manage power and cooling systems and adapt in real time. The technology has demonstrated energy savings and improved thermal stability, including a 70% improvement in thermal stability on NVIDIA platforms. Integrating Phaidra's AI-driven systems with Salute's operational model is intended to improve performance, reliability, and operational outcomes. "Together, Phaidra and Salute are making it faster, simpler and safer for companies to scale their AI operations," said Jim Gao, Phaidra's CEO and Co-Founder.
In addition, Salute's acquisition of Northshore and its integration of the Seastack sustainability platform add sustainability insights to its operations. Salute's partner ecosystem supports operations in AI/HPC data centres, combining technology and expertise. Salute and Phaidra are continuing to develop approaches to support AI operations in high-density computing environments.

GeekWire
Mar 18th, 2026
Backed by Nvidia, Seattle's Phaidra targets data center overheating with proactive AI.

by Lisa Stiffler on Mar 18, 2026 at 10:41 am

Phaidra, a startup using artificial intelligence to make data center operations more energy efficient, this week announced key collaborations with Nvidia, CoreWeave and Applied Digital. The Seattle company revealed a "groundbreaking methodology" that predicts and prevents data center heat spikes when computing workloads surge. Phaidra has been partnering with cloud provider CoreWeave and data center operator Applied Digital to test and deploy the cooling strategy.

As data center operations and deployments boom nationwide, demand for the energy and water needed to run the facilities and cool the electronics is surging as well. Operators are eager to find better strategies for building and operating such complex sites.

Phaidra, launched in 2019, is led by alumni of DeepMind, Alphabet's AI research hub. Its technology uses an array of sensors to measure multiple metrics and analyzes that information. The company has raised a total of $120 million and has roughly 90 employees. The startup is #78 on the GeekWire 200, our list of the top privately held technology companies in Seattle and the Pacific Northwest.

"We envisaged a future where AI agents transform static infrastructure in self-learning, continuously improving infrastructure," Phaidra CEO Jim Gao said on LinkedIn. "That future became reality on the world stage," Gao added, referring to Nvidia CEO Jensen Huang's announcement this week of the collaboration between Phaidra, the global chip giant, and others.

Data centers typically hum along at steady operating conditions, but demand can ramp up suddenly when AI training or other large workloads are dispatched. That increases the heat produced, which can force chips to throttle performance to avoid overheating. To prevent this, data center operators often over-cool facilities, wasting power and water and limiting available compute capacity.

Phaidra's fix is an AI agent that monitors power data as an early-warning signal of an impending load spike, so cooling can kick in proactively rather than waiting for a temperature rise.
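The difference between waiting for a temperature rise and acting on power data can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Phaidra's method: Phaidra uses a reinforcement learning agent, whereas the hypothetical `simulate` function below swaps in a plain PI feedback loop with an optional feed-forward term driven by rack power. The coolant model, thermal capacitance, actuator time constant, gains, setpoint, and the 30 kW to 70 kW load step are all illustrative assumptions; only the 180 s sensing delay echoes the 3-to-5-minute lag that Phaidra cites for PID-controlled liquid cooling.

```python
from collections import deque


def simulate(feedforward: bool, steps: int = 3600) -> list[float]:
    """Toy coolant-temperature trace (deg C), one sample per second.

    Hypothetical model: every parameter below is an assumption; only the
    180 s sensing delay reflects the 3-to-5-minute lag quoted for PID loops.
    """
    capacity = 5000.0   # assumed thermal capacitance of the coolant loop, kJ/K
    tau_act = 20.0      # assumed CDU actuator time constant, s
    delay = 180         # sensor reports the temperature 180 s late
    t_set = 30.0        # assumed supply-temperature setpoint, deg C
    kp = 12.5           # PI gains from a SIMC-style rule for an integrating
    ki = kp / 1600.0    # plant with ~200 s effective delay (assumption)
    q_idle = 30.0       # assumed IT heat load before the ramp, kW
    q_peak = 70.0       # assumed IT heat load after the ramp, kW

    temp = t_set
    cooling = q_idle                  # heat actually removed by the CDU, kW
    integral = 0.0
    sensor = deque([t_set] * delay)   # FIFO that models the sensing delay
    trace = []
    for k in range(steps):
        power = q_idle if k < 60 else q_peak    # 30% -> 70% load ramp at t = 60 s
        measured = sensor.popleft()             # reading from `delay` seconds ago
        error = measured - t_set                # positive when the loop runs hot
        integral += error                       # dt = 1 s
        command = q_idle + kp * error + ki * integral   # reactive PI feedback
        if feedforward:
            command += power - q_idle           # act on rack power immediately
        cooling += (command - cooling) / tau_act    # first-order actuator lag
        temp += (power - cooling) / capacity        # 1 s energy balance: kJ / (kJ/K)
        sensor.append(temp)
        trace.append(temp)
    return trace


if __name__ == "__main__":
    for ff in (False, True):
        peak = max(simulate(feedforward=ff)) - 30.0
        print(f"feedforward={ff}: overshoot {peak:.2f} K")
```

Under these assumptions, the feedback-only loop overshoots the setpoint by a few kelvin because it cannot react until the delayed reading moves, while the feed-forward variant overshoots by only a fraction of a kelvin, limited by the assumed actuator lag.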

The Associated Press
Mar 4th, 2026
Phaidra launches AI platform to operate gigawatt-scale data centres

Phaidra has launched Phaidra Prism, an AI platform designed to optimise data centre operations for AI workloads, at NVIDIA GTC in San Jose. The platform provides detailed visibility and collaborative intelligence to help operators maximise tokens per watt whilst limiting operational expenses. Phaidra Prism functions as an AI assistant built specifically for data centre operators and technicians. Users interact with equipment through a customised large language model to enhance observability, conduct root cause analysis and issue automated incident responses before issues affect service level agreements. Founded in 2019 by former Google, DeepMind and Trane engineers, Phaidra is an NVIDIA DSX Partner. CEO Jim Gao noted that traditional data centre operation methods no longer suffice as facilities scale to gigawatt-size with increasingly complex infrastructure and higher downtime costs.

Phaidra
Mar 3rd, 2026
Phaidra, CoreWeave and Applied Digital pioneer NVIDIA Max-Q AI factories with agentic liquid cooling management.

March 16, 2026: Phaidra today announced a groundbreaking methodology to drastically improve the thermal stability of liquid-cooled AI data centers. The methodology is outlined in the joint white paper "AI Agents for Liquid-Cooled AI Factories." By leveraging AI-driven, feed-forward control systems on production NVIDIA Grace Blackwell platforms, the collaboration is paving the way for "DSX AI factories," a new operational paradigm in which power, cooling, and workload management are unified to maximize efficiency and computational throughput. Phaidra has integrated NVIDIA DSX Max-Q to run GPU clusters as efficiently as possible, so that more of the available power goes toward running AI workloads.

The challenge of AI thermal volatility

Modern AI factories are fundamentally different from traditional data centers: they are defined by massive scale, extreme density, and highly synchronized workloads. Operators of large-scale AI factory campuses, such as Applied Digital, must manage increasingly complex interactions between power infrastructure, liquid cooling systems, and rapidly fluctuating GPU workloads for their partners. When massive AI training or inference jobs are dispatched, thousands of networked GPUs ramp up simultaneously, creating "peaky" power profiles that can jump from idle to maximum capacity within seconds.

Traditional liquid cooling relies on Proportional-Integral-Derivative (PID) controllers, which wait for a sensor to register a coolant temperature change before taking action. Because coolant has high thermal inertia, this reactive feedback loop suffers from a 3-to-5-minute delay, resulting in rapid heat spikes that force GPUs to throttle performance to protect themselves.
To mitigate this, operators significantly over-cool their facilities to create a safety buffer, a strategy that wastes massive amounts of energy and limits overall compute capacity.

The AI-driven solution

To close this latency gap, Phaidra developed a self-learning reinforcement learning (RL) AI agent that fundamentally changes how cooling is managed. Instead of reacting after the fact to temperature changes, the agent uses real-time rack power data as a leading indicator to predict and prevent thermal spikes. It sends optimal setpoint commands to the Coolant Distribution Unit (CDU) before the heat fully registers in the fluid, reducing the effective response delay from minutes to under 10 seconds in validated production environments.

Proven results at gigawatt scale

The new methodology underwent rigorous joint A/B testing in live production environments, including an NVIDIA DGX SuperPOD cluster running LLM training workloads and CoreWeave's NVIDIA GB200 NVL72 environments. The results were transformative:
  • Massive reduction in thermal overshoot: the AI agent reduced the magnitude of thermal spike overshoots by 75% to 80% compared to optimally tuned PID baselines during sudden load ramps.
  • Unprecedented scale: following this validation, Phaidra and CoreWeave are scaling the deployment of these AI agents throughout CoreWeave's liquid-cooled fleet, bringing AI-driven thermal management to its next generation of data center capacity.

The pathway to Max-Q AI factories

Phaidra has integrated NVIDIA DSX Max-Q to operate the entire AI factory as a single unit of compute, at scale. By deeply integrating Information Technology (IT) and Operational Technology (OT), this collaboration bridges the divide between white-space compute and facility operations. With thermal stability secured by Phaidra's AI agents, facilities can safely raise their supply water temperatures, significantly reducing the burden on facility chillers.
This provides the foundation for the next phase of the collaboration, in which operators dynamically shift stranded power from the cooling system to revenue-generating IT compute. For a baseline 1 GW AI factory, safely raising the coolant temperature could unlock billions in additional annual revenue. "In a world where computational resources are limited by energy availability, every watt that isn't being used for valuable token generation is a wasted watt," the joint white paper states. By co-designing power, cooling, and workload management systems, Phaidra, CoreWeave, NVIDIA, and critical infrastructure partner Applied Digital are setting a new standard for reliability, end-user SLAs, and peak operational efficiency in the age of AI.

Featured expert: Jim Gao, Co-Founder and Chief Executive Officer. Jim sets the strategic direction and leads the company in operational excellence. Prior to Phaidra, he led the DeepMind Energy Team and pioneered Google's use of AI controls on its hyperscale data center cooling systems. Before DeepMind, he spent a decade as a Technical Lead for Google's Data Centers.

Phaidra
Nov 3rd, 2025
Creating AI agents for gigawatt-scale AI factories with NVIDIA Omniverse DSX Blueprint

This was a big week for Phaidra and the broader AI factory ecosystem. During his keynote address at GTC DC, NVIDIA founder and CEO Jensen Huang unveiled the NVIDIA Omniverse DSX Blueprint, a comprehensive and open blueprint for designing and operating gigawatt-scale AI factories. Phaidra is proud to build the future of AI factories (i.e. data centers designed specifically for massive AI workloads) alongside NVIDIA and the other DSX ecosystem members. Phaidra envisions a future where AI factories operate continuously at peak performance, aided by AI agents that vigilantly manage the complex infrastructure 24/7.

(Image: Jensen Huang announcing the NVIDIA Omniverse DSX Blueprint with Phaidra and others at GTC DC.)

Why now? Two major forcing functions are driving the need for the Omniverse DSX Blueprint:
  • Inefficiencies get magnified at gigawatt scale: a single GW AI factory represents both a $50Bn investment and a $200Bn revenue opportunity. Every 1% of inefficiency therefore represents $2Bn in lost revenue.
  • Extreme performance requires extreme co-design: AI factories today are so large, complex, and interconnected that they must operate as a single integrated machine rather than a collection of loosely orchestrated components. This is the pathway toward step-function improvements in tokens per watt.

By openly sharing a reference architecture for the entire industry to build upon, Phaidra makes it easier for everyone to collaboratively and iteratively improve energy efficiency, time-to-value, and reliability at scale. Phaidra's unique contributions to the Omniverse DSX Blueprint are:
  • Helping define the open data exchange standards and communications protocols that enable the various data center control planes (i.e. BMS, PMS, workload manager, etc.) to communicate freely and openly with each other.
  • Ensuring readiness for the new generation of agentic AI companies, such as Phaidra and Emerald AI, to integrate readily with the physical and digital twin parts of the AI factory in pursuit of extreme performance.
  • Developing and sharing OpenUSD SimReady assets, and using existing SimReady assets within the NVIDIA Omniverse digital twin, to rapidly train and evaluate its AI agents. This makes it easier for everyone to leverage robust simulation capabilities to improve AI factory performance.

The benefits of a collaborative, simulation-first approach are clear. The GTC DC keynote demo showed an animation of Phaidra's liquid cooling AI agent in action (reposted below). Phaidra's self-learning AI agent is substantially better at reducing thermal spikes than traditional control systems. It eliminates the thermal spikes caused by massively synchronized AI workloads, which in turn enables the AI factory to run at higher TCS temperatures. The end result is substantially less power and capital required for cooling, and more power available for revenue-generating GPUs.

Phaidra's AI agents were developed and trained in simulation before they were deployed into real-world production systems. Phaidra collaborated closely with NVIDIA's data center engineering team and used an operational digital twin of an NVIDIA DGX SuperPOD to rapidly prototype various AI agent architectures under varying environmental conditions. The best-performing AI agent was trained further in simulation (i.e. bootstrapped on synthetic data) before testing on production NVIDIA DGX GB200 systems with live AI workloads. Interestingly, Phaidra discovered that the simulation-trained AI agent dramatically improved on the performance of the existing liquid cooling control system, which had already been fine-tuned by human experts.
This is shown in the graph below, which illustrates the before/after performance of Phaidra's AI agents at providing precision thermal control in liquid cooling systems. The red dots correspond to the traditional control system's performance given a 30-70% load ramp. The blue dots correspond to an AI agent that had been trained in simulation. Finally, the green dots correspond to an AI agent after several hours of live learning on the production system (i.e. the reinforcement learning-based AI agent taught itself to improve without human intervention).

(Figure: Phaidra's AI agent teaching itself to get better at managing AI factory infrastructure without human intervention.)

This mini case study illustrates two important points:
  • Design and iteration in simulation can greatly accelerate product development. What would have taken multiple quarters of iteration time (because this is mission-critical infrastructure) instead took weeks in simulation.
  • AI agents trained in simulation can already outperform existing systems, provided the simulation environment is of sufficiently high fidelity.

The Phaidra team is proud to work alongside NVIDIA and other industry leaders to define the future of gigawatt-scale AI factories. Phaidra's goal is to openly share its learnings and know-how through the Omniverse DSX Blueprint initiative so that AI factories can readily leverage agentic AI technologies to achieve extreme energy efficiency, time-to-value, and reliability.
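Both Phaidra write-ups report results as a before/after comparison of thermal overshoot during load ramps (the 30-70% ramp plotted here, and the 75% to 80% reduction versus tuned PID baselines in the March 2026 release). The sketch below shows one way such a relative metric could be computed. It assumes overshoot means the peak supply temperature above setpoint and that reduction means one minus the ratio of candidate to baseline overshoot; the articles do not spell out Phaidra's exact definition, and the function names and toy traces are hypothetical, not measured data.

```python
def overshoot(trace: list[float], setpoint: float) -> float:
    """Peak excursion above the setpoint (K); 0.0 if the trace never exceeds it."""
    return max(0.0, max(trace) - setpoint)


def overshoot_reduction(baseline: list[float], candidate: list[float],
                        setpoint: float) -> float:
    """Fractional reduction of the candidate's overshoot relative to the baseline's."""
    base = overshoot(baseline, setpoint)
    if base == 0.0:
        raise ValueError("baseline trace never exceeds the setpoint")
    return 1.0 - overshoot(candidate, setpoint) / base


def reduction_range(pairs: list[tuple[list[float], list[float]]],
                    setpoint: float) -> tuple[float, float]:
    """Min and max reduction across paired (baseline, candidate) load-ramp events."""
    values = [overshoot_reduction(b, c, setpoint) for b, c in pairs]
    return min(values), max(values)


# Hypothetical toy traces (deg C) for two paired load ramps; not real data.
ramps = [
    ([30.0, 31.2, 32.0, 31.1, 30.4], [30.0, 30.6, 31.0, 30.5, 30.1]),
    ([30.0, 30.6, 31.0, 30.5, 30.1], [30.0, 30.1, 30.25, 30.1, 30.0]),
]

if __name__ == "__main__":
    low, high = reduction_range(ramps, setpoint=30.0)
    print(f"overshoot reduced by {low:.0%} to {high:.0%}")
```

Reporting a min/max range over paired ramp events, rather than a single average, mirrors how the release phrases its result as a span.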

INACTIVE