Full-Time

Senior DevOps Engineer

Posted on 10/4/2025

FlexAI

11-50 employees

On-demand GPU clusters for AI workloads

No salary listed

Bengaluru, Karnataka, India

In Person

Category
DevOps & Infrastructure
Required Skills
Bash
Kubernetes
Rust
Microsoft Azure
Python
Docker
AWS
Go
Terraform
DevOps
Google Cloud Platform
Requirements
  • Bachelor's or higher degree in Computer Science, Software Engineering, or a related field.
  • 8+ years of experience as a DevOps or SRE Engineer, with a strong focus on automation, scalability, and reliability within PaaS environments.
  • Familiarity with cloud-native technologies, including container runtimes such as Docker and cluster schedulers such as Kubernetes, is a must.
  • Strong proficiency in scripting languages (e.g., Python, Bash) and familiarity with programming languages such as Go or Rust.
  • Experience with cloud platforms (AWS, Azure, GCP) and infrastructure services, especially in supporting PaaS solutions.
  • Proficiency in containerization and orchestration tools (e.g., Docker, Kubernetes) with experience in managing multi-architecture deployments.
  • Hands-on experience with infrastructure as code (IaC) tools like Terraform, supporting scalable and reliable infrastructure.
  • Strong understanding of CI/CD pipelines and automated testing methodologies.
  • Excellent problem-solving and troubleshooting skills, especially in the context of Beta testing and production environments.
  • Excellent collaboration and communication skills to work effectively with cross-functional teams.
  • Entrepreneurial & start-up mindset!
Responsibilities
  • Design, implement, and maintain CI/CD pipelines to support the efficient delivery and deployment of our Beta product, ensuring seamless customer experience.
  • Develop and manage infrastructure as code (IaC) using tools like Terraform, enabling scalable and repeatable infrastructure that supports our PaaS goals.
  • Implement and manage containerization and orchestration tools (e.g., Docker, Kubernetes) to ensure scalable deployment across various architectures.
  • Monitor and optimize system performance, proactively identifying and resolving bottlenecks to maintain reliability and efficiency during Beta testing and beyond.
  • Collaborate with software developers and backend engineers to ensure the seamless integration and performance of backend services within our PaaS infrastructure.
  • Ensure system reliability and availability by implementing best practices in monitoring, alerting, and incident response, particularly as we scale our Beta product.
  • Troubleshoot and resolve infrastructure issues promptly to minimize downtime and maintain customer trust.
  • Collaborate with security teams to ensure infrastructure meets security best practices and compliance requirements, especially in a multi-architecture environment.
  • Automate routine tasks to improve efficiency and reduce manual intervention, focusing on maintaining the flexibility and reliability of our PaaS offerings.
Desired Qualifications
  • Familiarity with AI model training is a significant advantage.

FlexAI provides a Workload as a Service (WaaS) platform that gives AI developers on-demand access to scalable GPU compute for the full AI lifecycle, including training, fine-tuning, and inference. The platform lets users spin up GPU clusters quickly and run workloads in parallel, with automated scaling to optimize resource use and costs. It also supports performing multiple fine-tuning runs across different datasets and models at the same time, enabling rapid experimentation while the platform handles infrastructure management. This differentiates FlexAI from competitors by offering serverless, pay-per-use, AI-optimized compute that abstracts away hardware setup and maintenance, so users can focus on building and deploying models. The company aims to accelerate AI initiatives for developers, data scientists, and organizations by delivering cost-efficient, scalable compute without the overhead of managing their own GPUs.

Company Size

11-50

Company Stage

Seed

Total Funding

$30M

Headquarters

Paris, France

Founded

2023

Simplify Jobs

Simplify's Take

What believers are saying

  • FlexAI raised $30M seed in April 2024 from Alpha Intelligence Capital and Elaia Partners.
  • A Tenstorrent partnership integrates FlexAI's WaaS platform with Tenstorrent chips for affordable AI infrastructure.
  • A Sesterce collaboration provides sovereign AI computing for European startups.

What critics are saying

  • CoreWeave undercuts FlexAI's pricing with exclusive NVIDIA GPUs and could capture enterprise customers within 6-12 months.
  • Lambda Labs' spot instances could erode FlexAI's cost savings for developers within 3-9 months.
  • NVIDIA DGX Cloud locks up H100 capacity, which could force FlexAI price hikes within 6-12 months.

What makes FlexAI unique

  • FlexAI delivers Workload as a Service platform abstracting AI infrastructure complexities.
  • FlexBench open-source benchmark modularizes MLPerf for LLM inference on Hugging Face.
  • FlexAI enables simultaneous fine-tuning of multiple datasets and models for rapid experimentation.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Retirement Plan

401(k) Company Match

Hybrid Work Options

Flexible Work Hours

Professional Development Budget

Wellness Program

Growth & Insights and Company News

Headcount

6 month growth

8%

1 year growth

4%

2 year growth

28%

FlexAI
Mar 31st, 2026
AI infrastructure that adapts as you grow. Deploy once. We handle the rest.

FlexBench: an open-source, modular MLPerf benchmark for LLM inference.

TL;DR: AI system benchmarks like MLPerf struggle to keep pace with the rapidly evolving model landscape, making it difficult for organizations to make informed deployment decisions. FlexAI believes benchmarking should itself be treated as a machine learning problem - one where models are continuously tested and optimized across datasets, software, and hardware based on metrics like accuracy, latency, throughput, power consumption, and cost. That's why FlexAI built FlexBench: a modular, open-source version of the MLPerf LLM inference benchmark connected to Hugging Face. FlexBench aggregates existing and new benchmarking results into an Open MLPerf dataset, which can be collaboratively cleaned, extended, and used for predictive modeling. FlexAI validated FlexBench through its MLPerf Inference 5.0 submission, benchmarking DeepSeek R1 and LLaMA 3.3 on commodity H100 servers. The long-term goal: empower teams to make cost-effective AI deployment decisions based on their available resources, requirements, and constraints.

Why MLPerf falls short for LLM inference benchmarking.

AI service providers, server developers, and data center operators face a critical challenge: selecting the right hardware and software stack to ensure ROI within 3-5 years in a rapidly shifting landscape [1]. MLPerf was introduced as a full-stack inference benchmark to evaluate accuracy, latency, and throughput in a standardized, reproducible manner across diverse hardware and software stacks [2]. But traditional benchmarks face a fundamental limitation: the combinatorial explosion of models, datasets, methods, and hardware configurations. Hugging Face alone hosts over a million ML models, more than 10,000 datasets, and thousands of methods - while new (and often incompatible) hardware and software ship continuously.
Exploring all possible configurations is not only impractical, it's prohibitively expensive. MLPerf currently covers only a limited set of combinations - typically around a dozen - and updates just once a year. Its LLM benchmarks still focus on models like BERT, GPT-J, LLaMA 2 70B, LLaMA 3 405B, and Mixtral 8x7B, even as newer models like DeepSeek dominate production workloads. Worse, FlexAI's hands-on experience with MLPerf shows that heavily over-optimized results from a few chip manufacturers are rarely achievable out of the box on other models, software versions, or hardware, significantly limiting their practical usefulness.

Reframing GPU benchmarking as a machine learning problem.

FlexAI believes a fundamentally different approach is needed. Drawing on its past experience using AI to improve computer systems, FlexAI proposes redefining MLPerf benchmarking as a learning task, with an open dataset of results and trainable objective functions to optimize key metrics such as accuracy, latency, throughput, power consumption, and cost [3][4][5]. To support this vision, FlexAI developed FlexBench, an open-source, modular, and flexible version of the MLPerf language inference benchmark connected to the Hugging Face Hub. With a unified codebase and CLI, users can benchmark a wide range of models and datasets by adjusting just a few input parameters.

FlexBench is designed for continuous evolution. FlexAI uses the MLCommons CMX workflow automation framework to aggregate both existing and new benchmarking results, along with their associated metadata, into an open MLPerf dataset published on GitHub and Hugging Face. This dataset can be collaboratively cleaned, extended, and analyzed using standard data analytics techniques, including predictive modeling and feature engineering. FlexAI then uses FlexBoard to visualize, compare, and predict the most suitable software/hardware configurations for different models based on user requirements and constraints.
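As a toy illustration of this "benchmarking as a learning task" framing, one could treat past benchmark records as training data and fit a simple predictor for an unseen configuration. The records, field names, and the inverse-size throughput model below are hypothetical assumptions for illustration, not FlexAI's actual dataset schema or predictive models:

```python
# Hypothetical Open-MLPerf-style records: model size (billions of parameters),
# weight precision in bits, and measured offline throughput in tokens/s.
# All numbers are made up for illustration.
records = [
    {"size_b": 7,  "bits": 16, "tok_s": 3200.0},
    {"size_b": 13, "bits": 16, "tok_s": 1750.0},
    {"size_b": 70, "bits": 16, "tok_s": 310.0},
    {"size_b": 70, "bits": 8,  "tok_s": 610.0},
]

def predict_tok_s(size_b: float, bits: int) -> float:
    """Simple learned model: tok_s ~ k * (16 / bits) / size_b.

    The constant k is "fitted" by averaging tok_s * size_b * (bits / 16)
    over the records; a real pipeline would use regression over many
    engineered features (hardware, batch size, sequence length, ...).
    """
    ks = [r["tok_s"] * r["size_b"] * (r["bits"] / 16) for r in records]
    k = sum(ks) / len(ks)
    return k * (16 / bits) / size_b

# Predict throughput for an unseen 34B model quantized to FP8.
print(round(predict_tok_s(34, 8), 1))  # → 1297.1
```

The point of the sketch is the workflow, not the model: once results live in a shared, structured dataset, prediction for unseen model/software/hardware combinations becomes an ordinary supervised-learning exercise.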
FlexBench architecture: client-server LLM inference benchmarking.

FlexBench uses a client-server architecture where the FlexBench client connects to a running vLLM server. It's built on MLPerf LoadGen, the official and reusable MLPerf harness that efficiently and fairly measures inference system performance [2][9]. The goal is to retain MLPerf's rigorous measurement standards while making the framework more flexible by abstracting models and datasets as interchangeable modules. Hugging Face or local LLMs and datasets can be used with minimal setup.

FlexBench supports two standard MLPerf inference modes:
  • Server (streaming) mode: queries arrive according to a Poisson distribution, mimicking real-world request patterns.
  • Offline mode: all queries are sent simultaneously to maximize throughput.

FlexBench returns detailed metrics from LoadGen, including time to first token (TTFT), throughput, and latency percentiles, all compliant with MLPerf standards and suitable for inclusion in the Open MLPerf dataset for further analysis and predictive analytics. FlexAI cross-validated these results against the vLLM benchmarking infrastructure and found strong alignment in performance numbers. FlexBench also provides accuracy metrics to guide further model optimizations such as quantization, pruning, and distillation. FlexAI has additionally introduced a queries-per-second (QPS) sweep mode to help users automatically identify the optimal QPS for their specific model, software, and hardware combination.

FlexBoard is implemented as a Gradio module that loads the Open MLPerf dataset via MLCommons CK/CM/CMX automations. It includes various predictive modeling and visualization plugins to help users analyze this data and predict the most efficient and cost-effective software/hardware configurations based on their requirements and constraints.

Benchmarking DeepSeek R1 and LLaMA 3.3 on H100 GPUs.
FlexAI validated FlexBench through its MLPerf Inference 5.0 submission by benchmarking several non-MLPerf LLM models, including DeepSeek R1 and LLaMA 3.3, on the OpenOrca dataset, using commodity servers equipped with NVIDIA H100 GPUs. The automation framework enabled rapid switching between models, datasets, and hardware configurations by simply modifying command-line parameters, without requiring any code changes.

FlexAI also invested significant effort assembling the Open MLPerf dataset by unifying past MLPerf Inference results (v4.0) and combining them with the latest official submissions and FlexBench data. To enable predictive modeling, FlexAI cleaned the dataset, standardized disparate fields, and engineered new features such as model size and data type. FlexAI has released this cleaned and curated dataset, along with FlexBoard and predictive analytics tools, to help the broader ML, AI, and systems community accelerate benchmarking, evaluation, and optimization efforts. For example, a proof-of-concept prototype allows users to input system costs and predict optimal software/hardware configurations based on model size and data type features.

What's next for FlexBench and Open MLPerf.

FlexBench and FlexBoard are still in early-stage prototyping.
FlexAI invites researchers and practitioners to explore the tools, provide feedback, and collaborate on the following:
  • Extending FlexBench to support all types of models, datasets, and systems.
  • Expanding the Open MLPerf dataset with FlexBench results from various models across diverse software/hardware configurations from different vendors.
  • Engineering improved features, such as model graphs, tensor shapes, compiler optimizations, accelerator capabilities, and hardware topology, to enhance predictions for previously unseen AI workloads.
  • Extending and improving FlexBoard based on user requirements and feedback.
  • Sharing your specific benchmarking challenges to help guide FlexAI's priorities.

The long-term goal is to enable anyone to run AI models efficiently and cost-effectively, tailored to their available resources, requirements, and constraints. If you're interested in this approach or would like to collaborate, reach out to the authors at FCS Labs.

References.

Get started today: start building with €100 in free credits for first-time users.
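To make the Server vs. Offline distinction from the architecture section concrete, here is a minimal, self-contained queueing simulation. It is purely illustrative: the constant service time, the target QPS, and the single-replica FIFO model are made-up assumptions, and this is not FlexBench or LoadGen code:

```python
import random

random.seed(0)
SERVICE_S = 0.05   # assumed constant per-query service time on one replica
N_QUERIES = 1000

def simulate(arrivals):
    """Single-server FIFO queue: per-query latency = queueing wait + service."""
    latencies, free_at = [], 0.0
    for t in arrivals:
        start = max(t, free_at)      # wait if the server is still busy
        free_at = start + SERVICE_S
        latencies.append(free_at - t)
    return latencies

def percentile(xs, p):
    xs = sorted(xs)
    return xs[min(len(xs) - 1, int(p / 100 * len(xs)))]

# Server (streaming) mode: Poisson arrivals at a target QPS, i.e.
# exponentially distributed inter-arrival gaps.
qps, t, arrivals = 15.0, 0.0, []
for _ in range(N_QUERIES):
    t += random.expovariate(qps)
    arrivals.append(t)
server_lat = simulate(arrivals)

# Offline mode: all queries issued at t=0 to maximize throughput.
offline_lat = simulate([0.0] * N_QUERIES)

print(f"server p99 latency:   {percentile(server_lat, 99):.3f}s")
print(f"offline max latency:  {max(offline_lat):.1f}s")
```

Even this toy shows why the two modes measure different things: Server mode probes tail latency under realistic, bursty load (here the 15 QPS arrival rate sits below the 1/0.05 = 20 QPS capacity, so the queue stays stable), while Offline mode saturates the system and only throughput is meaningful.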

Startup Rise EU
Sep 1st, 2025
FlexAI Secures $30M for AI Cloud

French startup FlexAI has emerged from stealth with $30M in seed funding to launch an AI training streaming cloud service. The funding round was led by Alpha Intelligence Capital, Elaia Partners, and Heartcore Capital, with participation from Frst Capital, Motier Ventures, Partech, and InstaDeep CEO Karim Beguir. FlexAI aims to simplify AI infrastructure access, enabling developers to build and train AI applications more easily. The company plans to release its first business offering later this year.

Flex AI
Jul 8th, 2025
FlexAI Powered by Sesterce: The Partnership to Drive AI Innovation Under European Umbrella

At a time when digital sovereignty is more critical than ever, FlexAI has launched an integrated platform that goes beyond the promise of "EU cloud regions".

PR.com
Jun 10th, 2025
FlexAI and Tenstorrent Partner to Democratize AI Infrastructure

The partnership combines FlexAI's Workload as a Service platform with Tenstorrent's powerful chips to give companies fast, affordable AI infrastructure while significantly reducing the effort to manage the backend.

INACTIVE