Full-Time
Posted on 10/31/2025
On-demand GPU clusters for AI workloads
No salary listed
Bengaluru, Karnataka, India
In Person
FlexAI provides a Workload as a Service (WaaS) platform that gives AI developers on-demand access to scalable GPU compute for the full AI lifecycle, including training, fine-tuning, and inference. The platform lets users spin up GPU clusters quickly and run workloads in parallel, with automated scaling to optimize resource use and costs. It also supports performing multiple fine-tuning runs across different datasets and models at the same time, enabling rapid experimentation while the platform handles infrastructure management. This differentiates FlexAI from competitors by offering serverless, pay-per-use AI-optimized compute that abstracts away hardware setup and maintenance, focusing users on building and deploying models. The company aims to accelerate AI initiatives for developers, data scientists, and organizations by delivering cost-efficient, scalable compute without the overhead of managing their own GPUs.
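The description above mentions running multiple fine-tuning runs across different datasets and models at the same time. FlexAI's actual client API is not shown in this text, so the sketch below uses a hypothetical `run_finetune` placeholder and Python's standard `concurrent.futures` only to illustrate the fan-out pattern of launching several runs in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for submitting one fine-tuning run to a managed
# GPU cluster; the real FlexAI API is not described in the source text.
def run_finetune(model: str, dataset: str) -> str:
    return f"{model} fine-tuned on {dataset}"

# Fan out several fine-tuning runs across model/dataset pairs in parallel,
# mirroring "multiple fine-tuning runs ... at the same time".
jobs = [
    ("llama-3-8b", "support-tickets"),
    ("llama-3-8b", "legal-contracts"),
    ("mistral-7b", "support-tickets"),
]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lambda j: run_finetune(*j), jobs))

for r in results:
    print(r)
```

The model names, dataset names, and `run_finetune` helper are illustrative only; on the real platform each job would be a remote workload managed by the service.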
Company Size
11-50
Company Stage
Seed
Total Funding
$30M
Headquarters
Paris, France
Founded
2023
Health Insurance
Dental Insurance
Vision Insurance
401(k) Retirement Plan
401(k) Company Match
Hybrid Work Options
Flexible Work Hours
Professional Development Budget
Wellness Program
FlexBench: an open-source, modular MLPerf benchmark for LLM inference

March 31, 2026

TL;DR: AI system benchmarks like MLPerf struggle to keep pace with the rapidly evolving model landscape, making it difficult for organizations to make informed deployment decisions. FlexAI believes benchmarking should itself be treated as a machine learning problem: one where models are continuously tested and optimized across datasets, software, and hardware based on metrics like accuracy, latency, throughput, power consumption, and cost. That's why FlexAI built FlexBench: a modular, open-source version of the MLPerf LLM inference benchmark connected to Hugging Face. FlexBench aggregates existing and new benchmarking results into an Open MLPerf dataset, which can be collaboratively cleaned, extended, and used for predictive modeling. FlexAI validated FlexBench through its MLPerf Inference 5.0 submission, benchmarking DeepSeek R1 and LLaMA 3.3 on commodity H100 servers. The long-term goal: empower teams to make cost-effective AI deployment decisions based on their available resources, requirements, and constraints.

Why MLPerf falls short for LLM inference benchmarking

AI service providers, server developers, and data center operators face a critical challenge: selecting the right hardware and software stack to ensure ROI within 3-5 years in a rapidly shifting landscape [1]. MLPerf was introduced as a full-stack inference benchmark to evaluate accuracy, latency, and throughput in a standardized, reproducible manner across diverse hardware and software stacks [2]. But traditional benchmarks face a fundamental limitation: the combinatorial explosion of models, datasets, methods, and hardware configurations. Hugging Face alone hosts over a million ML models, more than 10,000 datasets, and thousands of methods, while new (and often incompatible) hardware and software ship continuously.
Exploring all possible configurations is not only impractical, it's prohibitively expensive. MLPerf currently covers only a limited set of combinations (typically around a dozen) and updates just once a year. Its LLM benchmarks still focus on models like BERT, GPT-J, LLaMA 2 70B, LLaMA 3 405B, and Mixtral 8x7B, even as newer models like DeepSeek dominate production workloads. Worse, FlexAI's hands-on experience with MLPerf shows that heavily over-optimized results from a few chip manufacturers are rarely achievable out of the box on other models, software versions, or hardware, which significantly limits their practical usefulness.

Reframing GPU benchmarking as a machine learning problem

FlexAI believes a fundamentally different approach is needed. Drawing on past experience using AI to improve computer systems, FlexAI proposes redefining MLPerf benchmarking as a learning task, with an open dataset of results and trainable objective functions to optimize key metrics such as accuracy, latency, throughput, power consumption, and cost [3][4][5]. To support this vision, FlexAI developed FlexBench, an open-source, modular, and flexible version of the MLPerf language inference benchmark connected to the Hugging Face Hub. With a unified codebase and CLI, users can benchmark a wide range of models and datasets by adjusting just a few input parameters.

FlexBench is designed for continuous evolution. FlexAI uses the MLCommons CMX workflow automation framework to aggregate both existing and new benchmarking results, along with their associated metadata, into an open MLPerf dataset published on GitHub and Hugging Face. This dataset can be collaboratively cleaned, extended, and analyzed using standard data analytics techniques, including predictive modeling and feature engineering. FlexAI then uses FlexBoard to visualize, compare, and predict the most suitable software/hardware configurations for different models based on user requirements and constraints.
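To make the "trainable objective functions" idea concrete, here is a minimal sketch with invented numbers (not real Open MLPerf data): each candidate software/hardware configuration is scored by a weighted combination of accuracy, throughput, latency, and cost, and the best-scoring configuration is selected. In a learned setting the weights would be fit to user preferences rather than hand-picked as here.

```python
# Toy rows in the spirit of the Open MLPerf dataset (all values invented):
# candidate software/hardware configurations with measured metrics.
configs = [
    {"name": "A", "accuracy": 0.86, "latency_ms": 120, "tokens_per_s": 900,  "cost_per_hr": 4.0},
    {"name": "B", "accuracy": 0.84, "latency_ms": 60,  "tokens_per_s": 1400, "cost_per_hr": 6.5},
    {"name": "C", "accuracy": 0.87, "latency_ms": 200, "tokens_per_s": 600,  "cost_per_hr": 2.5},
]

def score(cfg, w_acc=1.0, w_lat=0.002, w_cost=0.05):
    """Simple objective: reward accuracy and throughput, penalize
    latency and cost. The weights stand in for a trained preference model."""
    return (w_acc * cfg["accuracy"]
            + cfg["tokens_per_s"] / 1000
            - w_lat * cfg["latency_ms"]
            - w_cost * cfg["cost_per_hr"])

best = max(configs, key=score)
print(best["name"])  # -> "B": fastest overall despite higher hourly cost
```

With these weights the throughput-heavy configuration B wins; a user who weights cost more heavily would see C rise instead, which is exactly the kind of requirement-dependent trade-off the approach is meant to surface.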
FlexBench architecture: client-server LLM inference benchmarking

FlexBench uses a client-server architecture in which the FlexBench client connects to a running vLLM server. It is built on MLPerf LoadGen, the official, reusable MLPerf harness that efficiently and fairly measures inference system performance [2][9]. The goal is to retain MLPerf's rigorous measurement standards while making the framework more flexible by abstracting models and datasets as interchangeable modules: Hugging Face or local LLMs and datasets can be used with minimal setup.

FlexBench supports two standard MLPerf inference modes:

* Server (streaming) mode: queries arrive according to a Poisson distribution, mimicking real-world request patterns
* Offline mode: all queries are sent simultaneously to maximize throughput

FlexBench returns detailed metrics from LoadGen, including time to first token (TTFT), throughput, and latency percentiles, all compliant with MLPerf standards and suitable for inclusion in the Open MLPerf dataset for further analysis and predictive analytics. FlexAI cross-validated these results against the vLLM benchmarking infrastructure and found strong alignment in performance numbers. FlexBench also provides accuracy metrics to guide further model optimizations such as quantization, pruning, and distillation. FlexAI has also introduced a queries-per-second (QPS) sweep mode to help users automatically identify the optimal QPS for their specific model, software, and hardware combination.

FlexBoard is implemented as a Gradio module that loads the Open MLPerf dataset via MLCommons CK/CM/CMX automations. It includes predictive modeling and visualization plugins to help users analyze this data and predict the most efficient and cost-effective software/hardware configurations based on their requirements and constraints.

Benchmarking DeepSeek R1 and LLaMA 3.3 on H100 GPUs
FlexAI validated FlexBench through its MLPerf Inference 5.0 submission by benchmarking several non-MLPerf LLM models, including DeepSeek R1 and LLaMA 3.3, on the OpenOrca dataset, using commodity servers equipped with NVIDIA H100 GPUs. The automation framework enabled rapid switching between models, datasets, and hardware configurations by simply modifying command-line parameters, without any code changes.

FlexAI also invested significant effort in assembling the Open MLPerf dataset, unifying past MLPerf Inference results (v4.0) with the latest official submissions and FlexBench data. To enable predictive modeling, FlexAI cleaned the dataset, standardized disparate fields, and engineered new features such as model size and data type. This cleaned and curated dataset has been released, along with FlexBoard and predictive analytics tools, to help the broader ML, AI, and systems community accelerate benchmarking, evaluation, and optimization efforts. For example, a proof-of-concept prototype lets users input system costs and predict optimal software/hardware configurations based on model size and data type features.

What's next for FlexBench and Open MLPerf

FlexBench and FlexBoard are still in early-stage prototyping.
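The proof-of-concept prediction over engineered features (model size, data type) described above might look like this minimal sketch. The records and numbers are invented, and a simple nearest-neighbor lookup stands in for whatever model the real prototype uses:

```python
# Toy Open-MLPerf-style records (all values invented for illustration):
# engineered features are model size (billions of params) and data type.
records = [
    {"model_size_b": 8,   "dtype": "fp16", "tokens_per_s": 2400},
    {"model_size_b": 70,  "dtype": "fp16", "tokens_per_s": 450},
    {"model_size_b": 70,  "dtype": "fp8",  "tokens_per_s": 800},
    {"model_size_b": 405, "dtype": "fp8",  "tokens_per_s": 120},
]

def predict_throughput(model_size_b, dtype):
    """1-nearest-neighbor over the engineered features: prefer records
    with the same data type, then pick the closest model size."""
    candidates = [r for r in records if r["dtype"] == dtype] or records
    nearest = min(candidates, key=lambda r: abs(r["model_size_b"] - model_size_b))
    return nearest["tokens_per_s"]

print(predict_throughput(65, "fp16"))  # closest fp16 record is 70B -> 450
```

Even this trivial predictor shows the shape of the workflow: unify benchmark results into feature rows, then estimate performance for configurations that were never measured directly.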
FlexAI invites researchers and practitioners to explore the tools, provide feedback, and collaborate on the following:

* Extending FlexBench to support all types of models, datasets, and systems
* Expanding the Open MLPerf dataset with FlexBench results from various models across diverse software/hardware configurations from different vendors
* Engineering improved features, such as model graphs, tensor shapes, compiler optimizations, accelerator capabilities, and hardware topology, to enhance predictions for previously unseen AI workloads
* Extending and improving FlexBoard based on user requirements and feedback
* Sharing your specific benchmarking challenges to help guide FlexAI's priorities

The long-term goal is to enable anyone to run AI models efficiently and cost-effectively, tailored to their available resources, requirements, and constraints. If you're interested in this approach or would like to collaborate, reach out to the authors at FCS Labs.

References
French startup FlexAI has emerged from stealth with $30M in seed funding to launch a streaming cloud service for AI training. The round was led by Alpha Intelligence Capital, Elaia Partners, and Heartcore Capital, with participation from Frst Capital, Motier Ventures, Partech, and InstaDeep CEO Karim Beguir. FlexAI aims to simplify access to AI infrastructure, enabling developers to build and train AI applications more easily, and plans to release its first commercial offering later this year.
At a time when digital sovereignty is more critical than ever, FlexAI has launched an integrated platform that goes beyond the promise of "EU cloud regions".
The partnership combines FlexAI's Workload as a Service platform with Tenstorrent's powerful chips to give companies fast, affordable AI infrastructure while significantly reducing the effort to manage the backend.