Full-Time

Software Engineer

Site Reliability Engineer

Updated on 4/23/2025

Fireworks AI

51-200 employees

AI inference platform for machine learning models

Compensation Overview

$160k - $190k/yr

Mid

Company Does Not Provide H-1B Sponsorship

San Carlos, CA, USA + 1 more

More locations: New York, NY, USA

Candidates can work onsite at either Redwood City or New York City.

Category
DevOps & Infrastructure
Site Reliability Engineering
Software Engineering
Required Skills
Bash
Kubernetes
Microsoft Azure
Python
Grafana
Machine Learning
AWS
Go
Prometheus
Terraform
Development Operations (DevOps)
Google Cloud Platform
Requirements
  • 3+ years in SRE/PE/DevOps roles with production-grade Kubernetes experience.
  • Proficiency in cloud networking (AWS/GCP/Azure VPCs, firewalls, DNS) and service monitoring (Prometheus, Alertmanager, Grafana).
  • Hands-on experience with incident management and improving system reliability/SLOs.
  • Strong scripting/coding skills (Python/Go/Bash) for automation and tooling.
  • Familiarity with object storage (S3, GCS) and data pipeline integration.
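The SLO and incident-management experience asked for above usually comes down to error-budget math. Below is a minimal, illustrative sketch (plain Python, hypothetical function names) of a multi-window burn-rate check of the kind commonly used to decide when an SLO breach is worth paging on; the 14.4x threshold is a conventional example, not a value from this posting.

```python
# Error-budget burn rate: how fast errors consume the budget implied by an SLO.
# A burn rate of 1.0 exhausts the budget exactly at the end of the SLO window.

def burn_rate(error_ratio: float, slo: float) -> float:
    """Ratio of the observed error rate to the error budget allowed by the SLO."""
    budget = 1.0 - slo
    if budget <= 0:
        raise ValueError("SLO must be below 100%")
    return error_ratio / budget

def should_page(short_window: float, long_window: float, slo: float,
                threshold: float = 14.4) -> bool:
    """Page only when both a short and a long window burn fast (reduces flapping)."""
    return (burn_rate(short_window, slo) >= threshold
            and burn_rate(long_window, slo) >= threshold)

# Example: 99.9% SLO with 2% of requests failing in both windows.
print(round(burn_rate(0.02, 0.999), 6))  # 20.0 -> budget consumed 20x too fast
print(should_page(0.02, 0.02, 0.999))    # True
```

In practice the two windows would come from Prometheus range queries; the decision logic itself is just this ratio comparison.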
Responsibilities
  • Drive initiatives to reduce incident response time through improved monitoring, alerting, and automated remediation.
  • Build self-healing systems and playbooks for common failure scenarios.
  • Lead blameless post-mortems and implement preventative measures.
  • Manage and optimize GPU-enabled Kubernetes clusters for AI/ML workloads, focusing on cost-performance efficiency, auto-scaling, and resource utilization.
  • Debug performance bottlenecks in distributed systems (e.g., network, storage, GPU scheduling).
  • Strengthen service health by refining cloud networking stacks (VPCs, load balancers, service meshes) and ensuring low-latency communication.
  • Design fault-tolerant architectures to minimize downtime.
  • Enhance service monitoring with tools like Prometheus, Grafana, and custom metrics pipelines.
  • Implement predictive analytics to proactively address system health risks.
  • Build automation for cluster provisioning, scaling, and recovery using Terraform, Argo, and CI/CD pipelines.
  • Develop tools to streamline operational workflows (e.g., automated rollbacks, canary deployments).
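The automated-rollback and canary-deployment bullets above reduce to a promotion gate: compare the canary's error rate against the stable baseline and decide to promote, roll back, or keep waiting for traffic. A hypothetical sketch (names and thresholds are illustrative, not any specific tool's API):

```python
# Canary gate: promote a new release only if its error rate stays within a
# tolerated multiple of the stable baseline; otherwise trigger a rollback.
from dataclasses import dataclass

@dataclass
class WindowStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

def canary_verdict(baseline: WindowStats, canary: WindowStats,
                   max_ratio: float = 2.0, min_requests: int = 100) -> str:
    """Return 'promote', 'rollback', or 'wait' (not enough traffic yet)."""
    if canary.requests < min_requests:
        return "wait"
    # Floor the baseline at 0.1% so a perfectly clean baseline doesn't
    # make any single canary error an instant rollback.
    if canary.error_rate > max(baseline.error_rate, 0.001) * max_ratio:
        return "rollback"
    return "promote"

print(canary_verdict(WindowStats(10_000, 10), WindowStats(500, 1)))   # promote
print(canary_verdict(WindowStats(10_000, 10), WindowStats(500, 25)))  # rollback
print(canary_verdict(WindowStats(10_000, 10), WindowStats(50, 0)))    # wait
```

Wired to a metrics backend and a deploy pipeline, this verdict is what drives the automated rollback step.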
Desired Qualifications
  • Experience with GPU clusters (NVIDIA GPUs, MIG, CUDA) and AI/ML workloads.
  • Knowledge of auto-scaling technologies (K8s HPA/VPA) and auto-remediation frameworks.
  • Expertise in service meshes (Istio).
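The HPA item in the qualifications above is, at its core, one formula: the Kubernetes Horizontal Pod Autoscaler computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipping the change when the ratio is within a tolerance band (10% by default). A small sketch of that calculation:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.1) -> int:
    """Replica count per the Kubernetes HPA scaling formula:
    desired = ceil(current * currentMetric / targetMetric).
    Inside the tolerance band no scaling occurs (mirrors the default 10%)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 4 replicas at 90% average CPU against a 60% target -> scale out to 6.
print(desired_replicas(4, 90, 60))  # 6
# 4 replicas at 62% vs a 60% target -> inside tolerance, stay at 4.
print(desired_replicas(4, 62, 60))  # 4
```

The same arithmetic applies to custom metrics (e.g. GPU utilization) once they are exposed through a metrics adapter.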

Fireworks AI provides a platform for running and customizing machine learning models, catering to tech companies, research institutions, and enterprises aiming to incorporate AI into their workflows. The platform allows users to deploy models, fine-tune them, and perform inference, making it easier for clients to utilize AI technology. Unlike many competitors, Fireworks AI offers a subscription-based service that includes access to open-source models and additional premium features, which can be tailored to specific needs. The company's goal is to enhance AI adoption in production environments, supported by recent funding to develop more advanced AI systems and expand its team.

Company Size

51-200

Company Stage

Series B

Total Funding

$77M

Headquarters

Redwood City, California

Founded

2022

Simplify Jobs

Simplify's Take

What believers are saying

  • Recent $52M funding will accelerate development of compound AI systems and platform enhancements.
  • Growing interest in compound AI systems aligns with Fireworks AI's strategic direction.
  • API-based solutions increase accessibility, driving further adoption of Fireworks AI's platform.

What critics are saying

  • Deep Cogito's open-source models may reduce demand for Fireworks' subscription services.
  • DeepSeek R1's lower costs could pressure Fireworks AI to reduce prices.
  • Poe's monetization model may attract developers away from Fireworks AI's platform.

What makes Fireworks AI unique

  • Fireworks AI specializes in AI inference and model customization for diverse clients.
  • The platform offers subscription-based access to open-source models and fine-tuning services.
  • Fireworks AI's API supports rapid deployment and integration of generative AI capabilities.


Benefits

Professional Development Budget

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

-4%

2 year growth

-4%
VentureBeat
Apr 8th, 2025
New Open-Source AI Company Deep Cogito Releases First Models and They're Already Topping the Charts

Deep Cogito, a new AI research startup based in San Francisco, officially emerged from stealth today with Cogito v1, a new line of open-source large language models (LLMs) fine-tuned from Meta's Llama 3.2 and equipped with hybrid reasoning capabilities — the ability to answer quickly and immediately, or "self-reflect" like OpenAI's "o" series and DeepSeek R1.

The company aims to push the boundaries of AI beyond current human-overseer limitations by enabling models to iteratively refine and internalize their own improved reasoning strategies. It is ultimately on a quest toward developing superintelligence — AI smarter than all humans in all domains — yet the company says that "All models we create will be open sourced."

Deep Cogito's CEO and co-founder Drishan Arora — a former senior software engineer at Google who says he led large language model (LLM) modeling for Google's generative search product — also said in a post on X that they are "the strongest open models at their scale – including those from LLaMA, DeepSeek, and Qwen."

The initial model lineup includes five base sizes: 3 billion, 8 billion, 14 billion, 32 billion, and 70 billion parameters, available now on the AI code-sharing community Hugging Face, on Ollama, and through application programming interfaces (APIs) on Fireworks and Together AI. They are available under the Llama licensing terms, which allow commercial usage — so third-party enterprises can put them to work in paid products — up to 700 million monthly users, at which point a paid license from Meta is required. The company plans to release even larger models — up to 671 billion parameters — in the coming months.

Arora describes the company's training approach, iterated distillation and amplification (IDA), as a novel alternative to traditional reinforcement learning from human feedback (RLHF) or teacher-model distillation. The core idea behind IDA is to allocate more compute for a model to generate improved solutions, then distill the improved reasoning process into the model's own parameters — effectively creating a feedback loop for capability growth. Arora likens this approach to Google AlphaGo's self-play strategy, applied to natural language. The Cogito models are open source and available for download via Hugging Face and Ollama, or through APIs provided by Fireworks AI and Together AI. Each model supports both a standard mode for direct answers and a reasoning mode, where the model reflects internally before responding.

Benchmarks and evaluations: the company shared a broad set of evaluation results comparing Cogito models to open-source peers across general knowledge, mathematical reasoning, and multilingual tasks. Highlights include: Cogito 3B (Standard) outperforms LLaMA 3.2 3B on MMLU by 6.7 percentage points (65.4% vs

The Bridge
Jan 30th, 2025
Why DeepSeek-R1 Is Good News for Enterprises — Making AI Apps Cheaper, Easier to Build, and More Innovative

Image credit: VentureBeat with Ideogram. The release of the DeepSeek-R1 reasoning model sent shockwaves through the tech industry, most visibly in the sudden sell-off of major AI stocks. With DeepSeek reportedly able to develop an o1 competitor at a fraction of the cost, the advantage held by well-funded AI labs such as OpenAI and Anthropic no longer looks so solid. While some AI labs are currently in crisis mode, as far as the enterprise sector is concerned, this is mostly good news.

VentureBeat
Jan 27th, 2025
DeepSeek-R1 Is a Boon for Enterprises — Making AI Apps Cheaper, Easier to Build, and More Innovative

The release of the DeepSeek R1 reasoning model has caused shockwaves across the tech industry, with the most obvious sign being the sudden sell-off of major AI stocks. The advantage of well-funded AI labs such as OpenAI and Anthropic no longer seems very solid, as DeepSeek has reportedly been able to develop their o1 competitor at a fraction of the cost. While some AI labs are currently in crisis mode, as far as the enterprise sector is concerned, it's mostly good news.

Cheaper applications, more applications: as we had said here before, one of the trends worth watching in 2025 is the continued drop in the cost of using AI models. Enterprises should experiment and build prototypes with the latest AI models regardless of the price, knowing that continued price reductions will enable them to eventually deploy their applications at scale. That trend line just saw a huge step change: OpenAI o1 costs $60 per million output tokens versus $2.19 per million for DeepSeek R1.

Fireworks
Jul 12th, 2024
Fireworks AI Raises $52M Series B to Lead Industry Shift to Compound AI Systems

Fireworks AI raises $52M led by Sequoia, boosting its valuation to $552M. The funds will speed up the development of compound AI systems, team growth, and platform enhancements to increase AI adoption in production.

PYMNTS
Jul 11th, 2024
Fireworks AI Valued at $552 Million After New Funding Round

Fireworks' funding round - which included participation by Nvidia - is happening as tech firms continue to invest in the AI sector and in the technology itself.