Full-Time

Member of Technical Staff

Inference

Posted on 6/23/2025

Inflection

51-200 employees

Personalized AI assistant for self-improvement

Compensation Overview

$175k - $350k/yr

Palo Alto, CA, USA

In Person

Category
Software Engineering
Required Skills
Rust
CUDA
PyTorch
C/C++
Requirements
  • Have direct experience deploying and optimizing large transformer models for real-time inference across multi-GPU or multi-node environments
  • Are skilled with tools like Triton, TensorRT, TVM, ONNX Runtime, or custom CUDA kernels—and know when to use C++/Rust for critical performance gains
  • Understand the balance between latency, throughput, accuracy, and cost, and make smart choices around quantization, speculative decoding, and caching
  • Have developed or integrated agent-based orchestration systems, RAG pipelines, or memory architectures in production environments
  • Automate at every layer—CI/CD for model artifacts, load testing, canary rollouts, and auto-scaling
  • Communicate clearly with both infrastructure teams and product stakeholders
Responsibilities
  • Design and optimize high-performance inference pipelines using PyTorch, vLLM, Triton, TensorRT, and FSDP/DeepSpeed
  • Integrate agentic runtimes—tool calling, function execution, and multi-step planning—while meeting strict latency requirements
  • Build robust retrieval-augmented generation (RAG) stacks combining vector search, caching, and real-time context packing
  • Develop memory services to support conversation continuity and user personalization at scale
  • Monitor, instrument, and autotune GPU performance, kernel fusion, and batching strategies across clusters of NVIDIA H100 and Intel Gaudi accelerators
  • Partner with training, safety, and product teams to transform research into stable, production-grade systems
  • Contribute upstream to open-source performance libraries and share insights with the community

Inflection.ai builds an AI-powered personal assistant called Pi that runs on iOS and other platforms. Pi interacts with users in natural language to help with journaling, planning, learning new things, and providing emotional support, acting as a trusted companion. It works by understanding user input and delivering personalized responses and services, ranging from organization and self-improvement tasks to explanations of ideas in simple terms. The approach centers on personalization and emotional intelligence to go beyond task-only helpers, offering a friend-like experience. The business model is likely freemium: core features are free with premium, personalized capabilities available through subscriptions. The goal is to help people improve their personal and professional lives through organized routines, ongoing learning, and emotional support while growing a broad user base and monetizing advanced features.

Company Size

51-200

Company Stage

Acquired

Total Funding

$2.2B

Headquarters

Palo Alto, California

Founded

2022

Simplify's Take

What believers are saying

  • Intel partnership launches Gaudi 3-powered LLM appliance in Q1 2025.
  • Acquisitions of BoostKPI, Jelled.ai, and Boundaryless accelerate enterprise RPA.
  • UiPath integration enhances security-focused automation for large enterprises.

What critics are saying

  • Gaudi 3 trails the Nvidia H100 by 40-50%, eroding Inflection's edge against rivals.
  • Pi is losing users to GPT-4o and Claude 3.5, slashing freemium conversions.
  • With no new models planned under CEO Sean White, Inflection 3.0 risks obsolescence by mid-2025.

What makes Inflection unique

  • Inflection AI pivots to enterprise AI with Agentic Workflows for trusted automation.
  • Proprietary Inflection 3.0 LLM fine-tuned on client data via RLHF for alignment.
  • Public benefit corporation emphasizes ethical, human-centered AI development.

Benefits

Health Insurance

401(k) Company Match

Unlimited Paid Time Off

Parental Leave

Growth & Insights and Company News

Headcount

6 month growth

-1%

1 year growth

-1%

2 year growth

-45%

Semiconductors Insight
Jul 22nd, 2025
Fear of Losing Search Led Google to Bury Lambda, Says Mustafa Suleyman, Former VP of AI

With massive funding and access to significant compute infrastructure, Inflection launched Pi (Personal Intelligence), a direct evolution of the ideas first manifested in Lambda.

Data Phoenix
Dec 1st, 2024
Inflection AI recently acquired three AI-focused startups to build its enterprise platform

On Tuesday, Inflection announced it has acquired two more startups, BoostKPI and Jelled.ai, to strengthen two key aspects in its enterprise platform: data and communications.

Business Wire
Nov 27th, 2024
Inflection AI Deepens Commitment to Enterprise AI With Acquisition of BoostKPI and Jelled.ai

Inflection AI today announced two new acquisitions to deepen its best-in-class AI capabilities for enterprises. Inflection for Enterprise launched ear

Swipeline
Nov 26th, 2024
Inflection AI Acquires Two AI Startups

Inflection AI, under new CEO Sean White, shifted its focus to enterprise AI, acquiring startups Jelled.ai and BoostKPI. Originally founded in 2022, Inflection AI gained attention with its AI chatbot "Pi." In July, it announced $1.3 billion in funding. Co-founders, including Mustafa Suleyman, joined Microsoft's AI efforts in a $650 million deal. White stated the company will no longer compete in developing new AI models but will focus on providing AI services to corporate clients.

AI News
Oct 24th, 2024
Inflection's Agentic Workflows Bring Trust and Action to Enterprise AI

In line with this vision, Inflection has introduced Agentic Workflows into its enterprise offering.

INACTIVE