Full-Time

Machine Learning Researcher

Inference

1-10 employees

Serverless AI model inference across compute.

Compensation Overview

$250k - $350k/yr

+ Equity

San Francisco, CA, USA

In Person

Bay Area candidates: hybrid role with 4 days on-site per week in SF.

Category
AI & Machine Learning
Required Skills
PyTorch
Reinforcement Learning
Requirements
  • 3+ years of experience training AI models using PyTorch
  • Deep understanding of transformer architectures, attention mechanisms, and model internals
  • Hands-on experience with post-training LLMs using SFT, RLHF, DPO, or other alignment techniques
  • Experience with LLM-specific training frameworks (e.g., Hugging Face Transformers, DeepSpeed, Megatron, TRL, or similar)
  • Strong experimental methodology, including ability to design, run, and analyze rigorous experiments
  • Track record of implementing ideas from recent ML papers
  • Experience training on NVIDIA GPUs at scale
  • Strong foundation in ML fundamentals: optimization, loss functions, regularization, generalization
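The post-training techniques named above each reduce to a concrete training objective. As a minimal, illustrative sketch (plain Python, not any particular framework's API), the DPO loss for a single preference pair takes summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are summed log-probabilities of the chosen/rejected
    responses under the trained policy and a frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)): small when the policy prefers the chosen
    # response more strongly than the reference model does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, margin = 0 and the loss
# is -log(0.5) = log(2); shifting probability mass toward the chosen
# response drives the loss below that baseline.
```

In practice the log-probabilities come from batched forward passes over tokenized responses (e.g. in PyTorch), but the scalar form above is the whole objective.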
Responsibilities
  • Research and experiment with new model architectures to improve quality, efficiency, or capability
  • Explore methods to decrease inference latency and improve serving efficiency
  • Run experiments with new learning methods, including novel approaches to SFT, RLHF, DPO, and other post-training techniques
  • Perform reinforcement learning research to improve model alignment and capability
  • Develop and improve our distillation pipeline for training high-quality models from frontier teachers
  • Train models for clients and run evaluations to validate research findings in production settings
  • Create robust benchmarks and evaluation frameworks that ensure custom models match or exceed frontier performance
  • Stay current with ML research and identify techniques that can improve our platform
  • Collaborate with applied engineers to bring successful research into production systems
  • Document findings and share knowledge with the team
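The distillation pipeline mentioned above builds on the classic soft-label objective: train a small student to match the temperature-softened output distribution of a larger teacher. A minimal sketch for a single token position (illustrative only; function names are hypothetical and this is not Inference's pipeline):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Forward KL(teacher || student) over one token's vocabulary.

    Softening both distributions with a temperature > 1 exposes the
    teacher's relative ranking of non-argmax tokens ("dark knowledge").
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The conventional T^2 factor keeps gradient magnitudes comparable
    # across different temperature settings
    return temperature ** 2 * kl
```

A full pipeline would average this over sequence positions and typically mix in a standard cross-entropy term on ground-truth labels; the KL term is what transfers the teacher's behavior.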
Desired Qualifications
  • Publications in ML venues
  • Experience with model distillation or knowledge transfer
  • Experience with LLM speed optimization techniques
  • Familiarity with vision encoders, multimodal models, or other modalities
  • Experience with distributed training and infrastructure at scale
  • Contributions to open-source ML projects

You don't need to tick every box. Curiosity and the ability to learn quickly matter more.

Inference.net provides a distributed, serverless platform that lets developers run open-source AI models without managing infrastructure. It operates a global network of compute providers and leverages underutilized data center capacity to offer cost-effective LLM inference via a simple API, supporting models like Llama 3.1 8B. Customers are charged based on compute usage, making it a scalable solution for building AI-enabled applications. Unlike traditional clouds, it emphasizes serverless access to high-quality models with cloud-like reliability at a lower cost. The goal is to democratize access to AI by removing infrastructure complexity and cost barriers for developers and companies.

Company Size

1-10

Company Stage

Seed

Total Funding

$11.8M

Headquarters

San Francisco, California

Founded

2023

Simplify Jobs

Simplify's Take

What believers are saying

  • $11.8M seed from Multicoin Capital and a16z CSX fuels R&D expansion.
  • Teams like GravityAds train GPT-5 quality models at lightning speed.
  • Grants program attracts open-source developers with free compute resources.

What critics are saying

  • OpenAI o1 erodes cost edge with 50% cheaper superior reasoning now.
  • Together AI captures workloads with 2x lower latency on Arm chips.
  • DeepSeek-V3 commoditizes SLMs as users self-host at 10% compute cost.

What makes Inference unique

  • Catalyst platform uses production traffic for self-improving AI models.
  • Aggregates underutilized data center capacity for 90% cost savings.
  • Full-stack LLM lifecycle from monitoring to specialized model deployment.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Unlimited Paid Time Off

Hybrid Work Options

401(k) Company Match

Commuter Benefits

Phone/Internet Stipend

Gym Membership

Wellness Program

Mental Health Support

Stock Options

Performance Bonus

Profit Sharing

Company Equity

Remote Work Options

Sabbatical Leave

Company News

RootData
Oct 15th, 2025
Inference.net raises $11.8M seed funding

Open-source AI provider Inference has completed an $11.8 million seed financing round, led by Multicoin Capital and a16z CSX, with participation from Topology Ventures, Founders, Inc., and angel investors. The funding will enhance Inference's R&D in model and infrastructure performance and expand its capacity to serve more companies.

Inference.net
Oct 14th, 2025
Announcing our $11.8M Series Seed

Inference is excited to announce that it has raised $11.8M in Series Seed funding, led by Multicoin Capital and a16z CSX, with participation from Topology Ventures, Founders, Inc., and an exceptional group of angel investors. Inference.net enables companies to train and deploy custom AI models that outperform general-purpose alternatives at a fraction of the cost. This capital will accelerate its mission to help businesses take control of their AI destiny.

A fork in the road. Every company building with AI faces a critical choice: pay unsustainable prices to OpenAI, Anthropic, and Google for general-purpose models, or compromise on quality with cheaper alternatives. This dependency on frontier labs creates three fundamental risks. First, spiraling costs limit scale: as usage grows from thousands to billions of requests, API costs can consume entire budgets. Second, companies lack control over core business infrastructure, leaving them vulnerable to price changes, model deprecations, and service disruptions. Third, when everyone uses the same models, true differentiation becomes impossible. Companies shouldn't have to choose between quality and cost. They shouldn't be forced to send sensitive customer data to third-party servers. And they shouldn't build their competitive advantage on infrastructure they don't control.

Where Inference stands. Over the past year, Inference has trained and deployed custom language models for some of the fastest-growing AI-native companies in the world. Its approach is straightforward: identify the specific, repeatable tasks that businesses run millions of times, and train purpose-built models that excel at exactly those tasks. Whether extracting data from documents, captioning images, or classifying content, its models deliver superior results in their specialized domains. The results speak for themselves.
Custom models match or exceed frontier model performance while running 2-3x faster and costing up to 90% less. These models, up to 100x smaller than GPT-5-class systems, prove that optimization for specific tasks beats general capability on a cost-to-performance basis. Specialized models transform the economics of using AI at scale: companies spending millions annually on API calls reduce costs by up to 90%, applications previously constrained by latency can now serve real-time use cases, and businesses concerned about data privacy can run models on their own infrastructure. Most importantly, companies gain full control of the AI models powering their core products.

Beyond economics, custom models provide a lasting competitive advantage. When every company has access to the same frontier models, differentiation disappears. Custom models trained on proprietary data and optimized for specific workflows become a moat that competitors cannot replicate. Your AI becomes yours, and yours alone.

Moving forward. The next decade will see two parallel tracks in AI development. Frontier labs will continue pushing the boundaries with massive, general-purpose models for open-ended tasks like coding, creative writing, and complex reasoning; these models will remain expensive but essential for exploratory use cases. Simultaneously, a new ecosystem of specialized models will power the repetitive, high-volume tasks that constitute the majority of business AI usage. Companies will rely on frontier labs for cutting-edge capabilities while owning and operating custom models for core operations. As companies scale from prototypes to production, the cost of relying on frontier labs becomes untenable. Meanwhile, the open-source ecosystem has matured dramatically, and new post-training techniques make it possible to match frontier capabilities with far fewer parameters.
This funding enables Inference to expand its research and development into new frontiers of model and infrastructure performance while scaling its ability to serve more companies.

Join Inference. The transition from renting to owning intelligence has begun, and Inference aims to accelerate it. If you're spending more than $50,000 per month on closed-source AI providers, Inference can help you cut costs and improve performance in as little as 4 weeks. Book a call with its research team to learn more.

Own your model. Scale with confidence. Schedule a call with the research team to learn more about custom training; Inference will propose a plan that beats your current SLA and unit cost.