Full-Time

Inference Engineer

Video AI

Posted on 8/29/2025

Cantina

51-200 employees

Subscription-based platform for AI-driven social gaming

Compensation Overview

$175k - $225k/yr

London, UK | San Francisco, CA, USA | Berlin, Germany

In Person

Hybrid/remote options available; WFH equipment provided.

Category
AI & Machine Learning
Requirements
  • 2+ years of ML engineering experience with focus on model inference and deployment
  • Strong understanding of neural network architectures, particularly diffusion networks, VAEs, and transformer models
  • Experience with video and image models – Understanding of how video/image generation models work, their architectures, and optimization strategies specific to video processing
  • Multi-GPU inference expertise – Experience running model components across multiple GPUs, implementing parallel processing strategies for large models
  • Production model hosting experience – Track record of deploying and maintaining ML models in production environments, including streaming and real-time inference
  • Experience with containerization (Docker), AWS, and cluster computing environments
  • Familiarity with machine learning frameworks (PyTorch, TensorFlow)
  • Experience with inference platforms and model serving solutions
  • Cloud: AWS (S3, DynamoDB), Kubernetes clusters
  • ML Infrastructure: Model serving platforms, Docker
  • Languages: Python
  • Frameworks: PyTorch, TensorFlow
  • Models: Video generation models, diffusion networks, VAEs, transformers
  • Optimization: Multi-GPU inference, real-time processing techniques
Responsibilities
  • Deploy video AI models to production – Take research models and build production-ready inference endpoints with APIs, ensuring efficient operation across cloud infrastructure
  • Maintain and optimize inference systems – Debug complex model serving issues, optimize latency performance, monitor system health, and ensure 99.9% uptime for AI-powered features
  • Implement model optimizations – Work with neural network architectures including diffusion networks, VAEs, and transformers. Apply streaming optimizations and understand video model architectures to implement effective performance improvements
  • Manage inference infrastructure – Leverage containerization with Docker, cloud storage solutions like S3, and cluster computing to build scalable model serving infrastructure
  • Collaborate with research teams – Work closely with AI researchers to understand model requirements, architectural constraints, and optimization opportunities for new video generation models

Cantina is an invitation-only platform that blends social gaming with artificial intelligence in a 24/7 virtual club called The Cantina. Members subscribe to access features that let them add AI bots with distinct personalities, chat, play games, and even generate AI art, with bot behavior adapting to user inputs. The service differentiates itself through a private, personalized, dynamic experience that combines human–AI interaction with user-generated content in a shared space. Its goal is to build a steady, subscription-based platform that appeals to tech-savvy users and brands by delivering ongoing, customized digital interactions.

Company Size

51-200

Company Stage

Series B

Total Funding

$33.3M

Headquarters

New York City, New York

Founded

2014

Simplify's Take

What believers are saying

  • Viral AI avatar creation tools attract content creators and virtual influencer entrepreneurs.
  • Growing interest in AI-driven social gaming expands addressable market beyond traditional social networks.
  • Personalized bot interactions with distinct personalities drive engagement and 30-day retention improvements.

What critics are saying

  • Meta's Instagram Reels AI avatars and generative video tools capture Cantina's target demographic without requiring users to switch platforms.
  • OpenAI's ChatGPT integration into WhatsApp, iMessage, and TikTok displaces Cantina's core AI interaction value.
  • Invite-only model with <500K users risks network-effect death spiral if monthly churn exceeds 5-8%.

What makes Cantina unique

  • Sean Parker-backed invite-only platform for creating realistic AI characters with friends.
  • Subscription model generates recurring revenue from tech-savvy users seeking personalized AI interactions.
  • Real-time and asynchronous multi-user AI environments enable social, connected human-like character interactions.

Benefits

Health Insurance

Paid Vacation

Paid Sick Leave

401(k) Retirement Plan

401(k) Company Match

Parental Leave

Fertility Treatment Support

Company Equity

Home Office Stipend

Flexible Work Hours

Hybrid Work Options

INACTIVE