Full-Time
GPU marketplace enabling cost-effective AI compute
$160k - $320k/yr
San Francisco, CA, USA + 1 more
More locations: Los Angeles, CA, USA
In Person
On-site at SF or LA offices; no remote option stated.
Vast.ai runs a marketplace that connects GPU owners with users who need high-performance computing for AI and machine learning. Users browse available hardware, compare performance using the DLPerf scoring function, and rent compute through interruptible instances and spot auctions to save money. The platform aggregates offerings from individuals, data centers, and large providers, enabling a diverse, competitive marketplace while prioritizing security and regulatory compliance. This approach lets customers choose hardware that matches their needs and budget, rather than being limited to traditional cloud providers. Vast.ai's goal is to make AI development more accessible and to improve the utilization of existing computing resources, contributing to cost efficiency and sustainability in tech.
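The interruptible-instance model described above can be illustrated with a toy spot auction: each GPU goes to the highest current bid, and a running job is interrupted when a higher bid arrives. This is a conceptual sketch only, not Vast.ai's actual pricing engine, and all names and prices are made up.

```python
# Conceptual sketch of an interruptible spot market (NOT Vast.ai's real
# allocation logic): the highest bid holds the GPU; a higher bid preempts it.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Gpu:
    name: str
    current_bid: float = 0.0      # $/hr of the job currently holding the GPU
    holder: Optional[str] = None  # renter currently running on it

def place_bid(gpu: Gpu, renter: str, bid: float) -> str:
    """Award the GPU to the new bid if it beats the incumbent bid."""
    if bid > gpu.current_bid:
        interrupted = gpu.holder
        gpu.holder, gpu.current_bid = renter, bid
        suffix = f", interrupting {interrupted}" if interrupted else ""
        return f"{renter} wins {gpu.name}{suffix}"
    return f"{renter} outbid on {gpu.name} (needs > {gpu.current_bid}/hr)"

gpu = Gpu("RTX 4090 #1")
print(place_bid(gpu, "alice", 0.30))  # alice takes the idle GPU
print(place_bid(gpu, "bob", 0.25))    # too low; alice keeps running
print(place_bid(gpu, "bob", 0.40))    # bob outbids, alice is interrupted
```

The trade-off this models is the one the description names: interruptible jobs pay less per hour in exchange for the risk of preemption when demand rises.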
Company Size
11-50
Company Stage
N/A
Total Funding
N/A
Headquarters
Los Angeles, California
Founded
2018
Health Insurance
Dental Insurance
Vision Insurance
Life Insurance
401(k) Company Match
Company Equity
Vast.ai named among fastest growing vendors by Ramp and Brex. March 5, 2026. By Team Vast
WAN 2.2 vs. LTX-2: which AI video model should you use?

Wouldn't it be great if you could just think of a scene and instantly turn it into a video the way it appears in your head? Technology isn't quite there yet, but today's most advanced AI video generation models are getting closer. Today we're taking a look at WAN 2.2 and LTX-2, two open-source/open-weights models that transform text and images into short-form video.

What WAN 2.2 and LTX-2 are - and how they differ

From the outside, WAN 2.2 and LTX-2 are pretty similar tools. They're both open-source/open-weights diffusion-based video generation models designed to turn images or text prompts into short video clips. Their underlying architectures, however, are very different.

WAN 2.2: prompt fidelity and cinematic control

Developed by Alibaba Tongyi Lab, WAN 2.2 is built around a Mixture-of-Experts (MoE) architecture. Instead of using a single neural network to manage the entire denoising process, it employs two specialized "experts": a high-noise expert for overall structure and layout, and a low-noise expert for refining textures and details like lighting and color tone. Switching between these two experts lets the model allocate compute depending on what it needs to do at any given moment - focusing on broader structure first and finer details later. It also boosts efficiency by avoiding unnecessary computation.

WAN 2.2 comes in three main variants, each designed for different workflows:

* Text-to-Video (T2V): Generates 5-second video clips at 480P to 720P from a text prompt written in plain language. This is a flexible option for scenes where everything needs to be synthesized from scratch.
* Image-to-Video (I2V): Begins with a single image and turns it into a short video. It uses automatic prompt derivation to generate video from an image without any text input, but can also accept text prompts for more directed results.
* Hybrid: A compact model with 5 billion parameters that handles both text-to-video and image-to-video generation. It delivers high-definition results at up to 720P and 24 FPS, and is designed for users with lower VRAM.

The base WAN 2.2 models generate video only, without native audio output. However, there is a specialized speech-to-video version (WAN 2.2 S2V) that transforms static images and audio inputs into synchronized videos.

LTX-2: native audio-video generation

Created by Lightricks, LTX-2 is a DiT-based (Diffusion Transformer) audio-video generation model. It produces audio and visuals together in one pass, keeping dialogue, lip movements, and ambient sound coherently aligned. Its architecture is based on latent diffusion, which means the model works in a compressed representation of the video first, before decoding it to full resolution. This makes it more memory efficient and enables faster iteration, translating to quicker experimentation and lower hardware overhead.

LTX-2 can generate up to ~20 seconds of synchronized audio and video, with support for high resolutions and high frame rates depending on configuration and available compute. The model offers fine-grained control options - such as LoRA-based customization and multimodal inputs including text, image, video, and audio - for precise creative direction. This makes LTX-2 a highly flexible model: it supports text-to-video, image-to-video, and native audio-visual generation, along with cross-modal workflows like audio-to-video, text-to-audio, and video-to-audio - all within a single model.

Choosing the right model for your workflow

How the two models are designed directly affects what you experience as a user. For instance, WAN 2.2's MoE design prioritizes structured generation and motion consistency.
It boasts strong prompt adherence with high-fidelity output and is more likely to preserve scene intent across frames, sticking closely to what you asked for - albeit at the cost of slightly longer generation times. LTX-2's latent diffusion approach emphasizes speed and accessibility. It's faster to iterate with, easier to experiment on, and offers native audio-video sync. However, it may require more prompt tuning to get exactly what you want.

Choose WAN 2.2 if you want:

* Cinematic or narrative-style clips where composition and camera motion are critical
* Strong prompt fidelity for complex scenes with multiple elements
* More deliberate, production-oriented outputs and professional video content

Choose LTX-2 if you prefer:

* Rapid prototyping of video concepts and creative exploration for lengthier scenes
* Visual storytelling or character-driven video with synchronized dialogue or sound
* A lighter, more iterative workflow where speed matters more than precision

Both models also integrate with ComfyUI, so you can jump right into testing them with an intuitive node-based visual workflow.

Final thoughts

Neither WAN 2.2 nor LTX-2 is objectively superior to the other; the two open-source/open-weights models are designed for different kinds of workflows and creative goals. The best way to get a feel for them is to actually try them out. The good news is that both models run well on high-end consumer GPUs, making them far more accessible than many people might expect. With Vast.ai, it's even easier: you can spin up the right hardware on demand and experiment on your own terms, paying only for the compute you need - and saving up to 80% over traditional clouds. Try WAN 2.2 T2V and WAN 2.2 I2V, or LTX-2 (or both!) in the Vast.ai Model Library, and build your own creative pipeline on Vast.ai today.
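The two-expert routing described for WAN 2.2 can be illustrated with a toy denoising loop: each step is dispatched to a "high-noise expert" early on and a "low-noise expert" near the end, based on the remaining noise level. This is a conceptual sketch with made-up update rules and an arbitrary threshold, not the model's actual architecture or code.

```python
# Toy illustration of MoE-style expert routing in a denoising loop.
# The experts and the 0.5 threshold are invented for illustration only.

def high_noise_expert(x: float) -> float:
    # Coarse step: removes noise aggressively to lay out global structure.
    return x * 0.5

def low_noise_expert(x: float) -> float:
    # Fine step: small refinements (think lighting, texture) near the end.
    return x * 0.9

def denoise(noise: float, steps: int, threshold: float = 0.5):
    """Run `steps` denoising steps, routing each to one of two experts."""
    trace = []
    for _ in range(steps):
        expert = high_noise_expert if noise > threshold else low_noise_expert
        trace.append(expert.__name__)
        noise = expert(noise)
    return noise, trace

final, trace = denoise(noise=1.0, steps=6)
print(trace)   # the high-noise expert runs first, then the low-noise expert
```

The point of the routing is the one the article makes: only one expert's parameters are active per step, so compute is spent on coarse structure early and fine detail late rather than on a single monolithic network throughout.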
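The memory argument for LTX-2's latent diffusion can be made concrete with back-of-envelope arithmetic. The frame count, resolution, and compression factors below are typical of video VAEs and are assumptions for illustration, not LTX-2's published figures.

```python
# Back-of-envelope: why diffusing in a compressed latent space saves memory.
# All numbers below are illustrative assumptions, not LTX-2 specifications.

frames, height, width, channels = 121, 768, 1152, 3
pixel_elems = frames * height * width * channels  # raw video tensor size

spatial, temporal = 8, 4       # assumed VAE downsampling factors
latent_channels = 16           # assumed latent channel count
latent_elems = ((frames // temporal) * (height // spatial)
                * (width // spatial) * latent_channels)

print(f"pixel space:  {pixel_elems:,} elements")
print(f"latent space: {latent_elems:,} elements")
print(f"reduction:    ~{pixel_elems / latent_elems:.0f}x fewer elements")
```

Under these assumed factors the diffusion model processes tens of times fewer elements per step than it would in pixel space, which is the source of the faster iteration and lower hardware overhead mentioned above.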
3D model company VAST has secured a multi-million dollar Pre-A+ funding round, led by the Beijing AI Industry Investment Fund, with participation from Jingya Capital. Previous investors include Oasis Capital, Fortune Capital, Primavera Capital, Inno Angel Fund, and Tsinghua Alumni Seed Fund. VAST has also launched Tripo Studio, the world's first AI-driven one-stop 3D workstation, and plans to release a new algorithm, Tripo 3.0.
VAST, an AI company focused on developing general 3D models, has secured a Pre-A+ funding round worth tens of millions of dollars. The investment was led by the Beijing Artificial Intelligence Industry Investment Fund, with participation from Innoangel Fund.
Berkeley-Founded AI Cloud Pioneer Opens Prime SOMA Office, Tapping Bay Area's Elite Talent Pool for New Product Initiatives as AI Talent Wars Intensify

SAN FRANCISCO and LOS ANGELES, June 5, 2025 /PRNewswire/ -- Vast.ai, a premier provider of GPU cloud services for artificial intelligence (AI) and machine learning (ML), today announced its strategic expansion with a new 3,400 sq. ft. office in San Francisco. Located at 100 First Street in the heart of the SOMA district, the new hub will focus on developing a new, unannounced product and scaling Vast.ai's industry-leading GPU rental platform.