Full-Time

Account Development Representative

Posted on 12/15/2025

Anyscale

501-1,000 employees

Scales AI workloads with Ray platform

Compensation Overview

$120k/yr

San Francisco, CA, USA

In Person

Category
Sales & Account Management
Requirements
  • You have a minimum of 3 years of work experience, with at least 1 year in a technical sales capacity.
Responsibilities
  • Prospecting: Identify and engage with potential customers who could benefit from Anyscale's self-service product, leveraging your technical understanding to establish rapport and articulate value
  • Lead Generation: Generate qualified leads through various channels, including outbound prospecting, inbound inquiries, and strategic partnerships
  • Pipeline Nurturing: Nurture leads through the sales funnel, providing them with the information and support they need to progress towards conversion
  • Collaboration: Work closely with the sales team to ensure a seamless handover of qualified opportunities, facilitating smooth transitions and maximizing conversion rates

Anyscale helps enterprises run AI workloads at scale by providing a software platform built around the Ray open-source framework. The core product enables users to deploy, manage, and optimize distributed AI tasks—from training to inference—across large clusters, with features that handle scaling, fault tolerance, and resource management. The platform is delivered as a software-as-a-service, so customers pay a subscription to access tools for running Generative AI, large language models, computer vision, and other ML workloads efficiently and reliably. Unlike others who focus on individual components, Anyscale combines Ray’s distributed execution with enterprise-ready management, monitoring, and optimization to productionize AI applications. The company’s goal is to help organizations deploy AI workloads faster, at scale, with predictable performance and cost efficiency.

Company Size

501-1,000

Company Stage

Series C

Total Funding

$259.6M

Headquarters

San Francisco, California

Founded

2019

Simplify Jobs

Simplify's Take

What believers are saying

  • Ray Serve cuts P99 latency 88% via HAProxy, gRPC in Ray 2.55+.
  • Ray Data with NVIDIA cuDF slashes multimodal processing costs 80% on RTX PRO 4500 Blackwell.
  • Rack-aware scheduling optimizes NVIDIA GB300 NVL72 for 100-500+ GPU training.

What critics are saying

  • OpenAI, Uber internalize Ray, bypassing Anyscale SaaS fees within 12 months.
  • Modal commoditizes Ray scaling, eroding Anyscale market share in 12-18 months.
  • AWS, Google Cloud natively integrate cuDF-Ray, eliminating platform lock-in by 2028.

What makes Anyscale unique

  • Anyscale, Ray creators, delivers unified platform scaling AI from laptops to data centers.
  • Ray powers OpenAI, Uber, Shopify, Amazon for production ML platforms.
  • SkyRL enables vision-language reinforcement learning for robotics and agentic tasks.


Benefits

Medical, Dental, and Vision insurance

401K retirement savings

Flexible time off

FSA and Commuter benefits

Parental and family leave

Office & phone plan reimbursement

Growth & Insights and Company News

Headcount

6 month growth

27%

1 year growth

-6%

2 year growth

0%
CryptoKorner
Mar 24th, 2026
Ray Serve upgrade delivers 88% lower latency for AI inference at scale.

Anyscale has shipped substantial performance upgrades to Ray Serve that slash P99 latency by up to 88% and boost throughput by 11.1x for large language model inference workloads. The improvements, available in Ray 2.55+, address scaling bottlenecks that have plagued enterprise AI deployments running latency-sensitive applications. The upgrades center on two architectural changes: HAProxy integration for ingress traffic and direct gRPC communication between deployment replicas. Both bypass Python-based components that previously created chokepoints under heavy load.

What the numbers show

In benchmark testing of a deep learning recommendation model pipeline, the optimized configuration pushed throughput from 490 to 1,573 queries per second while cutting P99 latency by 75%. At 400 concurrent users, the performance gap widened dramatically as Ray Serve's default Python proxy saturated while HAProxy continued scaling. For LLM inference specifically, the results proved even more striking. Running GPT-class models on H100 GPUs at 256 concurrent users per replica, throughput scaled linearly with replica count when using HAProxy - something the default configuration couldn't achieve as the Python process hit its ceiling. Streaming workloads saw 8.9x throughput improvements, while unary request patterns hit the full 11.1x gain.

Technical architecture shift

The core problem: Ray Serve's default proxy runs on Python's asyncio, which struggles at high concurrency. HAProxy, written in C and battle-tested across production systems globally, handles the same traffic with significantly less overhead. The second optimization targets inter-deployment communication. Previously, when one deployment called another, Ray Serve routed everything through Ray Core's actor task system - useful for complex orchestration but overkill for simple request-response patterns. The new gRPC option establishes direct channels between replica actors, serializing with protobuf instead of going through Ray's object store. Benchmarks show gRPC alone delivers a 1.5x throughput improvement for unary calls and 2.4x for streaming at equivalent latency targets.

Enterprise implications

These aren't academic improvements. Companies running recommendation systems, real-time fraud detection, or customer-facing LLM applications have consistently hit Ray Serve's scaling limits. The partnership with Google Kubernetes Engine that drove these optimizations suggests enterprise demand was substantial enough to prioritize the work. A single environment variable - RAY_SERVE_USE_GRPC_BY_DEFAULT - enables the gRPC transport. HAProxy activation requires cluster-level configuration but integrates with existing Kubernetes deployments. Anyscale is working toward making both optimizations the default for all inter-deployment communication, with an RFC currently under discussion. For teams already running Ray Serve in production, the upgrade path is straightforward: update to Ray 2.55+ and flip the appropriate flags. The benchmark code is publicly available on GitHub for teams wanting to validate performance gains against their specific workloads before deploying.
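The flag-flip upgrade path described above can be sketched as follows. The `RAY_SERVE_USE_GRPC_BY_DEFAULT` variable name comes from the article; the commented launch command is an illustrative assumption of a typical Serve invocation, not a prescribed procedure:

```shell
# Sketch: opt in to the gRPC transport described above (assumes Ray 2.55+).
# The variable name is from the article; the launch command is illustrative.
export RAY_SERVE_USE_GRPC_BY_DEFAULT=1

# Then start your Serve application as usual, e.g.:
#   serve run my_app:deployment_graph
echo "RAY_SERVE_USE_GRPC_BY_DEFAULT=$RAY_SERVE_USE_GRPC_BY_DEFAULT"
```

HAProxy ingress, by contrast, is enabled through cluster-level configuration rather than an environment variable, per the article.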

PR Newswire
Mar 16th, 2026
Anyscale cuts multimodal AI Data Processing costs by 80% with NVIDIA RTX PRO 4500 Blackwell.

SAN FRANCISCO, March 16, 2026 /PRNewswire/ - Anyscale, founded by the creators of Ray, today announced upcoming new capabilities in Ray and the Anyscale platform designed to help teams build and deploy AI workloads at production scale. As more teams seek to build differentiated AI, whether fine-tuning vision-language-action models (VLAs) in robotics or scaling enterprise document processing for RAG and search, transforming complex data modalities such as images, video, and documents into AI-ready datasets remains a critical bottleneck in both building and deploying models in production. To unlock new levels of ROI on AI investments, today we are announcing the integration of Ray Data with NVIDIA cuDF. This integration enables GPU-native multimodal data processing, delivering 80% lower cost with the NVIDIA RTX PRO 4500 Blackwell Server Edition, available soon on AWS EC2. In addition, as the industry continues to adopt more complex model development workflows such as large-scale reinforcement learning for LLMs, Anyscale is introducing Ray's rack-aware scheduling for NVIDIA GB300 NVL72 clusters, enabling optimal placement of distributed AI workloads to leverage NVIDIA's high-speed interconnect technology (NVLink).

"AI systems are growing in complexity, from reinforcement learning pipelines that combine simulation, data generation, training, and inference, to multimodal data preparation for RAG and robotics," said Robert Nishihara, Co-Founder of Anyscale. "Ray serves as a unified compute engine across all of these GPU-powered workloads, giving teams programmatic control to place workloads on the hardware best suited for the job, whether that's NVIDIA RTX PRO 4500 Blackwell for data preparation or NVIDIA GB300 NVL72 for large training runs."
These advancements to Ray reflect Anyscale's commitment to advancing open-source AI at scale and making it production-ready for every organization. As AI builders expand training, fine-tuning, and reinforcement learning with multimodal data pipelines - where text, documents, images, and video are processed on GPUs - the ability to efficiently orchestrate AI infrastructure at scale is becoming mission-critical to accelerating end-to-end experimentation.

Multimodal Data Processing with cuDF in Ray Data

Modern AI pipelines are no longer training-only workloads. Preparing text, images, video, and multimodal embeddings increasingly relies on GPUs, as these steps often use AI models directly. As demand for multimodal data pipelines grows - from processing documents with tables for retrieval and search applications, to preparing logs and images to fine-tune visual language models (VLMs), to continuously analyzing user activity as part of reinforcement learning systems - inefficient orchestration can quickly limit performance. To support this shift, Ray is expanding its GPU-native data processing capabilities by adding support for NVIDIA cuDF within Ray Data. Ray Data simplifies distributed multimodal data processing and batch model inference across heterogeneous (CPU and GPU) clusters. With cuDF integration, teams can run GPU-accelerated structured data processing collocated with training on GPU clusters, including the new GB300. On initial large-scale data deduplication tasks, Ray Data's new capabilities reduce cost by 80% on RTX PRO 4500 Blackwell compared to equivalent CPU-only pipelines. These new capabilities enable data preparation and training to operate as a unified distributed system rather than separate infrastructure layers, reducing bottlenecks and improving end-to-end throughput.
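The deduplication workload cited above can be sketched in pandas, whose DataFrame API cuDF deliberately mirrors; under the integration described here, the same `drop_duplicates` logic would run GPU-native. The exact Ray Data + cuDF API was not published in the announcement, so this is an analogy, not the product API:

```python
# CPU-side sketch of the deduplication step described above, using pandas.
# cuDF mirrors this API, so the same logic can run GPU-native (`import cudf`).
import pandas as pd

records = pd.DataFrame({
    "doc_id": [1, 1, 2, 3, 3, 3],
    "text":   ["a", "a", "b", "c", "c", "c"],
})

# Keep one row per document id, mirroring large-scale dataset deduplication.
deduped = records.drop_duplicates(subset=["doc_id"]).reset_index(drop=True)
print(len(deduped))  # 3
```

In a Ray Data pipeline, a step like this would typically run inside a batch-mapping transform distributed across the cluster.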
Rack-Aware Scheduling for Large-Scale AI Workloads

The NVIDIA GB300 NVL72 platform introduces a new class of AI infrastructure, delivering up to 72 GPUs per rack connected by ultra-high-bandwidth, low-latency NVLink interconnects. While a single rack provides exceptional density, advanced AI workloads routinely scale to 100-500+ GPUs spanning multiple racks. At this scale, how workloads map onto the physical topology directly impacts performance and efficiency. To address this challenge, Ray introduces rack-aware scheduling, enabling distributed workloads to be explicitly mapped to the physical topology of NVIDIA GB300 NVL72 clusters. With rack-aware scheduling, developers use simple Python APIs to express placement intent for tightly coupled tasks such as distributed training jobs, gradient synchronization, reinforcement learning learners, and GPU-intensive data preprocessing pipelines. Ray automatically coordinates scheduling to keep communication-intensive workloads within the same rack based on user specifications, maximizing intra-rack bandwidth and reducing costly cross-rack traffic.

Advancing GPU Utilization at Multi-Rack Scale

Organizations already rely on the Anyscale platform to efficiently operate large NVIDIA H100 and H200 GPU fleets, achieving over 80% GPU utilization in production environments. Rack-aware scheduling extends this foundation to next-generation GB300 systems, helping teams translate rack-scale GPU density into improved workload performance and more effective use of scarce GPU compute. These capabilities complement Anyscale's broader AI workload orchestration features, including:

  • Priority-aware orchestration for fair sharing of GPU resources across regions or cloud providers
  • Fine-grained fractional GPU allocation to pack more into every GPU

By extending intelligent orchestration to larger systems, Anyscale ensures hardware innovation directly translates into measurable efficiency gains for AI teams operating at scale.
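Anyscale's actual rack-aware API was not shown in the announcement. As a toy illustration of the placement intent described above - keep a tightly coupled group on as few racks as possible so its traffic stays on the intra-rack interconnect - here is a hypothetical greedy packer (all names are invented for illustration):

```python
# Toy sketch of rack-aware placement (NOT Anyscale's API): greedily pack a
# tightly coupled worker group onto as few racks as possible, preferring a
# single rack, so gradient-sync traffic stays on intra-rack NVLink.

def pack_group(group_size, rack_free_slots):
    """Assign `group_size` workers to racks given {rack: free GPU slots}."""
    placement = {}
    remaining = group_size
    # First preference: any single rack that fits the whole group
    # (smallest such rack, to leave big racks free for big jobs).
    for rack, free in sorted(rack_free_slots.items(), key=lambda kv: kv[1]):
        if free >= remaining:
            placement[rack] = remaining
            return placement
    # Otherwise spill across racks, fullest-first, to minimize rack count
    # and therefore cross-rack traffic.
    for rack, free in sorted(rack_free_slots.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take:
            placement[rack] = take
            remaining -= take
    if remaining:
        raise RuntimeError("not enough free GPUs for the group")
    return placement

# A 72-GPU NVL72-class rack can hold a 48-GPU training job entirely:
fits = pack_group(48, {"rack-a": 24, "rack-b": 72})
print(fits)  # {'rack-b': 48}

# A 90-GPU job must spill, but onto as few racks as possible:
spill = pack_group(90, {"rack-a": 24, "rack-b": 72})
print(spill)  # {'rack-b': 72, 'rack-a': 18}
```

Real rack-aware scheduling also has to account for fragmentation, failures, and multi-tenant priorities; this sketch only captures the topology-packing idea.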
These new platform enhancements build on Anyscale's continued momentum as AI labs, robotics teams, and enterprises standardize on AI-native computing to improve developer velocity, production resilience, and cost efficiency. Rack-aware scheduling and NVIDIA cuDF support in Ray Data will be available in open-source Ray and the API-compatible Anyscale Runtime.

About Anyscale

Anyscale, founded by the creators of Ray, is pioneering the era of AI-native computing. Its platform enables developers and enterprises to easily build, run, and scale AI workloads - from multimodal data processing to training and inference - optimized for modern accelerators. With Anyscale, AI teams get the fastest, most reliable Ray experience to power the next generation of AI applications and platforms. Learn more at www.anyscale.com

SOURCE Anyscale

THN Media Private Limited
Nov 20th, 2025
ShadowRay 2.0 Exploits Unpatched Ray Flaw to Build Self-Spreading GPU Cryptomining Botnet

Oligo Security has warned of ongoing attacks exploiting a two-year-old security flaw in the Ray open-source artificial intelligence (AI) framework to turn infected clusters with NVIDIA GPUs into a self-replicating cryptocurrency mining botnet. The activity, codenamed ShadowRay 2.0, is an evolution of a prior wave that was observed between September 2023 and March 2024. The attack, at its core, exploits a critical missing-authentication bug (CVE-2023-48022, CVSS score: 9.8) to take control of susceptible instances and hijack their computing power for illicit cryptocurrency mining using XMRig. The vulnerability has remained unpatched due to a "long-standing design decision" that's consistent with Ray's development best practices, which require it to be run in an isolated network and to act only on trusted code.

The campaign involves submitting malicious jobs, with commands ranging from simple reconnaissance to complex multi-stage Bash and Python payloads, to an unauthenticated Ray Job Submission API ("/api/jobs/") on exposed dashboards. The compromised Ray clusters are then used in spray-and-pray attacks to distribute the payloads to other Ray dashboards, creating a worm that can essentially spread from one victim to another. The attacks have been found to leverage GitLab and GitHub to deliver the malware, using names like "ironern440-group" and "thisisforwork440-ops" to create repositories and stash the malicious payloads. Both accounts are no longer accessible. However, the cybercriminals have responded to takedown efforts by creating a new GitHub account, illustrating their tenacity and ability to quickly resume operations.
The payloads, in turn, leverage the platform's orchestration capabilities to pivot laterally to non-internet-facing nodes, spread the malware, create reverse shells to attacker-controlled infrastructure for remote control, and establish persistence by running a cron job every 15 minutes that pulls the latest version of the malware from GitLab to re-infect the hosts. The threat actors "have turned Ray's legitimate orchestration features into tools for a self-propagating, global cryptojacking operation, spreading autonomously across exposed Ray clusters," researchers Avi Lumelsky and Gal Elbaz said. The campaign has likely made use of large language models (LLMs) to create the GitLab payloads, an assessment based on the malware's "structure, comments, and error handling patterns." The infection chain involves an explicit check to determine if the victim is located in China and, if so, serves a region-specific version of the malware. It's also designed to eliminate competition by scanning running processes for other cryptocurrency miners and terminating them - a tactic widely adopted by cryptojacking groups to maximize the mining gains from the host.

Another notable aspect of the attacks is the use of various tactics to fly under the radar, including disguising malicious processes as legitimate Linux kernel worker services and limiting CPU usage to around 60%. It's believed that the campaign may have been active since September 2024. While Ray is meant to be deployed within a "controlled network environment," the findings show that users are exposing Ray servers to the internet, opening a lucrative attack surface for bad actors, and that exploitable Ray dashboard IP addresses can be identified using the open-source vulnerability detection tool interact.sh. More than 230,500 Ray servers are publicly accessible.
Anyscale, which originally developed Ray, has released a "Ray Open Ports Checker" tool to validate the proper configuration of clusters to prevent accidental exposure. Other mitigation strategies include configuring firewall rules to limit unauthorized access and adding authorization on top of the Ray Dashboard port (8265 by default). "Attackers deployed sockstress, a TCP state exhaustion tool, targeting production websites. This suggests the compromised Ray clusters are being weaponized for denial-of-service attacks, possibly against competing mining pools or other infrastructure," Oligo said. "This transforms the operation from pure cryptojacking into a multi-purpose botnet. The ability to launch DDoS attacks adds another monetization vector - attackers can rent out DDoS capacity or use it to eliminate competition. The target port 3333 is commonly used by mining pools, suggesting attacks against rival mining infrastructure."
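The mitigations above (limit access to the Ray dashboard port, 8265 by default) might look like the following in practice. This is a sketch, not official hardening guidance; the subnet is an illustrative assumption, and `--dashboard-host` is Ray's standard flag for choosing the dashboard bind address:

```shell
# Sketch: keep the Ray dashboard off the public internet.

# Option 1: bind the dashboard to loopback when starting the head node,
# so it is only reachable locally (e.g., via an SSH tunnel).
ray start --head --dashboard-host=127.0.0.1

# Option 2: firewall the dashboard port to a trusted subnet
# (10.0.0.0/8 here is illustrative; requires root).
iptables -A INPUT -p tcp --dport 8265 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 8265 -j DROP
```

Either measure addresses exposure only; since the Job Submission API itself is unauthenticated by design, network isolation remains the primary control.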

The New Stack
Oct 22nd, 2025
Ray Comes to the PyTorch Foundation

The PyTorch Foundation, the Linux Foundation-based open source AI organization, today announced that it will become the host of Ray, the popular open source distributed computing framework for scaling AI and Python applications. The Ray project will join existing projects like PyTorch itself, the vLLM inference engine, and the deep learning optimization library DeepSpeed.

Ray was originally incubated in UC Berkeley's RISELab. Robert Nishihara and Philipp Moritz, who were graduate students at the time, launched the project in 2016. Together with their professor (and Databricks co-founder) Ion Stoica, they then decided to found Anyscale to commercialize their work. Since then, Anyscale has raised over $250 million and launched various products around Ray, and like most open source companies, it also offers a hosted platform that combines many of these services into an enterprise-ready platform.

The core Ray project itself provides the primitives (called tasks, actors, and objects) for building distributed Python-based applications. It's worth noting that the core piece of Ray isn't limited to AI applications but can be used to scale any Python workload. But the Ray project also includes AI-specific libraries for managing machine learning (ML) data sets and handling distributed training, as well as libraries for tuning and serving models. Ray also includes a library for scaling reinforcement learning workloads.

"At Ray, our goal is to make distributed computing as straightforward as writing Python code," said Anyscale co-founder Nishihara. "Joining the PyTorch Foundation helps us stay true to that mission, ensuring Ray continues to be an open, community-driven backbone for developers and their organizations."
Since its launch, Anyscale has independently maintained the Ray open source project (though it has occasionally been mislabeled "Apache Ray"). For the PyTorch Foundation, the addition of Ray means it now offers some of the most foundational open source projects for the AI ecosystem. There is PyTorch for model development, vLLM for inference and Ray for distributed execution, the Foundation argues. "By bringing Ray under the PyTorch Foundation umbrella, alongside projects like vLLM and DeepSpeed, we are uniting the critical components needed to build next-generation AI systems. Ray's inclusion strengthens our collective mission to support developers with the tools to efficiently train, serve, and deploy AI models at scale," said Matt White, the GM of AI at the Linux Foundation and executive director of the PyTorch Foundation.

INACTIVE