Full-Time
AI SRE platform for autonomous remediation
$200k - $350k/yr
New York, NY, USA
In Person
In-office role; five days per week on-site in New York (near Madison Square Park).
Traversal provides an AI-powered platform for site reliability engineering and observability. Its AI SRE agent autonomously detects, troubleshoots, and resolves production incidents by analyzing telemetry and performing root-cause analysis. It combines large language models with causal machine learning to orchestrate real-time remediation, and it offers proactive health checks. It can be deployed as a standalone product or as an intelligence layer on existing observability stacks, including on-premises hosting, and serves enterprises such as cloud providers and large SaaS firms, with the goal of reducing downtime and moving systems toward self-healing.
Company Size
51-200
Company Stage
Seed
Total Funding
$48M
Headquarters
New York City, New York
Founded
2023
Health Insurance
Flexible Work Hours
Company Equity
Traversal, an AI lab building agents for enterprise site reliability engineering, has announced six senior leadership hires across go-to-market and engineering within a single month. The company's headcount has grown to over 90, representing a 110 per cent increase in six months. New appointments include Jim Cavanaugh as SVP of Worldwide Sales, Ryan Powers as SVP of Marketing, Patrick Wade as VP of Worldwide Field Engineering, and Maxime Petazzoni as Head of Engineering. The hires bring experience from companies including Cribl, Redis, SignalFx and Splunk. The expansion follows Traversal's recent investment from Amex Ventures and deployment across American Express. A Fortune 100 financial services case study showed 32 per cent reduction in potential mean time to resolution and 82 per cent root cause analysis accuracy.
Traversal, the frontier lab building AI agents for enterprise-grade site reliability engineering (SRE), today announced a strategic investment from Amex Vent...
American Express has partnered with and invested $5 million through Amex Ventures in Traversal, an AI-driven site reliability engineering startup founded by researchers from MIT, Columbia and Cornell. The credit card company will deploy Traversal's platform across its global technology infrastructure. Traversal uses large language models, AI agents and causal machine learning to analyse operational telemetry data across multiple monitoring systems, helping diagnose and resolve technology outages more quickly. The platform aims to automate work traditionally requiring dozens of engineers collaborating in "war rooms" during incidents. The startup has raised approximately $53 million to date. Its technology addresses fragmentation in the observability market by inferring cause-and-effect relationships across different monitoring platforms, moving beyond simple pattern detection to root cause analysis.
Cloudways launches self-healing site reliability solution, powered by Traversal.

At a glance.
Cloudways, a leading managed cloud hosting platform, partnered with Traversal to transform its customer support and site reliability experience. Powered by Traversal's AI SRE platform, Cloudways Copilot is an end-to-end self-healing solution that enables users to identify issues and remediate them instantly with a single click. This is the first instance of self-serve site reliability as a service. Following strong adoption and positive feedback, Cloudways Copilot entered general availability in August 2025, rolling out its issue diagnostics and self-healing solution to all 845k+ customer applications.

The challenge.
Cloudways - recently ranked by CNET as the number one web hosting software for developers - serves as the cloud infrastructure management platform for website hosting for digital agencies, developers, and small businesses across the globe. Like any platform that is mission critical for a diverse customer base with a broad range of technical needs, Cloudways requires a strong, responsive support workflow to ensure reliability at scale.

Prior to partnering with Traversal, Cloudways customers facing issues like slow site performance, failing services, or DDoS attacks would report their problem via chat or a helpdesk ticket and receive diagnostic commands from a support engineer. Customers would attempt to run those commands themselves and, if unsuccessful, request remote assistance. The process often involved multiple back-and-forths and long delays in resolution due to customers' varying levels of technical expertise.

To improve this experience, Cloudways partnered with Traversal to build an AI SRE with the ambitious goal of being not just a copilot for troubleshooting incidents, but an end-to-end autonomous troubleshooting and self-healing tool for the more than 845k applications hosted on the platform.

Its deployment.
Traversal began as a pilot with 500 Cloudways WordPress customers. For data privacy, troubleshooting for Cloudways customers required Traversal to access machine-level logs and metrics directly, rather than reading from a centralized observability stack. Traversal AI connected with custom Cloudways endpoints - for example, Sensu for alerts and Ansible for workflows - all via a custom proxy to meet enterprise-grade guardrails, reliability, and security standards.

The resulting solution was launched in private preview as Cloudways Copilot, powered by DigitalOcean's proprietary Gradient AI platform. Its capabilities include ingesting customer context, identifying the root cause of issues, and returning recommended next steps for remediation - often within minutes.

As confidence in Copilot's root cause identification grew, customers began asking for a way to apply fixes automatically. In response, Traversal Inc. launched a "SmartFix" feature, enabling users to automatically execute recommended remediations directly from the support flow with the click of a button.

Cloudways Copilot is now in General Availability and is being rolled out to all Cloudways customer applications. It is currently performing over 1,000 investigations per day, with volume expected to grow to as many as 4,000 investigations per day as rollout completes.

Traversal's impact at Cloudways.
Cloudways Copilot constantly monitors the web stack, disk, inodes, and host health, detecting issues within seconds - from high-traffic anomalies like bot crawling and DDoS to system-level issues such as disk space exhaustion, inode exhaustion, and service failures. It quickly analyzes the root cause and delivers clear, actionable recommendations, with the option to remediate automatically. This near-instant diagnosis helps recover optimal server performance with minimal effort, saving customers hours of manual troubleshooting.
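As a rough illustration of the system-level checks named above (disk space exhaustion and inode exhaustion), the sketch below reads filesystem usage on a Unix host and flags values over a threshold. It is not Cloudways' or Traversal's implementation, which the source does not describe; the function names `usage_ratios` and `classify` and the 90% threshold are assumptions chosen for the example.

```python
import os


def usage_ratios(path="/"):
    """Return (disk_used_fraction, inode_used_fraction) for the filesystem at path.

    Uses os.statvfs, which is available on Unix-like systems only.
    """
    st = os.statvfs(path)
    # Some filesystems report 0 total blocks or inodes; treat those as 0% used.
    disk_used = 1 - st.f_bavail / st.f_blocks if st.f_blocks else 0.0
    inode_used = 1 - st.f_favail / st.f_files if st.f_files else 0.0
    return disk_used, inode_used


def classify(disk_used, inode_used, threshold=0.9):
    """Map usage fractions to issue labels (the threshold is a hypothetical default)."""
    issues = []
    if disk_used >= threshold:
        issues.append("disk space exhaustion")
    if inode_used >= threshold:
        issues.append("inode exhaustion")
    return issues


if __name__ == "__main__":
    disk, inodes = usage_ratios("/")
    print(f"disk used: {disk:.1%}, inodes used: {inodes:.1%}")
    print("issues:", classify(disk, inodes) or "none")
```

Checking inodes separately from disk space matters because a filesystem holding many small files can run out of inodes while still reporting free space.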
"We partnered with Traversal to build an end-to-end self-healing system - from alert to remediation. With over 95% accuracy, we can for the first time enable self-service reliability for our thousands of customers, instead of hours of frustrating back-and-forth with support - potentially saving millions in downtime and SRE costs." - Suhaib Zaheer, SVP & GM of Managed Hosting, Cloudways

"With Copilot monitoring our servers and 47 applications, we identify problems before clients even experience issues - like getting automated insights that pinpoint exactly which applications are causing problems."

"Cloudways Copilot & AI is a game-changer for reducing the amount of time spent taking care of your web server. It is the first good implementation of AI I've seen in a web host that actually makes my life as an agency owner easier."

"Cloudways Copilot has transformed how we manage 180+ sites, saving our team 15 hours in just the last month. Instead of spending hours debugging, we now get detailed breakdowns that help us quickly resolve problems."

Inside a real incident.
At 2:07 PM, a WordPress site hosted by a web development company managing hundreds of sites on Cloudways began to slow down. Pages were timing out, CPU usage spiked, and some users saw 502 and 524 errors, but the root cause wasn't immediately clear. Normally, Cloudways Support would step in on behalf of the customer - spending 60-90 minutes collecting logs, isolating the issue, and coordinating with engineers. This time, the alert was handled by Traversal's AI SRE, streamlining the response without any manual triage:

* 2:08 PM - Traversal began investigating on behalf of the customer.
* 2:10 PM - It identified a set of abusive IPs overwhelming the site and outlined the root cause.
* 2:12 PM - It proposed a self-healing action: block the malicious IPs and restart affected services, with UI-guided steps and a full remediation summary.
* 2:13 PM - With a single click, the issue was resolved - end to end, in under 5 minutes.

What would've taken hours was handled autonomously by Traversal, enabling Cloudways to respond to customer issues faster and more reliably - without manual triage or escalation.
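The step in the timeline above where abusive IPs are identified can be illustrated with a simple request-rate check. This is a minimal sketch under stated assumptions, not Traversal's method, which the source does not describe: the function name `abusive_ips`, the `(timestamp, ip)` input shape, and the threshold are all hypothetical, and the addresses come from reserved documentation ranges.

```python
from collections import Counter


def abusive_ips(requests, window_start, window_end, threshold):
    """Return IPs with more than `threshold` requests inside the time window.

    requests: iterable of (unix_timestamp, ip) pairs, e.g. parsed from an access log.
    The window is half-open: window_start <= timestamp < window_end.
    """
    counts = Counter(
        ip for ts, ip in requests if window_start <= ts < window_end
    )
    return sorted(ip for ip, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Hypothetical sample: one client sends 5 requests in a 60-second window.
    sample = [(t, "203.0.113.7") for t in range(0, 50, 10)]
    sample.append((15, "198.51.100.2"))
    sample.append((120, "203.0.113.7"))  # outside the window, not counted
    print(abusive_ips(sample, window_start=0, window_end=60, threshold=3))
```

A production system would likely weigh more signals than raw request counts (for example, error rates or request paths), so the output here is best read as a list of candidates for blocking rather than a final decision.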
To address these issues, Eventbrite partnered with Traversal to cut through this complexity and gain clearer visibility into its infrastructure, with the goal of automating its incident response.