Full-Time

Solutions Engineer/Deployment Strategist

Traversal

51-200 employees

AI SRE platform for autonomous remediation

Compensation Overview

$150k - $300k/yr

New York, NY, USA

In Person

Fully in-office in New York, five days per week; located near Madison Square Park.

Category
Sales & Solution Engineering
Required Skills
Observability
REST APIs
Requirements
  • 3+ years in a Solutions Engineer, Forward Deployed Engineer, Deployment Strategist, or Solutions Architect role; or relevant engineering experience with strong customer instincts.
  • Proven ability to run technical evaluations end-to-end — building POCs, designing deployment plans, leading demos, and owning the technical workstream of a sales cycle.
  • Strong project and stakeholder management skills, including coordinating across teams and navigating enterprise processes (Security, Legal, Procurement) to drive pilots forward.
  • Ability to engage deeply with technical systems (cloud, APIs, logs/traces) without needing to be a full-time engineer — strong systems intuition and comfort in production environments.
  • Comfortable working in a fast-paced startup environment with shifting priorities.
Responsibilities
  • Lead technical discovery and deliver crisp demos for engineering teams evaluating Traversal.
  • Scope and design pilot evaluations that align with customer environments, workflows, and success criteria.
  • Create deployment playbooks and integration plans that ensure fast, reliable adoption.
  • Build and refine demo environments, technical guides, and internal tools that support both sales and deployment.
  • Collaborate with Product and Engineering to define integration requirements and channel customer insights into roadmap decisions.
  • Drive onboarding and early implementation to secure quick wins and set up long-term success.
  • Stay current on observability, SRE, and AI infrastructure trends to strengthen customer conversations and internal strategy.
Desired Qualifications
  • Familiarity with SRE workflows or observability tools (Datadog, Grafana, New Relic).
  • Experience crafting deployment strategies or “land and expand” motions for infrastructure or AI-native products.
  • Hands-on experience with IaC tools (Terraform) or deployment systems (Kubernetes).
  • Proven ability to create technical enablement content and scale early technical sales motions.
  • Interest in AI infrastructure and the future of agentic systems.

Traversal provides an AI-powered platform for site reliability engineering and observability. Its AI SRE agent autonomously detects, troubleshoots, and resolves production incidents by analyzing telemetry and performing root-cause analysis. It combines large language models with causal machine learning to orchestrate real-time remediation, and it offers proactive health checks. It can be deployed as a standalone product or as an intelligence layer on top of existing observability stacks, with on-premises hosting available. It serves enterprises such as cloud providers and large SaaS firms, with the goal of reducing downtime and moving systems toward self-healing.

Company Size

51-200

Company Stage

Seed

Total Funding

$48M

Headquarters

New York City, New York

Founded

2023

Simplify Jobs

Simplify's Take

What believers are saying

  • The observability market is projected to grow to $12.6B by 2028 at a 15% CAGR.
  • Amex Ventures' $5M investment comes with deployment across American Express's global infrastructure.
  • Cloudways Copilot reports over 95% accuracy and is expected to scale to as many as 4,000 investigations per day.

What critics are saying

  • Datadog's Bits AI could consolidate the market and erode Traversal's value within 12 months.
  • Amex could build an in-house AI SRE using its product knowledge within 18 months.
  • A SmartFix false positive could cause data loss and trigger SOX fines at Amex.

What makes Traversal unique

  • Traversal combines causal machine learning with LLMs for root cause analysis.
  • Production World Model unifies fragmented telemetry data across observability stacks.
  • AI SRE agent automates incident remediation in minutes for enterprises.

Benefits

Health Insurance

Flexible Work Hours

Company Equity

Company News

Business Wire
Mar 11th, 2026
Traversal hires 6 senior leaders across GTM and engineering as headcount grows 110% to 90+

Traversal, an AI lab building agents for enterprise site reliability engineering, has announced six senior leadership hires across go-to-market and engineering within a single month. The company's headcount has grown to over 90, representing a 110 per cent increase in six months. New appointments include Jim Cavanaugh as SVP of Worldwide Sales, Ryan Powers as SVP of Marketing, Patrick Wade as VP of Worldwide Field Engineering, and Maxime Petazzoni as Head of Engineering. The hires bring experience from companies including Cribl, Redis, SignalFx and Splunk. The expansion follows Traversal's recent investment from Amex Ventures and deployment across American Express. A Fortune 100 financial services case study showed 32 per cent reduction in potential mean time to resolution and 82 per cent root cause analysis accuracy.

Business Wire
Mar 5th, 2026
Traversal Announces Strategic Investment from Amex Ventures

Traversal, the frontier lab building AI agents for enterprise-grade site reliability engineering (SRE), today announced a strategic investment from Amex Vent...

SiliconANGLE Media
Mar 4th, 2026
American Express invests $5M in AI site reliability startup Traversal

American Express has partnered with and invested $5 million through Amex Ventures in Traversal, an AI-driven site reliability engineering startup founded by researchers from MIT, Columbia and Cornell. The credit card company will deploy Traversal's platform across its global technology infrastructure. Traversal uses large language models, AI agents and causal machine learning to analyse operational telemetry data across multiple monitoring systems, helping diagnose and resolve technology outages more quickly. The platform aims to automate work traditionally requiring dozens of engineers collaborating in "war rooms" during incidents. The startup has raised approximately $53 million to date. Its technology addresses fragmentation in the observability market by inferring cause-and-effect relationships across different monitoring platforms, moving beyond simple pattern detection to root cause analysis.

Traversal
Oct 14th, 2025
Cloudways Launches Self-Healing Site Reliability Solution, Powered by Traversal

At a glance

Cloudways, a leading managed cloud hosting platform, partnered with Traversal to transform its customer support and site reliability experience. Powered by Traversal's AI SRE platform, Cloudways Copilot is an end-to-end self-healing solution that enables users to identify issues and remediate them instantly with a single click. This is the first instance of self-serve site reliability as a service. Following strong adoption and positive feedback, Cloudways Copilot entered general availability in August 2025, rolling out its issue diagnostics and self-healing solution to all 845k+ customer applications.

The challenge

Cloudways - recently ranked by CNET as the number one web hosting software for developers - serves as the cloud infrastructure management platform for website hosting for digital agencies, developers, and small businesses across the globe. Like any platform that is mission critical for a diverse customer base with a broad range of technical needs, Cloudways requires a strong, responsive support workflow to ensure reliability at scale.

Prior to partnering with Traversal, Cloudways customers facing issues like slow site performance, failing services, or DDoS attacks would report their problem via chat or a helpdesk ticket and receive diagnostic commands from a support engineer. Customers would attempt to run those commands themselves and, if unsuccessful, request remote assistance. The process often involved multiple back-and-forths and long delays in resolution due to customers' varying levels of technical expertise.

To improve this experience, Cloudways partnered with Traversal to build an AI SRE with the ambitious goal of being not just a copilot for troubleshooting incidents, but an end-to-end autonomous troubleshooting and self-healing tool for the more than 845k applications hosted on the platform.

Its deployment

Traversal began as a pilot with 500 Cloudways WordPress customers. For data privacy, troubleshooting for Cloudways customers required Traversal to access machine-level logs and metrics directly, rather than reading from a centralized observability stack. Traversal AI connected with custom Cloudways endpoints - for example, Sensu for alerts and Ansible for workflows - all via a custom proxy to meet enterprise-grade guardrails, reliability, and security standards.

The resulting solution was launched in private preview as Cloudways Copilot, powered by DigitalOcean's proprietary Gradient AI platform. Its capabilities included ingesting customer context, identifying the root cause of issues, and returning recommended next steps for remediation - often within minutes.

As confidence in Copilot's root cause identification grew, customers began asking for a way to apply fixes automatically. In response, Traversal Inc. launched a "SmartFix" feature, enabling users to automatically execute recommended remediations directly from the support flow with the click of a button.

Cloudways Copilot is now in general availability and is being rolled out to all Cloudways customer applications. It is currently performing over 1,000 investigations per day, with volume expected to grow to as many as 4,000 investigations per day as rollout completes.

Traversal's impact at Cloudways

Cloudways Copilot constantly monitors the web stack, disk, inodes, and host health, detecting issues within seconds - from high-traffic anomalies like bot crawling and DDoS to system-level issues such as disk space exhaustion, inode exhaustion, and service failures. It quickly analyzes the root cause and delivers clear, actionable recommendations, with the option to remediate automatically. This near-instant diagnosis helps recover optimal server performance with minimal effort, saving customers hours of manual troubleshooting.

"We partnered with Traversal to build an end-to-end self-healing system - from alert to remediation. With over 95% accuracy, we can for the first time enable self-service reliability for our thousands of customers, instead of hours of frustrating back-and-forth with support - potentially saving millions in downtime and SRE costs."
- Suhaib Zaheer, SVP & GM of Managed Hosting, Cloudways

"With Copilot monitoring our servers and 47 applications, we identify problems before clients even experience issues - like getting automated insights that pinpoint exactly which applications are causing problems."

"Cloudways Copilot & AI is a game-changer for reducing the amount of time spent taking care of your web server. It is the first good implementation of AI I've seen in a web host that actually makes my life as an agency owner easier."

"Cloudways Copilot has transformed how we manage 180+ sites, saving our team 15 hours in just the last month. Instead of spending hours debugging, we now get detailed breakdowns that help us quickly resolve problems."

Inside a real incident

At 2:07 PM, a WordPress site hosted by a web development company managing hundreds of sites on Cloudways began to slow down. Pages were timing out, CPU usage spiked, and some users saw 502 and 524 errors, but the root cause wasn't immediately clear.

Normally, Cloudways Support would step in on behalf of the customer, spending 60-90 minutes collecting logs, isolating the issue, and coordinating with engineers. This time, the alert was handled by Traversal's AI SRE, streamlining the response without any manual triage:

  • 2:08 PM - Traversal began investigating on behalf of the customer.
  • 2:10 PM - It identified a set of abusive IPs overwhelming the site and outlined the root cause.
  • 2:12 PM - It proposed a self-healing action: block the malicious IPs and restart affected services, with UI-guided steps and a full remediation summary.
  • 2:13 PM - With a single click, the issue was resolved, end to end, in under 5 minutes after Traversal began investigating.

What would've taken hours was handled autonomously by Traversal, enabling Cloudways to respond to customer issues faster and more reliably, without manual triage or escalation.

Traversal
Oct 7th, 2025
Eventbrite Turns to Traversal's AI SRE to Overcome Complexity of Legacy Systems

To address these issues, Eventbrite partnered with Traversal to cut through the complexity of its legacy systems and gain clearer visibility into its infrastructure, with the goal of automating its incident response.