Full-Time

Cloud Security Engineer

Traversal

51-200 employees

AI SRE platform for autonomous remediation

Compensation Overview

$200k - $350k/yr

+ Equity

New York, NY, USA

In Person

In-office role; five days per week on-site in New York (near Madison Square Park).

Category
IT & Security
Required Skills
Kubernetes
SOC 2
AWS
Terraform
Requirements
  • 7+ years of hands-on security engineering with meaningful ownership of application or cloud security in production environments
  • You implement, not just advise — comfortable writing Terraform, configuring IAM policies, managing cloud environments and controls, and hardening Kubernetes clusters directly
  • Strong application security fundamentals — OWASP top 10 as a mental model, not a checklist; can apply it to novel and evolving attack surfaces
  • Experience driving cloud infrastructure changes to improve security posture and seeing them through to implementation
  • Has worked at a product company where customer data security was a core responsibility
  • Startup experience — comfortable operating without a fully built security program around you.
Responsibilities
  • Own Traversal's infrastructure and application security posture end-to-end — assess where we are, define what good looks like, and implement the changes yourself
  • Identify and remediate vulnerabilities across our cloud infrastructure, application layer, and customer-facing integrations — through code, not just recommendations
  • Implement and maintain IAM policies, S3 configurations, Kubernetes security controls, and Terraform-managed infrastructure changes
  • Build and enforce security standards for new features, API integrations, and customer environment access
  • Own secrets management, JIT access, and identity architecture
  • Drive security improvements across engineering — working directly with platform, backend, and AI teams to get changes shipped
  • Secure our AI and agent layer — Traversal's agents ingest sensitive production observability data; you'll define what secure looks like in that context
  • Support enterprise customer security reviews and compliance requirements (SOC 2 and beyond)
Desired Qualifications
  • Can walk through a specific cloud infrastructure change they drove to improve security posture: what they found, what they changed, how they got it shipped
  • Has been the primary security owner at a company, or close to it
  • Has thought seriously about what securing an AI or agent system looks like

Traversal provides an AI-powered platform for site reliability engineering and observability. Its AI SRE agent autonomously detects, troubleshoots, and resolves production incidents by analyzing telemetry and performing root-cause analysis to identify underlying causes. It combines large language models with causal machine learning to orchestrate real-time remediation and offers proactive health checks. It can be deployed as a standalone product or as an intelligence layer on existing observability stacks, including on-premise hosting, to serve enterprises like cloud providers and large SaaS firms, with a goal of reducing downtime and moving systems toward self-healing.

Company Size

51-200

Company Stage

Seed

Total Funding

$48M

Headquarters

New York City, New York

Founded

2023

Simplify's Take

What believers are saying

  • The observability market is projected to grow to $12.6B by 2028 at a 15% CAGR.
  • Amex Ventures invested $5M, and American Express will deploy Traversal across its global technology infrastructure.
  • Cloudways Copilot reports over 95% accuracy and is expected to scale to as many as 4,000 investigations per day.

What critics are saying

  • Datadog's Bits AI could consolidate the market and erode Traversal's value within 12 months.
  • Amex could build an in-house AI SRE using its product knowledge within 18 months.
  • A SmartFix false positive could cause data loss and trigger SOX fines at Amex.

What makes Traversal unique

  • Traversal combines causal machine learning with LLMs for root cause analysis.
  • Production World Model unifies fragmented telemetry data across observability stacks.
  • AI SRE agent automates incident remediation in minutes for enterprises.

Benefits

Health Insurance

Flexible Work Hours

Company Equity

Company News

Business Wire
Mar 11th, 2026
Traversal hires 6 senior leaders across GTM and engineering as headcount grows 110% to 90+

Traversal, an AI lab building agents for enterprise site reliability engineering, has announced six senior leadership hires across go-to-market and engineering within a single month. The company's headcount has grown to over 90, representing a 110 per cent increase in six months. New appointments include Jim Cavanaugh as SVP of Worldwide Sales, Ryan Powers as SVP of Marketing, Patrick Wade as VP of Worldwide Field Engineering, and Maxime Petazzoni as Head of Engineering. The hires bring experience from companies including Cribl, Redis, SignalFx and Splunk. The expansion follows Traversal's recent investment from Amex Ventures and deployment across American Express. A Fortune 100 financial services case study showed 32 per cent reduction in potential mean time to resolution and 82 per cent root cause analysis accuracy.

Business Wire
Mar 5th, 2026
Traversal Announces Strategic Investment from Amex Ventures

Traversal, the frontier lab building AI agents for enterprise-grade site reliability engineering (SRE), today announced a strategic investment from Amex Vent...

SiliconANGLE Media
Mar 4th, 2026
American Express invests $5M in AI site reliability startup Traversal

American Express has partnered with and invested $5 million through Amex Ventures in Traversal, an AI-driven site reliability engineering startup founded by researchers from MIT, Columbia and Cornell. The credit card company will deploy Traversal's platform across its global technology infrastructure. Traversal uses large language models, AI agents and causal machine learning to analyse operational telemetry data across multiple monitoring systems, helping diagnose and resolve technology outages more quickly. The platform aims to automate work traditionally requiring dozens of engineers collaborating in "war rooms" during incidents. The startup has raised approximately $53 million to date. Its technology addresses fragmentation in the observability market by inferring cause-and-effect relationships across different monitoring platforms, moving beyond simple pattern detection to root cause analysis.

Traversal
Oct 14th, 2025
Cloudways Launches Self-Healing Site Reliability Solution, Powered by Traversal

At a glance

Cloudways, a leading managed cloud hosting platform, partnered with Traversal to transform its customer support and site reliability experience. Powered by Traversal's AI SRE platform, Cloudways Copilot is an end-to-end self-healing solution that enables users to identify issues and remediate them instantly with a single click. This is the first instance of self-serve site reliability as a service. Following strong adoption and positive feedback, Cloudways Copilot entered general availability in August 2025, rolling out its issue diagnostics and self-healing solution to all 845k+ customer applications.

The challenge

Cloudways, recently ranked by CNET as the number one web hosting software for developers, serves as the cloud infrastructure management platform for website hosting for digital agencies, developers, and small businesses across the globe. Like any platform that is mission critical for a diverse customer base with a broad range of technical needs, Cloudways requires a strong, responsive support workflow to ensure reliability at scale.

Prior to partnering with Traversal, Cloudways customers facing issues like slow site performance, failing services, or DDoS attacks would report their problem via chat or a helpdesk ticket and receive diagnostic commands from a support engineer. Customers would attempt to run those commands themselves and, if unsuccessful, request remote assistance. The process often involved multiple back-and-forths and long delays in resolution due to customers' varying levels of technical expertise.

To improve this experience, Cloudways partnered with Traversal to build an AI SRE with the ambitious goal of being not just a copilot for troubleshooting incidents, but an end-to-end autonomous troubleshooting and self-healing tool for the more than 845k applications hosted on the platform.

Its deployment

Traversal began as a pilot with 500 Cloudways WordPress customers. For data privacy, troubleshooting for Cloudways customers required Traversal to access machine-level logs and metrics directly, rather than reading from a centralized observability stack. Traversal AI connected with custom Cloudways endpoints (for example, Sensu for alerts and Ansible for workflows), all via a custom proxy to meet enterprise-grade guardrails, reliability, and security standards.

The resulting solution was launched in private preview as Cloudways Copilot, powered by DigitalOcean's proprietary Gradient AI platform. Its capabilities included ingesting customer context, identifying the root cause of issues, and returning recommended next steps for remediation, often within minutes. As confidence in Copilot's root cause identification grew, customers began asking for a way to apply fixes automatically. In response, Traversal Inc. launched a "SmartFix" feature, enabling users to automatically execute recommended remediations directly from the support flow with the click of a button.

Cloudways Copilot is now in general availability and is being rolled out to all Cloudways customer applications. It is currently performing over 1,000 investigations per day, with volume expected to grow to as many as 4,000 investigations per day as rollout completes.

Traversal's impact at Cloudways

Cloudways Copilot constantly monitors the web stack, disk, inodes, and host health, detecting issues within seconds, from high-traffic anomalies like bot crawling and DDoS to system-level issues such as disk space exhaustion, full inodes, and service failures. It quickly analyzes the root cause and delivers clear, actionable recommendations, with the option to remediate automatically. This near-instant diagnosis helps recover optimal server performance with minimal effort, saving customers hours of manual troubleshooting.

"We partnered with Traversal to build an end-to-end self-healing system - from alert to remediation. With over 95% accuracy, we can for the first time enable self-service reliability for our thousands of customers, instead of hours of frustrating back-and-forth with support - potentially saving millions in downtime and SRE costs." - Suhaib Zaheer, SVP & GM of Managed Hosting, Cloudways

"With Copilot monitoring our servers and 47 applications, we identify problems before clients even experience issues - like getting automated insights that pinpoint exactly which applications are causing problems."

"Cloudways Copilot & AI is a game-changer for reducing the amount of time spent taking care of your web server. It is the first good implementation of AI I've seen in a web host that actually makes my life as an agency owner easier."

"Cloudways Copilot has transformed how we manage 180+ sites, saving our team 15 hours in just the last month. Instead of spending hours debugging, we now get detailed breakdowns that help us quickly resolve problems."

Inside a real incident

At 2:07 PM, a WordPress site hosted by a web development company managing hundreds of sites on Cloudways began to slow down. Pages were timing out, CPU usage spiked, and some users saw 502 and 524 errors, but the root cause wasn't immediately clear. Normally, Cloudways Support would step in on behalf of the customer, spending 60-90 minutes collecting logs, isolating the issue, and coordinating with engineers. This time, the alert was handled by Traversal's AI SRE, streamlining the response without any manual triage:

  • 2:08 PM - Traversal began investigating on behalf of the customer.
  • 2:10 PM - It identified a set of abusive IPs overwhelming the site and outlined the root cause.
  • 2:12 PM - It proposed a self-healing action: block the malicious IPs and restart affected services, with UI-guided steps and a full remediation summary.
  • 2:13 PM - With a single click, the issue was resolved end to end, in under 5 minutes.

What would have taken hours was handled autonomously by Traversal, enabling Cloudways to respond to customer issues faster and more reliably, without manual triage or escalation.

Traversal
Oct 7th, 2025
Eventbrite Turns to Traversal's AI SRE to Overcome Complexity of Legacy Systems

Eventbrite partnered with Traversal to cut through the complexity of its legacy systems and gain clearer visibility into its infrastructure, with the goal of automating its incident response.