Full-Time

Director of Site Reliability Engineering

Nadp

Confirmed live in the last 24 hours

ThousandEyes

501-1,000 employees

Network performance monitoring and analytics platform

Data & Analytics
Enterprise Software

Senior, Expert

London, UK

Category
DevOps & Infrastructure
Site Reliability Engineering
Required Skills
Airflow
Data Science
Apache Spark
Requirements
  • You have a deep understanding of distributed systems design, cloud technology and its components and dependencies, and the code that defines infrastructure
  • You possess a deep understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts
  • Extensive hands-on experience building cloud, big data, and/or ML/AI infrastructure (e.g. EMR, Airflow, Comet ML, AWS SageMaker, Spark, etc.)
  • Extensive hands-on experience operating mission-critical production services that must be highly available and reliable
  • Proven ability to think strategically and align technical initiatives with business objectives
  • You can provide a strong technical vision for your teams and ensure consistent delivery of objectives
  • You have experience formulating a team's technical strategy and roadmap, and you have partnered effectively with several other teams to execute shared goals
  • You understand how to balance tactical needs with strategic growth and quality-based initiatives that can span multiple quarters
  • Proven site reliability engineering management experience leading multiple teams
Responsibilities
  • Lead and inspire a talented team of site reliability engineers, fostering a culture of innovation, collaboration, and excellence in the development and operation of infrastructure platforms
  • Drive the strategic vision for the development, implementation, and management of cloud, data, and ML/AI platforms
  • Collaborate closely with cross-functional teams, including development, product management, and security to define and implement reliable, secure, and scalable infrastructure platforms
  • Provide oversight and direction in the development and operation of cloud platforms, ensuring high-quality, scalable, and reliable solutions that meet customer needs
  • Drive excellence in operations and security processes
  • Mentor and develop engineering talent, fostering a culture of continuous learning and professional growth within the site reliability engineering group

ThousandEyes specializes in monitoring network infrastructure and analyzing internet performance. Its platform operates in the cloud, providing businesses with tools to understand and enhance their digital experiences. By offering visibility into the performance of networks and applications, ThousandEyes enables companies to monitor their digital environments, identify problems, and improve service reliability. The platform maps the global structure of wide-area networks and tracks performance metrics, ensuring that online services function effectively. Unlike many competitors, ThousandEyes focuses on a subscription-based model, allowing clients to access real-time monitoring, outage detection, and performance analytics tailored to their needs. The goal of ThousandEyes is to empower businesses across various sectors, such as finance, healthcare, and retail, to maintain optimal digital performance and thrive in an interconnected world.

Company Stage

Acquired

Total Funding

$107.5M

Headquarters

San Francisco, California

Founded

2010

Growth & Insights
Headcount

6 month growth

4%

1 year growth

8%

2 year growth

39%
Simplify's Take

What believers are saying

  • Being part of Cisco enhances ThousandEyes' market reach and credibility, providing employees with stability and growth opportunities.
  • The continuous innovation, such as the launch of Custom Webhooks and WAN Insights, reflects a dynamic work environment focused on cutting-edge technology.
  • Recognition as a strong performer in The Forrester Wave™ for End-User Experience Management highlights the company's industry leadership and potential for career advancement.

What critics are saying

  • The competitive landscape, including rivals like SolarWinds and Splunk, requires ThousandEyes to continuously innovate to maintain its edge.
  • Integration challenges with Cisco's broader product suite could lead to operational complexities and potential disruptions.

What makes ThousandEyes unique

  • ThousandEyes' integration with Cisco's extensive networking ecosystem provides a unique advantage in offering comprehensive network visibility and performance analytics.
  • The platform's AI-powered capabilities, such as Digital Experience Assurance (DXA), set it apart by proactively predicting and resolving internet outages.
  • ThousandEyes' focus on real-time monitoring and outage detection ensures that clients can maintain optimal digital performance, a critical need in today's connected world.