Full-Time

Senior Site Reliability Engineer

Observability

Updated on 9/6/2024

ThousandEyes

501-1,000 employees

Network performance monitoring and analytics platform

Data & Analytics
Consulting
Enterprise Software

Senior, Expert

London, UK

Category
DevOps & Infrastructure
Site Reliability Engineering
Required Skills
Sentry
Kubernetes
Python
AWS
Terraform
Google Cloud Platform
Requirements
  • Strong Infrastructure as Code skills, ideally with Terraform and Kubernetes.
  • Strong knowledge of modern logging tool sets, including Logstash or Fluentd.
  • Understanding of Prometheus and its ecosystem, including Alertmanager.
  • Good knowledge of Application Performance Monitoring tools and crash reporting tools, such as Sentry.
  • Good knowledge of cloud provider managed services, and how they can be leveraged in our context.
  • Ability to write high quality code in Python, Go, or equivalent languages.
Responsibilities
  • Design and implement strategies that enhance visibility.
  • Design, deploy, and maintain cloud-native monitoring services that are elastic and resilient to failure across AWS and GCP.
  • Establish standards and best practices for instrumenting container-based services and cloud-managed services.
  • Maintain our alerting pipeline so that notifications are timely, accurate, and directed to the appropriate channels.
  • Prioritize automation so that our monitoring platforms can scale and support a self-service approach.
  • Participate actively in our 24x7 incident response and on-call rotation, and contribute to improving them.

ThousandEyes specializes in monitoring network infrastructure and analyzing internet performance. Its platform operates in the cloud, providing businesses with tools to understand and enhance their digital experiences. By offering visibility into the performance of networks and applications, ThousandEyes enables companies to identify issues and improve the reliability of their online services. The platform maps the global topology of wide-area networks and measures performance metrics, ensuring that clients' services run smoothly. Unlike many competitors, ThousandEyes focuses on a subscription-based model, allowing clients to access real-time monitoring, outage detection, and detailed performance analytics tailored to their needs. The goal of ThousandEyes is to empower businesses across various sectors, such as finance, healthcare, and retail, to maintain optimal digital performance and thrive in a connected environment.

Company Stage

M&A

Total Funding

$113M

Headquarters

San Francisco, California

Founded

2010

Growth & Insights
Headcount

6 month growth

4%

1 year growth

10%

2 year growth

47%

Simplify's Take

What believers are saying

  • Being part of Cisco enhances ThousandEyes' market reach and credibility, providing employees with stability and growth opportunities.
  • The continuous innovation, such as the launch of Custom Webhooks and WAN Insights, reflects a dynamic work environment focused on cutting-edge technology.
  • Recognition as a strong performer in The Forrester Wave™ for End-User Experience Management highlights the company's industry leadership and potential for career advancement.

What critics are saying

  • The competitive landscape, including rivals like SolarWinds and Splunk, requires ThousandEyes to continuously innovate to maintain its edge.
  • Integration challenges with Cisco's broader product suite could lead to operational complexities and potential disruptions.

What makes ThousandEyes unique

  • ThousandEyes' integration with Cisco's extensive networking ecosystem provides a unique advantage in offering comprehensive network visibility and performance analytics.
  • The platform's AI-powered capabilities, such as Digital Experience Assurance (DXA), set it apart by proactively predicting and resolving internet outages.
  • ThousandEyes' focus on real-time monitoring and outage detection ensures that clients can maintain optimal digital performance, a critical need in today's connected world.