Site Reliability Engineer
Posted on 9/15/2023

5,001-10,000 employees

Customer engagement platform & developer of communications APIs
Company Overview
Twilio's mission is to fuel the future of communications. By making communications a part of every software developer's toolkit, Twilio is enabling innovators across every industry to reinvent how companies engage with their customers.
Remote in USA
Desired Skills
DevOps & Infrastructure
Software Engineering
Requirements
  • 5+ years of experience writing production-grade code in a modern programming language
  • Proven experience in designing, implementing, and maintaining observability solutions, preferably within a cloud-based SaaS environment
  • Strong proficiency in programming languages such as Python, Go, or Java
  • Familiarity with open-source observability tools and standards such as Prometheus, Grafana, and OpenTelemetry
  • Knowledge of distributed tracing, log management, and metric aggregation techniques
  • Proficiency in IaC, Kubernetes, and AWS concepts, best practices, and tools
  • Willingness to participate in team on-call rotations
  • Solid problem-solving skills, a proactive attitude, and the ability to work collaboratively in a dynamic team environment
Responsibilities
  • Design, implement, and maintain observability infrastructure and tooling, focusing on logging, tracing, metrics, and continuous profiling
  • Collaborate with software engineers to instrument services comprehensively so that they emit the telemetry data needed for observability
  • Leverage open-source standards, such as OpenTelemetry, to build scalable and interoperable solutions
  • Develop data pipelines that handle high-cardinality data and enable interactive troubleshooting for engineers
  • Enable effective telemetry correlation so that engineers can understand the behavior of distributed systems
  • Build affordable, engineer-friendly observability tooling that facilitates real-time root-cause analysis and reduces mean time to resolution (MTTR) for incidents
  • Contribute to the development of the Observability platform's features and functionality, continuously enhancing the user experience and ensuring self-service capabilities for other teams
  • Collaborate with the OpenTelemetry community and contribute to open-source initiatives to foster broader adoption of observability solutions
Desired Qualifications
  • Experience with context propagation and telemetry correlation to enable effective troubleshooting and monitoring of distributed systems
  • Experience in building data pipelines
  • Understanding of high-cardinality data challenges and strategies for handling complex telemetry data
  • Proficiency in optimizing cloud infrastructure and compute costs by implementing cost-observability software and workflows