Senior Site Reliability Engineer
Posted on 7/1/2022
Remote • Ottawa, ON, Canada
Desired Skills
  • A working knowledge of metrics, logs, and distributed tracing practices
  • Depth of knowledge in at least one of those practices
  • Comfortable contributing to a shared codebase
  • Understand Kubernetes and the container orchestration concepts it uses
  • Passionate about process automation and familiar with enough different approaches to entertain several before deciding on which to pursue
  • A healthy amount of curiosity for containerized technology and how it works
  • Experience identifying changes that improve processes from a reliability and performance perspective
  • Enjoy finding solutions in low information situations
  • Comfortable using telemetry data to spot parts of a system that do not scale, research solutions, and implement a migration plan that mitigates the situation
  • Enjoy working to determine what service information is important enough to drive service levels and creating the means for teams to use that data
  • Have a curiosity for current and new practices that lead to collaboration and process change
  • Enjoy documenting and sharing solutions to interesting challenges with others
  • Have participated in post-mortems and have definite opinions on how they serve the organization
  • Experience working as a team to support a critical core system
  • Contribute to our team's Telemetry Platform that consists of Prometheus, Cortex, Loki, Tempo, and Grafana deployed in EKS using Terraform and Weave Flux on AWS
  • Contribute to projects across the organization to address challenges where your skill set excels
  • Work with our dev teams to determine how to make their paging strategy more meaningful and less problematic
  • Develop ways to aid our development teams in instrumenting their services to collect important information about our applications that allows for investigation
  • Work to reduce the level of effort needed to use the instrumentation that the teams are creating
  • Provide valuable feedback and collaborate with the teams whose products we use as we iterate on our own infrastructure
  • Determine what information is important enough to drive service levels for our services
  • Use service level information to determine reliability on our Telemetry Platform
  • Participate in an on-call rotation that responds to incidents concerning the Telemetry Platform
  • Contribute to solutions defined in GitLab projects and GitHub repositories
  • Maintain AWS EKS clusters using our Terraform modules
  • Automate complex business challenges that require your specific skill set
  • Contribute to core infrastructure pieces that allow Angi to scale to meet the needs of its clients
  • Use the Telemetry Platform to assist in investigations that happen across the organization
  • Plan and shape the growth of Angi's infrastructure as we iterate on it over time
  • Think about systems - edge cases, failure modes, behaviors, specific implementations
  • Have an understanding of large scale system design, monitoring, observability, and operational practices
  • Have strong programming skills - Go, Python, and/or Ruby
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
  • Have experience with Weave Flux, Nginx, Kubernetes, Terraform, Prometheus, Loki, Cortex, Tempo, or similar technologies
  • Are compelled to keep a constant eye on the Observability space, identifying and planning ahead based on changes in practices/technologies as they arise

1,001-5,000 employees

Comprehensive solution for home needs
Company Overview
At Angi, they invest their resources into growing their business and their people. Angi's mission remains the same: to help the best consumers find the best service providers and to promote happy transactions.
  • Competitive compensation
  • This position will be eligible for a competitive year-end performance bonus & equity package
  • Full medical, dental, and vision package to fit your needs
  • Flexible vacation policy: work hard and take time when you need it
  • Pet discount plans & retirement plan with company match (401(k))
  • The rare opportunity to work with sharp, motivated teammates solving some of the most unique challenges and changing the world
Company Values
  • Start with the customer
  • All about talent
  • Strength in diversity
  • Create & build momentum
  • Be an owner
  • Disagree as individuals, deliver as a team
  • Drive growth
  • Better today, perfect tomorrow
  • Do more with less
  • Deliver results
  • Data beats opinion
  • Enjoy the journey