Infrastructure Engineer
Posted on 1/4/2023
INACTIVE
Ridesharing app
Company Overview
Lyft's mission is to improve people's lives with the world's best transportation. The company operates a mobile platform for the ridesharing of cars, bikes, and scooters and serves over a million rides per day.
Locations
Ontario, Canada
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
AWS
Google Cloud Platform
Java
Microsoft Azure
Terraform
Kubernetes
Python
Categories
DevOps & Infrastructure
Software Engineering
Requirements
- Experience designing, implementing, and operating large-scale customer-facing SaaS infrastructure
- Experience with high-level programming languages (Python, Go, Java, etc.) and declarative languages (e.g., Terraform)
- Experience working with public cloud platforms (e.g., AWS, Google Cloud Platform, Microsoft Azure) and container orchestrators (e.g., Kubernetes)
- Strong troubleshooting and debugging skills
- Strong cross-team collaboration skills
- Good communication skills
- Must be fluent in spoken and written English and, at a minimum, willing to learn French if required
Responsibilities
- Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform
- Build infrastructure and drive projects that break things with the aim of improving the robustness of production systems
- Help define infrastructure roadmap and architecture based on technology and business needs
- Build holistic visibility into SLIs, SLOs, SLAs, dependency graphs, and the past performance of software, networks, and systems to ensure that we can continue to scale without increasing operational burden or toil
- Share knowledge by giving brown bags, tech talks, and evangelizing appropriate tech and engineering best practices
- Step back to observe patterns and develop innovative tools and automation to minimize toil. Use those learnings to drive the best operational practices
- Partner with the broader Lyft organization to build a culture of rigorously learning from incidents
- Share on-call responsibilities with other teammates and own the improvement of the team's on-call practices
- Unblock, support, and effectively communicate across teams to achieve results
- Assist in cost engineering efforts to ensure efficient use of cloud resources