Senior Site Reliability Engineer @ Zipline

About Zipline

Do you want to change the world? Zipline is on a mission to transform the way goods move. Our aim is to solve the world’s most urgent and complex access challenges by building, manufacturing and operating the first instant delivery and logistics system that serves all humans equally, wherever they are. From powering Rwanda’s national blood delivery network and Ghana’s COVID-19 vaccine distribution, to providing on-demand home delivery for Walmart, to enabling healthcare providers to bring care directly to U.S. homes, we are transforming the way things move for businesses, governments and consumers.

The technology is complex but the idea is simple: a teleportation service that delivers what you need, when you need it. Through our technology that includes robotics and autonomy, we are decarbonizing delivery, decreasing road congestion, and reducing fossil fuel consumption and air pollution, while providing equitable access to billions of people and building a more resilient global supply chain.

Join Zipline and help us to make good on our promise to build an equitable and more resilient global supply chain for billions of people.

What You’ll Do

As a Senior Site Reliability Engineer, you will play a crucial role in maintaining and improving our systems and infrastructure. You will work closely with cross-functional teams, including software, hardware, and operations teams, to design, implement, and optimize our systems for high availability, fault tolerance, and scalability. You will also be responsible for proactively identifying potential issues and bottlenecks, driving incident response and post-incident analysis, and implementing automation and monitoring solutions.

Responsibilities:

Design, develop, and maintain highly reliable and scalable systems and infrastructure.
Collaborate with software engineering and DevOps teams to ensure the smooth integration and deployment of applications and services.
Implement and improve monitoring, alerting, and observability solutions to proactively identify and resolve potential issues.
Automate infrastructure provisioning, configuration management, and deployment processes using modern tools and technologies.
Conduct system performance analysis and optimization to ensure efficient resource utilization and optimal response times.
Participate in incident response and resolution, conducting post-incident analysis and implementing preventive measures.
Continuously evaluate and implement best practices and industry standards in site reliability engineering.
Mentor and provide technical guidance to junior members of the team, fostering a culture of continuous learning and improvement.
Collaborate with cross-functional teams to define and refine system requirements, capacity planning, and disaster recovery strategies.
Work with the on-site operations team to ensure proper management, maintenance, and scaling of hardware components.
Travel internationally on occasion to support deployments and infrastructure setup and to collaborate with remote teams.

What You’ll Bring

9+ years of experience in a similar role, with a proven track record of designing, implementing, and managing highly available and scalable systems.
Deep understanding of Linux/Unix systems administration and experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes).
Proficiency in at least one programming language (e.g., Python, Go, Java) and experience with infrastructure-as-code tools (e.g., Terraform, Ansible, AWS CDK).
Strong knowledge of networking principles, including TCP/IP, DNS, load balancing, and firewalls.
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack) and incident management systems.
Solid understanding of distributed systems, microservices architecture, and cloud-native application development.
Experience working with on-site operations teams, for example coordinating the management, maintenance, and scaling of hardware components.
Strong troubleshooting and problem-solving skills, with the ability to analyze complex systems and identify performance bottlenecks.
Excellent communication skills, both written and verbal, with the ability to effectively collaborate with cross-functional teams.
Willingness to travel internationally occasionally to support deployments and collaborate with remote teams.
Experience with both production and internal tooling environments is desired.
Relevant certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator) are a plus.
Experience working with SSO providers (e.g., Okta, ADFS) is a plus.

What Else You Need to Know

The starting cash range for this role is $160,000 - $200,000. Please note that this is a target, starting cash range for a candidate who meets the minimum qualifications for this role. The final cash pay for this role will depend on a variety of factors, including a specific candidate’s experience, qualifications, skills, working location, and projected impact. The total compensation package for this role may also include: equity compensation; discretionary annual or performance bonuses; sales incentives; benefits such as medical, dental and vision insurance; paid time off; and more.

Zipline is an equal opportunity employer and prohibits discrimination and harassment of any type without regard to race, color, ancestry, national origin, religion or religious creed, mental or physical disability, medical condition, genetic information, sex (including pregnancy, childbirth, and related medical conditions), sexual orientation, gender identity, gender expression, age, marital status, military or veteran status, citizenship, or other characteristics protected by state, federal or local law or our other policies.

We value diversity at Zipline and welcome applications from those who are traditionally underrepresented in tech. If you like the sound of this position but are not sure if you are the perfect fit, please apply!
