Full-Time

Senior Site Reliability Engineer

Zipline

1,001-5,000 employees

Automated delivery system for equitable global logistics

Robotics & Automation
Aerospace

$160,000 - $200,000

Equity, Annual Bonuses, Sales Incentives

Senior, Expert

San Bruno, CA, USA

Required Skills
TCP/IP
Kubernetes
Microsoft Azure
Python
Communications
Management
Java
Docker
AWS
Terraform
Ansible
Development Operations (DevOps)
Linux/Unix
Google Cloud Platform
Requirements
  • 9+ years of experience in a similar role
  • Deep understanding of Linux/Unix systems administration and experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes)
  • Proficiency in at least one programming language (e.g., Python, Go, Java) and experience with infrastructure-as-code tools (e.g., Terraform, Ansible, AWS CDK)
  • Strong knowledge of networking principles, including TCP/IP, DNS, load balancing, and firewalls
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack) and incident management systems
  • Solid understanding of distributed systems, microservices architecture, and cloud-native application development
  • Experience working with on-site operations teams to coordinate the management, maintenance, and scaling of hardware components
  • Strong troubleshooting and problem-solving skills, with the ability to analyze complex systems and identify performance bottlenecks
  • Excellent communication skills, both written and verbal, with the ability to effectively collaborate with cross-functional teams
  • Willingness to travel internationally occasionally to support deployments and collaborate with remote teams
  • Experience with both production and internal tooling environments is desired
  • Relevant certifications (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator) are a plus
  • Experience working with SSO providers (e.g., Okta, ADFS) is a plus
Responsibilities
  • Design, develop, and maintain highly reliable and scalable systems and infrastructure
  • Collaborate with software engineering and DevOps teams to ensure the smooth integration and deployment of applications and services
  • Implement and improve monitoring, alerting, and observability solutions to proactively identify and resolve potential issues
  • Automate infrastructure provisioning, configuration management, and deployment processes using modern tools and technologies
  • Conduct system performance analysis and optimization to ensure efficient resource utilization and optimal response times
  • Participate in incident response and resolution, conducting post-incident analysis and implementing preventive measures
  • Continuously evaluate and implement best practices and industry standards in site reliability engineering
  • Mentor and provide technical guidance to junior members of the team, fostering a culture of continuous learning and improvement
  • Collaborate with cross-functional teams to define and refine system requirements, capacity planning, and disaster recovery strategies
  • Work with the on-site operations team to ensure proper management, maintenance, and scaling of hardware components
  • Coordinate and undertake occasional international travel to support deployments and infrastructure setup and to collaborate with remote teams

Company Stage

Series F

Total Funding

$825.5M

Headquarters

San Francisco, California

Founded

2014

Growth & Insights
Headcount

6 month growth

4%

1 year growth

29%

2 year growth

82%