Full-Time

Staff Site Reliability Engineer

Posted on 6/18/2024

Platform Science

201-500 employees

Configurable open platform for fleet management

Automotive & Transportation
Hardware

Compensation Overview

$145.3k - $227.7k Annually

+ Bonus + Equity

Expert

San Diego, CA, USA

Category
DevOps & Infrastructure
Site Reliability Engineering
IT & Security
Required Skills
Datadog
Bash
Kubernetes
Python
Docker
AWS
Jenkins
Requirements
  • 9+ years of hands-on experience in SRE or Platform Engineering roles
  • 4+ years of expertise with automation technologies like Jenkins, ArgoCD, or similar
  • 3+ years of experience with Kubernetes, Helm, and Docker within production environments
  • Proficiency with current software development lifecycle (SDLC) concepts and best practices, CI/CD pipelines, and test-driven development
  • Experience with AWS in a production environment, including EKS, IAM, autoscaling, networking, and load balancing/request routing
  • Proficiency in Python, Bash, Node.js, and/or Go
  • Proficiency with distributed tracing methodologies and observability tools such as Prometheus, ELK, or Datadog
  • Strong emphasis on documentation and fostering knowledge-sharing practices within the team and organization
  • Track record of successfully training and mentoring engineers
  • Expertise in optimizing performance and managing costs within cloud environments
  • Sound understanding of SLI/SLO concepts and adherence to SRE best practices
Responsibilities
  • Lead the development and enhancement of Continuous Integration/Continuous Deployment (CI/CD) pipelines, along with refining release management processes and associated toolsets
  • Architect and maintain Helm charts to streamline application deployment and management
  • Establish standardized observability solutions to empower development teams in efficiently managing their applications
  • Lead the effort in promoting and prioritizing reliability, driving achievement of uptime goals and mentoring colleagues in SRE best practices
  • Conduct comprehensive Production Readiness Reviews, working with teams to identify and establish Service Level Objectives (SLOs) and to ensure high-quality, dependable services
  • Design and develop software solutions that address operational challenges and improve system stability and reliability
  • Fulfill on-call duties, providing expert support to development teams for mission-critical applications in production environments
  • Improve the resiliency of applications and systems using chaos engineering

Platform Science offers a configurable open platform for fleet management, providing modern telematics, enterprise-grade applications, and future-proof solutions. The company empowers fleets to develop, deploy, and manage their commercial vehicles' mobile devices and applications on a single platform, fostering innovation and growth through partnerships with industry leaders.

Company Stage

Series C

Total Funding

$310.7M

Headquarters

San Diego, California

Founded

2015

Growth & Insights
Headcount

6 month growth

-1%

1 year growth

-16%

2 year growth

4%
INACTIVE