Senior Site Reliability Engineer, Observability - remote possible
Posted on 3/25/2022
INACTIVE
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
AWS
Bash
Docker
Google Cloud Platform
Kafka
Git
Java
Linux/Unix
Management
Microsoft Azure
Puppet
Terraform
Kubernetes
Python
Ansible
Requirements
  • Coding experience in one or more of Python, Bash, Go or Java
  • Infrastructure as code experience with one or more of Terraform, Ansible, Puppet or Salt
  • Strong experience with modern application development workflows and version control systems like GitHub, GitLab or Bitbucket
  • Strong working knowledge of Docker containers and cloud platforms (AWS, GCP and/or Azure)
  • Strong working knowledge of orchestration engines and package management including Kubernetes, Helm, and Istio
  • Experience operating one or more OSS technologies like Kafka, Cassandra or ZooKeeper; other backends and streaming systems a plus
  • Extensive understanding of Unix/Linux systems from kernel to shell and beyond (system libraries, file systems, and client-server protocols)
  • 5+ years of experience as a Site Reliability Engineer, Production Engineer or Backend Software Engineer for web-scale or similar platforms
  • BS degree in Computer Science or a related technical field, or equivalent practical experience
Responsibilities
  • Automate & operationalize cloud provider infrastructure via Terraform as well as Kubernetes, Helm and Istio
  • Monitor capacity & utilization and work closely with the infrastructure team to orchestrate scale-up/down of backend services
  • Own & operate critical backend open-source services like Cassandra, Kafka, and ZooKeeper
  • Build tools and design processes that help improve observability and system resiliency
  • Triage site availability incidents and proactively work towards reducing MTTR for customer-impacting incidents
  • Partner with service owners to implement service-level metrics & service-level objectives that act as indicators of service health
  • Establish design patterns for monitoring, benchmarking and deploying new features for the backend services
Splunk

5,001-10,000 employees

Data management & visualization platform
Company Overview
Splunk's mission is to address the challenges and opportunities of managing massive streams of machine-generated big data. Splunk is the leading software platform for machine data that enables customers to gain real-time Operational Intelligence.
Benefits
  • Medical, dental and vision insurance plans for regular, full-time U.S. employees — choose the best plans for you and your family. Plus: Health Savings Account (HSA), Life insurance and survivor benefits, Flexible Spending Accounts (FSA), Business travel and accident insurance, Voluntary Critical Illness & Hospital Indemnity
  • Eligible employees enjoy: 401(k) Plan with a company match, Employee Stock Purchase Plan (ESPP), Equity awards, Bonus or commission program
  • We support you and your family: Paid parental leave, Mother rooms and wellness rooms, Family Planning
  • Your work/life balance is important to us. That's why we offer: 16 company holidays, 15 vacation days, 10 sick days, 10 bereavement days, 5 volunteer days
  • Ensuring our employees' success goes beyond insurance plans: Education reimbursement, Electric car charging stations, Employee Assistance Program (EAP), Stocked kitchens, Gym discounts/onsite fitness centers, Pet insurance discount, Student loan resources, Cool workspace with collaborative environments, 529 College Savings Plan
Company Values
  • Innovative: We’re passionate about our customer success. We keep our energy laser-focused on giving our customers the best possible and most trustworthy experience, driven ultimately by integrity. After all, we’d be nothing without them.
  • Open: We never stop learning or striving to create a positive impact. The work we do matters. We innovate at warp speed to disrupt the world's notion of what’s possible.
  • Disruptive: We are humble and value openness and honesty. We speak our truths mindfully and in consideration of others. Candor is cool - respect is required.
  • Fun: We embrace the ride (preferably in a Splunk t-shirt). We take our work seriously, but not ourselves. We weave an irreverent and infectious sense of fun into everything we do.
  • Passionate: We cultivate an inclusive environment where all backgrounds are celebrated. Striving for equity and embracing our individual uniqueness is our secret sauce. And it will only make us stronger.
  • #WeAreSplunk: We represent many functions and regions but are one team. We value each other's efforts and moonshot ideas. And we celebrate highs, and learn from lows, together. We trust and rely on each other. Remember: There’s no “I” in Splunk.