Senior Site Reliability Engineer
Updated on 9/22/2022
San Jose, CA, USA
Desired Skills
Development Operations (DevOps)
Google Cloud Platform
Microsoft Azure
  • 5+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering
  • BS/MS/PhD in Computer Science or related field
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines
  • Experience with monitoring/alerting (Prometheus, Thanos, VictoriaMetrics, Grafana, vmrules)
  • You have moderate-to-advanced experience in Java, C, C++, Python, Go, or other object-oriented programming languages
  • You are interested in designing, analyzing, and troubleshooting large-scale distributed systems
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • You have a great ability to debug and optimize code and automate routine tasks
  • You have a solid background in software development and architecting resilient and reliable applications
  • You are a good communicator and comfortable working with other engineers across the organization
  • Evangelize and advocate for reliability practices across our organization
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and production readiness reviews
  • Debug and optimize code and automate routine tasks to reduce toil
  • Analyze and optimize our core product by developing and implementing reliability and performance practices
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity
  • Be on-call for production services
  • Practice sustainable incident response and blameless retrospectives
Desired Qualifications
  • Experience being on-call for an internet facing production system
  • Expertise in Kubernetes (k8s), Helm, YAML, GitOps, Argo CD, distributed tracing (Lightstep, Honeycomb, OpenTelemetry), and Kubernetes resource management (e.g. Kubecost)
Dremio Corporation

201-500 employees

Data lake engine
Company mission
Dremio is leading the way in reimagining your data architecture: removing barriers, accelerating time to insight, and putting control in the hands of the user.
  • Health, Dental, and Vision Insurance
  • 401(k)
  • Stock Options
  • Work From Home
  • Office Events
  • Parental Leave Benefits
  • Paid Time Off
Company Values
  • Communicate with clarity.
  • Drive accountability.
  • Be respectful.
  • Confront brutal facts.
  • Focus on results.
  • Operate with urgency.
  • Build a flywheel.