Staff Site Reliability Engineer
Remote in USA
Desired Skills
  • Lead the administration of tools like DataDog, Sentry, and PagerDuty
  • Identify strategies to improve our full-stack telemetry and monitoring capabilities
  • Mentor other SREs who contribute to observability-related work
  • Help drive organizational maturity by evolving and improving reliability and software engineering best practices
  • Combination of experience in both software engineering and operations
  • 7+ years working in a relevant role, including 3+ years of technical leadership experience mentoring junior engineers
  • 3+ years of experience architecting and administering observability stacks, either managed or self-hosted (e.g., DataDog, New Relic, Prometheus, Elastic Stack/ELK)
  • Operation of containerized microservices running on public cloud, asynchronous event processing, and databases
  • Strong command of Linux, Git, and CI/CD pipelines
  • Design and build new tools to automate repetitive tasks, prevent incidents, or improve TTR using an object-oriented programming language such as Python
  • Infrastructure as Code using tools like Terraform, Terragrunt, Ansible or CloudFormation
  • Work with the SRE manager and other engineering managers to define SLOs to help drive SLA compliance
  • Act as the resident technical expert for our team to share knowledge, experience, and expertise, focusing on the more senior members when possible
  • Understand how application components interact, and contribute to architectural discussions
  • Unwavering commitment to operational security and best practices
  • Ownership: identify problems but also propose solutions, then go out and implement them, from submitting a merge request on another team's repository to scoping out a new reliability project
  • Connection: motivated to help other teams improve their service reliability through reviews, pair programming, hands-on training and continuous improvement of tooling and services
  • Experience with and interest in chaos engineering (Gremlin, Litmus, Chaos Mesh) is a nice-to-have, not a requirement
  • On-call support of highly available production systems
  • Expand and improve our observability and monitoring footprint
  • Collaborate with the engineering manager, product managers, other SREs, and cloud infrastructure engineers to create architectural plans, define project requirements, and establish technical standards
  • Connect with non-engineering business units across the organization to better understand the reliability and incident management needs of Checkr and our customers
  • Pair program with team members, review merge requests, help engineers get unblocked, and provide peer mentoring
  • Address common operational challenges by building tools and automation scripts
  • Automate observability and alerting across an ever-changing landscape of microservices
  • Automate Service Reliability Scorecards and Production Readiness Standards
  • Software engineering project work, proposed and driven by individual SRE team members, to remove operational bottlenecks and increase velocity in ways we've never considered before
  • Serve as the on-call incident commander to help debug and drive resolution of reliability issues, contribute to the postmortem, and work to prevent recurrence
  • Participate in design and production reviews for new features, products, and infrastructure
  • Audit and tune the configuration of systems owned by other engineering teams
  • Assist in planning for the growth, reliability, and resiliency of Checkr's infrastructure

501-1,000 employees

Automating professional background checks
Company Overview
Checkr powers people infrastructure for the future of work. With artificial intelligence and machine learning, Checkr's solutions make background checks faster—building a fairer future by designing technology to create opportunities for all.
  • A fast-paced and collaborative environment
  • Learning and development allowance
  • Competitive compensation and opportunity for advancement
  • 100% medical, dental, and vision coverage
  • Up to 25K reimbursement for fertility, adoption, and parental planning services
  • Flexible PTO policy
  • Monthly wellness stipend, home office stipend
Company Core Values
  • Humility: We are respectful and free from arrogance. We put the success of our employees over our company and are excited to learn from each other.
  • Transparency: We trust each other to communicate the good and the bad as it relates to doing our best work. We aren’t afraid to voice our opinions and are receptive to feedback.
  • Grit: We are passionate and hustle to raise the bar. We persevere through our challenges and grow from our failures.
  • Ownership: We strive for thoughtful impact, take pride in our work, and hold ourselves accountable. We step up and take on new challenges to help further the success of the company.
  • Connection: We genuinely care about each other and understand that our people are our power. We celebrate our lived experiences and enjoy helping and supporting each other.