Senior Database Reliability Engineer
Posted on 2/22/2023
INACTIVE
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
AWS
Google Cloud Platform
Operating Systems
Postgres
Puppet
SQL
Terraform
Ansible
Chef
Datadog
Requirements
  • At least 5 years of experience running PostgreSQL in large production environments
  • Experience operating systems in cloud environments such as AWS or GCP
  • Deep knowledge of SQL and data modeling for RDBMS
  • Good knowledge of PostgreSQL internals
  • Experience deploying or using proxy and optimization solutions such as RDS Proxy, PgBouncer, pganalyze, OtterTune, etc.
  • Several years of experience programming in a software engineering role
  • Understanding of SRE concepts such as SLAs, SLIs, and SLOs, and of incident management processes
  • Strong desire to automate away the toil
  • Strong interest in collaborating with and mentoring product engineers about SQL and database topics
  • Experience with infrastructure automation and configuration management using tools like Terraform, Chef, Ansible, Puppet, etc.
  • Experience with observability tooling for database monitoring and troubleshooting, such as Datadog, Percona, EverSQL, etc.
  • Familiarity with distributed systems and networking concepts as they apply to applications and database utilization
Responsibilities
  • Work on database reliability and performance as a member of the SRE team
  • Analyze solutions and implement best practices for operating our PostgreSQL databases
  • Work on observability of relevant database metrics and make sure we reach our database objectives
  • Work with other reliability engineers to roll out changes to our production environment and help mitigate database-related production incidents
  • Participate in on-call support rotation with the team
  • Provide database expertise to engineering teams (for example, through reviews of database migrations, queries, and performance optimizations)
  • Work on automation of database infrastructure and help engineering succeed by providing self-service tools
  • Plan the growth and manage the capacity of Lattice's database infrastructure
  • Support and debug database production issues across services and levels of the stack
  • Document every action so your learnings turn into repeatable actions and then into automation
  • Cross-train other reliability engineers on aspects of database reliability
Lattice

501-1,000 employees

People success platform