We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people-powered platform. At Tripadvisor, we want you to bring your unique perspective and experiences, so we can collectively revolutionize travel and together find the good out there.
The Site Operations team at Tripadvisor is responsible for maintaining and enhancing the core systems that power and support the tripadvisor.com website. This includes systems in both private data centers and over a hundred AWS accounts. Our responsibilities are too broad to list in full here. Suffice it to say that we are the go-to team for questions about the interface boundaries between these two halves of the company, as well as the deep inner workings of the legacy on-prem half. Data is hugely important at Tripadvisor: we run over 600 on-premises logical databases on more than 100 database hosts, serving petabytes of data. As a Lead DBA/Principal Software Engineer on the SiteOps team, you will be a force multiplier for our engineering & operations teams, delivering tooling & infrastructure that not only has a direct impact on day-to-day operations but also contributes to the future evolution of Infrastructure & Engineering at Tripadvisor. You’ll be part of a dynamic team responsible for ensuring the high availability, reliability, and scalability of our data maintenance and delivery.
We are looking for passionate engineers with deep experience in Postgres, as well as AWS DMS, RDS, and Aurora, to help us optimize and automate our database infrastructure and deployment processes. We are currently involved in several types of system migrations: from on-prem to AWS/cloud-native services, and from on-prem data centers to alternate AWS-based data centers. As a Lead DBA/Principal Software Engineer, you will help design and implement these migrations, test them, and then perform them with a “no surprises in production” mindset. You will also play a major role in evolving the infrastructure as code and configuration management we use both to keep the lights on for our existing on-prem databases and to transition them into the cloud. This is a business-facing role, so significant leadership and communication experience is required.
What you’ll do:
- Infrastructure Automation: Design, implement, and maintain automated infrastructure provisioning and configuration management using Python, Ansible, and TypeScript CDK to ensure consistency and scalability.
  - Strong programming skills in these areas are a must-have.
- Monitoring and Alerting: Set up monitoring, alerting, and logging in environments such as on-prem Prometheus/Thanos, Grafana Cloud, and Loki to proactively detect and address potential issues, ensuring optimal performance and reliability.
- Database Management: Manage hundreds of on-prem PostgreSQL databases and their active/passive counterparts in AWS, including performance tuning, backups, and disaster recovery strategies.
- Collaboration: Work closely with cross-functional teams, including developers, system administrators, and technical managers, to improve the overall development and deployment processes and keep everyone in sync on deliverables and timelines.
- Troubleshooting and Incident Management: Assist in identifying and resolving operational issues, and participate in on-call rotations.
Skills & Experience:
- At least 10 years of expertise in database operations, with a focus on building and maintaining scalable data infrastructure.
- At least 5 years of working directly with Postgres at a senior level.
- At least 5 years of experience in leadership and in communicating with the business.
- Strong problem-solving skills and the ability to work in a fast-paced, agile environment.
- Strong proficiency in Python for scripting and automation, and with CDK for AWS deployments.
- Solid understanding of AWS-based data management technologies.
- Experience in configuration management using Ansible.
- Experience with infrastructure as code using CDK.
- Understanding of CI/CD tools such as Jenkins, GitLab CI, and GitHub Actions.
- Understanding of networking concepts such as load balancing and DNS is a plus.
- Knowledge of containerization technologies such as Docker and container orchestration tools such as Kubernetes is a plus.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
If you need a reasonable accommodation or support during the application or the recruiting process due to a medical condition or disability, please reach out to your individual recruiter or send an email to [email protected] and let us know the nature of your request. Please include the job requisition number in your message.