Site Reliability Engineer
Posted on 11/26/2022
INACTIVE
Locations
Toronto, ON, Canada
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
AWS
Bash
BigQuery
Google Cloud Platform
JIRA
Git
Linux/Unix
Management
MySQL
Postgres
SQL
Terraform
Python
Looker
Ansible
Requirements
- You are a learn-a-lot, not a know-it-all
- Post-secondary diploma, certificate or degree in an IT-related discipline or the wherewithal to have obtained such things
- Comfort with an outrageous number of acronyms
- Experience with GCP architecture and related products, including Datastream, BigQuery, Looker, Cloud Storage, Billing, Cloud Functions, etc.
- Experience with AWS architecture and related products, including EC2, RDS, ALB, VPC, VPC Peering, Multi-AZ, EFS, EBS, Elastic Beanstalk, Auto Scaling groups, Direct Connect, etc.
- You have a good understanding of what the Well-Architected Framework is and how it relates to EC2, BigQuery, GCS/S3, RDS, VPCs, NATs, and more
- 2-3 years of Linux/UNIX systems administration experience, preferably in a LAMPJ environment
- Familiarity with CentOS/RHEL, MySQL, PostgreSQL, Reverse Proxies, Firewalls/NAT, HTTPS, SSL Certificates, SFTP, FTPS, DNS, SMTP
- Experience in configuration, implementation, and maintenance of SaaS platforms
- A fierce passion for availability, reliability, and short MTTR
- Solid experience with at least one scripting language (Bash, Python, etc.). We code on this team
- You know SQL and can help debug MySQL and PostgreSQL databases (slow queries, traces, killing queries)
- Experience with the practice of Infrastructure as Code, including hands-on infrastructure coding (Ansible and Terraform)
- Familiarity with monitoring and metric collection systems
- Familiarity with information security best practices and tools
- Familiarity with backup strategies and tools
- Experience with version control systems (Git)
- A belief that companies should be socially responsible
Responsibilities
- Day-to-day administration of Google Cloud Platform (GCP) data platforms, with support for our legacy Amazon Web Services (AWS) environments
- Collaborate with our Data Engineers on remediating technical debt, implementing a new data platform on GCP, and helping to optimize existing AWS workflows and infrastructure in collaboration with the Site Reliability Engineering (SRE) team
- Implement monitoring and logging solutions across all systems
- Adhere to Site Reliability Engineering principles on incident management and service level objectives
- Assist in implementation of security best practices and initiatives at all levels of the systems infrastructure
- Serve as a steward of the service life cycle for the AWS environments (in collaboration with the SRE team) and the GCP data platforms
- Troubleshoot issues that arise, document defects in JIRA, and work with colleagues to resolve production issues
- Assist with sourcing and testing infrastructure enhancements before deployment
- Support workflow automation using configuration management and continuous deployment frameworks
- Work in an on-call rotation
- Be a great teammate!
Desired Qualifications
- You'll get that competitive salary, flexible health benefits, mental health support, a generous program, stock options, a hybrid office/home work environment and so much more
Donation & grant management platform