Salary Range: $140,000 to $160,000 USD per year
Summary
As a Sr. Site Reliability Engineer II, you are instrumental in making our petabyte-scale, Kubernetes-centric ProArchive application resilient. This position coordinates with multiple teams to develop a migration plan for various components and services and to implement best practices for our tech stack. A person in this position has a passion for getting things done across functions including automation, CI/CD, infrastructure components, and middleware. You’ll work closely with our Dev Engineering, QA, and Platform Engineering groups to manage our current on-prem deployments and our on-prem and cloud-native infrastructure.
How will you contribute?
- Help define technology choices, best practices, and processes for the team.
- Develop and maintain documentation standards for the team.
- Develop new tools and libraries for broader use by SaaS Operations and Engineering teams. Enable engineering teams to discover and understand problems more quickly.
- Work with product architects to suggest architectural changes and design platform component roadmaps.
- Act as a subject matter expert (SME) for components and functions as needed. Develop the skills required to become the SME for components that need one.
- Assist engineering teams in deep troubleshooting and application code review to find opportunities to improve performance and scalability.
- Work closely with Engineering and peer SRE teams to design and use Smarsh coding standards and best practices.
- Respond to incidents coordinated by SRE and Incident Response teams. Act as an Incident Commander during incidents.
- Participate in escalation and off-hours on-call schedules.
- Adopt and embrace the qualities of an SRE as defined in the team charter, and help model them for the rest of the team.
- Mentor and train junior members of the team. Design the training curriculum for the team.
What will you bring?
- Minimum of 7 years of industry experience.
- BS in CS or equivalent combination of education and experience.
- Strong experience operating Kubernetes in production environments (EKS Anywhere preferred).
- Experience with middleware systems (Kafka, AMQ, Redis, Memcached, etcd).
- Experience managing CI/CD systems (Flux, Concourse).
- Experience deploying and/or operating an observability stack (Splunk, Datadog, Grafana).
- Experience with large-scale systems.
- Familiarity with PostgreSQL and MongoDB.
- Background working in a multi-platform environment (Linux, Windows).
- Familiarity with programming/scripting languages (e.g., Python, Bash, PowerShell, Go).
- Familiarity with Agile/Scrum/Kanban methodologies.
- Strong interpersonal skills with a can-do attitude and a sense of urgency suited to a high-growth, fast-paced environment.
- Curious mind, wanting to learn new technologies and share with others.
- The ability to think outside the box to resolve issues and create solutions.