Principal Site Reliability Engineer @ Atlan | Simplify Jobs

INACTIVE

Full-Time

Principal Site Reliability Engineer

SRE

Posted on 11/30/2023

Atlan

201-500 employees

Active metadata platform for data insights

Data & Analytics

Consulting

Enterprise Software

AI & Machine Learning

Senior, Expert

Canada + 2 more

Category

DevOps & Infrastructure

Software Engineering

Required Skills

Kubernetes

Microsoft Azure

Python

Apache Kafka

Java

Postgres

AWS

Go

Redis

Cassandra

Google Cloud Platform

Requirements

Proven expertise in software development and engineering, with a strong emphasis on building large-scale distributed systems.
Proficiency in one of the commonly used programming languages for building distributed systems, such as Golang, Java, or Python.
Extensive experience with cloud infrastructure providers (AWS, Azure, or GCP) and with developing distributed systems on cloud services.
Strong expertise in container orchestration platforms, specifically Kubernetes. CKA (Certified Kubernetes Administrator) certification is a plus.
Exceptional problem-solving skills and a passion for developing robust, scalable, and secure solutions.
Excellent communication skills to effectively collaborate with cross-functional teams.
Ability to share impactful tech stories, demonstrating the results of your technical contributions.

Responsibilities

Lead and drive platform-first initiatives, with a focus on scalability, reliability, and performance of our technology platform.
Design, build, and maintain robust infrastructure supporting our distributed systems, leveraging technologies such as Kubernetes, Kafka, Postgres, Cassandra, and Redis.
Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics.
Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
Align the platform with customer needs and business goals by working closely with cross-functional teams.
Develop and maintain CI/CD pipelines for seamless deployment and release management.

About the Role

As the Principal SRE, you will lead and drive platform-first initiatives that ensure the scalability, reliability, and performance of our technology platform, and you will play a pivotal role in improving the availability of our critical systems and services.

Atlan

Atlan is an active metadata platform that offers data discovery, cataloging, lineage, and governance, integrating metadata from various sources to provide unified data insights. It supports a two-way movement of metadata, bringing context back into the tools and workflows used by data teams, and has been recognized as a Leader in the Forrester Wave™ and by Gartner.

Company Stage

Series C

Total Funding

$173.5M

Headquarters

Singapore, Singapore

Founded

2019

Growth & Insights

Headcount

6-month growth

↑ 0%

1-year growth

↑ 19%

2-year growth

↑ 20%