Principal Site Reliability Engineer
SRE
Posted on 11/30/2023
INACTIVE
Atlan

201-500 employees

Data collaboration workspace
Company Overview
Atlan is on a mission to help democratize enterprise data. The company is building a collaboration platform for data teams, allowing them to truly democratize both internal and external data while automating repetitive tasks.
AI & Machine Learning
Data & Analytics
B2B

Company Stage

Series B

Total Funding

$68.9M

Founded

2019

Headquarters

New York

Growth & Insights
Headcount

6 month growth

13%

1 year growth

20%

2 year growth

145%
Locations
Canada • Remote in USA • United States
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Kubernetes
Microsoft Azure
Python
Apache Kafka
Java
Postgres
AWS
Go
Redis
Cassandra
Google Cloud Platform
Categories
DevOps & Infrastructure
Software Engineering
Requirements
  • Proven expertise in software development and engineering, with a strong emphasis on building large-scale distributed systems.
  • Proficiency in one of the commonly used programming languages for building distributed systems, such as Golang, Java, or Python.
  • Extensive experience with cloud infrastructure providers (AWS, Azure, or GCP) and developing distributed systems using cloud services.
  • Strong expertise in container orchestration platforms, specifically Kubernetes. CKA certification is a plus.
  • Exceptional problem-solving skills and a passion for developing robust, scalable, and secure solutions.
  • Excellent communication skills to effectively collaborate with cross-functional teams.
  • Ability to share impactful tech stories, demonstrating the results of your technical contributions.
Responsibilities
  • Lead and drive platform-first initiatives, with a focus on scalability, reliability, and performance of our technology platform.
  • Design, build, and maintain robust infrastructure supporting our distributed systems, leveraging technologies such as Kubernetes, Kafka, Postgres, Cassandra, and Redis.
  • Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics.
  • Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
  • Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
  • Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
  • Align the platform with customer needs and business goals by working closely with cross-functional teams.
  • Develop and maintain CI/CD pipelines for seamless deployment and release management.