About the Job & Shakudo
At Shakudo, we are building the world’s first operating system for data and AI. We use the term operating system in the truest sense of the word. Like iOS, Windows and Linux, Shakudo’s end-to-end OS offers ever-evolving, automatically operated, best-of-breed open-source components tailored to each business’s unique needs.
We are in search of a seasoned DevOps Engineer, equipped with extensive experience in Kubernetes, automation, CI/CD, and infrastructure management, to join our team. As a DevOps Engineer, you will be tasked with maintaining and improving our infrastructure’s reliability, scalability, and performance. In this pivotal role, you’ll be working closely with our solution engineering and product engineering teams, facilitating the smooth deployment and operation of our software products with customers.
Responsibilities:
- Operate, evolve, and maintain Shakudo deployments across our customer base using state-of-the-art Kubernetes tooling.
- Serve as the central knowledge expert for cloud-managed services that interface with the Shakudo Kubernetes clusters.
- Execute and manage deployments, product update rollouts, and maintenance in collaboration with the engineering team.
- Monitor and troubleshoot software and system issues in the production environment, identifying opportunities for infrastructure improvement.
- Operate GPU servers and cloud node pools, ensuring consistent operation with minimal downtime.
- Prepare and deliver technical demonstrations and presentations to internal teams, showcasing the effectiveness of our DevOps practices.
- Create and maintain technical documentation, tutorials, and demos to educate our team on DevOps practices and related technologies.
- Write engaging blog posts and social media content that highlights our platform capabilities and fosters engagement with the right audience.
- Stay updated with industry trends and advancements in DevOps, cloud computing, and data tooling.
Requirements:
- 3+ years of experience in a DevOps role with an in-depth focus on automation, Kubernetes, and infrastructure management.
- 3+ years of experience managing and deploying cloud services, particularly with tools such as Kubernetes and Terraform.
- Proficient in the following technologies: Kubernetes, Terraform, Helm, Python, CUDA, Go, AWS, GCP, Azure, PostgreSQL, TypeScript.
- Bachelor’s degree in Computer Science or Engineering.
- Strong problem-solving skills with the ability to adapt to new technologies quickly.
- Excellent communication, presentation, and technical writing skills.
- Experience in creating documentation and content for technical audiences.
- Ability to work independently and as part of a team.
Shakudo is an equal opportunity employer. We foster diversity and inclusion and welcome applicants from all backgrounds and experiences.