You are a passionate engineer with expertise in DevOps and ML Ops. You thrive in dynamic environments, working at the intersection of software development and machine learning. Your technical skills in cloud platforms, containerization, and automation tools are complemented by your ability to collaborate with cross-functional teams. You excel at problem-solving, have strong scripting skills, and are familiar with modern DevOps practices. As a proactive and self-motivated individual, you effectively communicate complex technical concepts to various stakeholders. Your commitment to security, scalability, and reliability ensures efficient and secure systems. If you’re excited about innovative projects and driving digital transformation, you’re the perfect fit for our team.
Responsibilities:
- Design, build, and maintain scalable and reliable infrastructure for machine learning and software applications.
- Develop and implement CI/CD pipelines to automate the deployment of ML models and software applications.
- Monitor and manage the performance, availability, and security of ML models and applications in production.
- Collaborate with data scientists and software engineers to streamline the development and deployment process.
- Implement and manage containerization technologies (Docker, Kubernetes) to ensure efficient resource utilization.
- Automate infrastructure provisioning and configuration using tools like Terraform, Ansible, or similar.
- Ensure best practices for version control, testing, and documentation are followed.
- Troubleshoot and resolve issues related to infrastructure, deployment, and performance.
- Stay up to date with the latest industry trends and technologies in DevOps and ML Ops.
Qualifications:
- 3+ years of experience in DevOps, ML Ops, or a related role.
- Strong knowledge of cloud platforms (AWS, Azure, GCP) and cloud-native services.
- Experience with CI/CD tools such as GitHub Actions, GitLab CI/CD, Jenkins, or similar.
- Proficiency in containerization technologies (Docker, Kubernetes).
- Familiarity with infrastructure as code (IaC) tools like Terraform, Ansible, or CloudFormation.
- Solid understanding of the software development lifecycle (SDLC) and Agile methodologies.
- Experience with monitoring and logging tools (Prometheus, Grafana, ELK stack).
- Strong scripting skills in Python, Bash, or similar languages.
- Excellent problem-solving skills and attention to detail.
- Strong communication and collaboration skills.
Nice to Have:
- Experience with ML frameworks and libraries (TensorFlow, PyTorch, Scikit-learn).
- Knowledge of data pipeline tools (Apache Airflow, Luigi).
- Experience with model serving and monitoring tools (Kubeflow, MLflow, Seldon).
- Familiarity with security best practices in DevOps and ML Ops.
- Experience with software development processes in regulated environments (aerospace, medical, automotive, etc.).
- Experience building scalable infrastructure for training machine learning models in the cloud.