Eng-III – Senior ML Data Engineer
Posted on 8/25/2023
Locations
United States
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Agile
Apache Spark
AWS
Apache Kafka
Data Analysis
Data Science
Data Structures & Algorithms
Google Cloud Platform
Hadoop
Apache Flink
Python
Categories
Data & Analytics
Requirements
- Requires a Bachelor's degree in Computer Science, Mathematics, or a related technical field plus 5 years of experience
- Requires 5 years of experience with the following:
  - A scripting language (Python)
  - Developing data pipelines for at least terabyte volumes of data
  - Modeling, measuring, and analyzing complex data
  - Distributed data technologies for building efficient, large-scale data pipelines, such as Hadoop, MapReduce, Spark, Flink, and Kafka
  - Deploying models, UDFs, or other custom algorithms into a batch process workflow
Responsibilities
- Break down product initiative requirements, identify dependencies and create implementation plans
- Mentor individuals through detailed feedback during code reviews
- Design and scale petabyte-scale data flows
- Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
- Automate manual tasks from data science and create tools for data scientists to simplify future automation
- Build and enhance current data warehousing architecture to provide insights and analytics to our internal and external clients
- Develop and release via CI/CD and agile methodologies
- Automate and maintain infrastructure builds in AWS/On-Prem/GCP to support applications running in Kubernetes (Terraform, Ansible, Chef)
- Build shared components and/or frameworks that improve engineering productivity across the organization
- Create and maintain documentation of services, tools, and frameworks
- Play a key role in building the ETL/ELT stack to cleanse, transform and load data from different sources using multiple technologies
- Ensure that data is easily discoverable and usable for data scientists and analysts across the company
- Identify root causes of instability in a large-scale distributed system, across stacks

100% Telecommuting Permitted