Eng-III – Senior
ML Data Engineer
Posted on 8/25/2023

501-1,000 employees

Media measurement & optimization software
United States
Desired Skills
Apache Spark
Apache Kafka
Data Analysis
Data Science
Data Structures & Algorithms
Google Cloud Platform
Apache Flink
Data & Analytics
  • Requires a Bachelor's degree in Computer Science, Mathematics, or a related technical field plus 5 years of experience
  • Requires 5 years of experience with the following: Scripting language (Python); Developing data pipelines for at least terabyte volumes of data; Modeling, measuring, and analyzing complex data; Distributed data technologies for building efficient, large-scale data pipelines, such as Hadoop, MapReduce, Spark, Flink, and Kafka; Deploying models, UDFs, or other custom algorithms into a batch process workflow
  • Break down product initiative requirements, identify dependencies and create implementation plans
  • Mentor individuals through detailed feedback during code reviews
  • Design and scale data flows at petabyte volume
  • Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
  • Automate manual tasks from data science and create tools for data scientists to simplify future automation
  • Build and enhance current data warehousing architecture to provide insights and analytics to our internal and external clients
  • Develop and release via CI/CD and agile methodologies
  • Automate and maintain infrastructure builds in AWS/On-Prem/GCP to support applications running in Kubernetes (Terraform, Ansible, Chef)
  • Build shared components and/or frameworks that improve engineering productivity across the organization
  • Create and maintain documentation of services, tools, and frameworks
  • Play a key role in building the ETL/ELT stack to cleanse, transform and load data from different sources using multiple technologies
  • Ensure that data is easily discoverable and usable for data scientists and analysts across the company
  • Identify root causes of instability in a large-scale distributed system, across stacks
100% Telecommuting Permitted