Senior Data Engineer
ETL, Databricks
Updated on 9/13/2023
Aegis Ventures

11-50 employees

Startup studio that partners with entrepreneurs and industry leaders.
Locations
New York, NY, USA
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Data Science
Python
Categories
AI & Machine Learning
Data & Analytics
Requirements
  • At least five years of data engineering experience
  • Experience using Databricks or a similar platform to architect data lakehouses
  • Experience with object-oriented or functional scripting languages (the team uses Python)
  • Experience building and optimizing 'big data' data pipelines, architectures, and data sets
  • Strong analytic skills related to working with unstructured datasets
  • Working knowledge of message queuing and stream processing
  • Experience building cloud services and using Infrastructure as Code tools
  • Working knowledge of data science and machine learning
  • Ability to work collaboratively in a cross-functional team
  • Strong project management and organizational skills
Responsibilities
  • Design and implement best practices for data governance and data pipeline architecture that support data labeling and foundation model fine-tuning
  • Collaborate with product and business stakeholders to understand their data needs and develop solutions that support their objectives
  • Build the infrastructure required for extraction, transformation, and loading (ETL) of data from a wide variety of data sources
  • Advise the engineering and management teams on the optimal strategy for cloud architecture and design
  • Support the research team in building repeatable data pipelines for model development
  • Employ best practices to ensure that production deployments are secure, stable, well-monitored, and easy to troubleshoot
  • Help the engineering teams put effective development tooling in place by establishing servers and managing continuous integration and continuous deployment (CI/CD) pipelines
  • Help to define and enforce policies for the usage of and access to cloud environments
  • Predict, monitor, and control cloud spend across the group of companies
  • Communicate complex technical concepts and recommendations to non-technical stakeholders in a clear and concise manner