Principal Data Engineer
Posted on 3/20/2023
Remote • United States
Desired Skills
Apache Spark
Apache Kafka
  • Experienced in writing scalable applications on distributed architectures with large volumes of data
  • Data-driven: you test and measure as much as you can
  • Eager to both review peer code and have your code reviewed
  • Comfortable on the command line and consider it an essential tool
  • Confident in SQL and PySpark: you know them, you write smart queries, and it's no big deal
  • Research and implement new technologies with a team of developers to execute strategies and implement solutions
  • Provide thought leadership and best practices on architecture and tooling, and lead the establishment of standards for designing and implementing scalable data solutions
  • Solve complex problems related to the real-time discovery of large data sets at terabyte scale
  • Lead conversations with data product owners and business partners to ensure requirements translate into data engineering solutions, and establish procedures and best practices for transforming and storing data
  • Lead requirements gathering around data pipeline automation improvements
  • Work with some of the most exciting open-source tools, like Spark, Kafka, Hadoop, Docker, Databricks and Airflow
  • Leverage distributed computing and serverless architectures, such as AWS EMR and AWS Lambda, to develop pipelines for transforming data
  • Marvel at the speed with which your creation makes it into production
  • Produce peer-reviewed, quality software
  • Provide mentorship and guidance to data engineers
  • Proactively anticipate challenges and provide viable suggestions as SME and technical lead
Desired Qualifications
  • 10+ years of strategic experience solving data and ETL problems directly with business partners
  • 8+ years of experience with large-volume data distribution involving complex mappings
  • 8+ years of experience with Databricks, PySpark and Spark SQL (writing, testing, debugging Spark routines)
  • 5+ years of experience architecting, building and maintaining complex, multi-component big data systems
  • Working knowledge of Airflow or other orchestration and SQL code management tools
  • Experience with ETL and data pipeline scheduling technologies
  • Experience with distributed architectures such as microservices, SOA, RESTful APIs and data integration architectures

201-500 employees

Health data exchange platform
Company Overview
HealthVerity's mission is to transform the healthcare industry by creating a high-governance, privacy-compliant way for the healthcare industry to connect and exchange real-world data across the broadest ecosystem, enabling longitudinal patient journeys, frictionless access to data and better patient outcomes.
  • Medical, dental, & vision
  • 401k
  • Stock options
  • Flexible location
  • Generous PTO
  • Mentorship program
  • Professional development