Streaming Data Engineer
Posted on 4/7/2022
INACTIVE
Locations
Remote • New York, NY, USA • United States
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Agile
Apache Spark
AWS
Data Science
Data Structures & Algorithms
Docker
Elasticsearch
GraphQL
JavaScript
JIRA
Kafka
Git
Java
MySQL
Postgres
Redis
Research
Scala
SQL
Kubernetes
Python
TypeScript
Writing
Requirements
  • Well-versed in using Docker/Kubernetes to manage your own local development environment
  • Experience working and deploying in the AWS Cloud, especially its file and data-oriented services
  • Experience in developing, managing, and manipulating large, complex datasets
  • Experience working with data ingestion and transformation pipelines, either batch ETL or streaming
  • Familiarity with big data streaming technologies like Kafka, Kinesis, Flink, or Spark
  • Substantial experience working with data-oriented APIs, preferably using GraphQL
  • Deep experience with relational database technologies, like Postgres and MySQL, including writing and optimizing complex queries
  • Experience using non-relational database technologies like Cassandra, DynamoDB, Athena, Elasticsearch, and Redis
  • Proficient in at least one programming language besides SQL, such as Scala, Java, Python, Go, JavaScript, or TypeScript, applied to data-oriented problems
  • Experience applying Agile software development methodology to enterprise data engineering, with tools like Git, JIRA, Travis CI, and others
  • Bachelor's degree in science, engineering, math, data, or a related field
  • Minimum of three (3) years working in software development and at least one (1) year of experience working in data engineering or data science
  • Must be located in the continental US
Responsibilities
  • Code, test, troubleshoot, document, deploy, maintain, and optimize event-driven, distributed, stateful data pipelines built with Flink and Kafka in a Kubernetes environment; these pipelines integrate and analyze diverse sources of clinical, sensor, and real-world data to support biomarker discovery
  • Contribute to the development and deployment of incremental online machine learning algorithms to solve out-of-core analytical problems
  • Develop tooling to support the use of a growing, cross-functional, multi-format type registry of schemas for streaming data algorithms and APIs
  • Support and enhance data APIs to make insights from streaming sources more accessible to patients, clinical research professionals, and other stakeholders
  • Document and compare alternative data architecture and data modeling solutions
Desired Qualifications
  • Any experience using semantic database technologies like Neptune, Stardog, AnzoGraph, or other triple stores is a plus
Medidata Solutions

1,001-5,000 employees

Life sciences digital transformation platform
Company Overview
Medidata's mission is to advance the development of new treatments. The company develops cloud-based clinical solutions.
Benefits
  • Unlimited Paid Time Off
  • Health & Wellness
  • Professional Development
  • Work/Life Balance
  • Maternity & Paternity Leave