Data Engineer
Posted on 10/26/2022
INACTIVE
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Apache Spark
Apache Kafka
Data Analysis
Data Structures & Algorithms
Hadoop
Java
Scala
SQL
Python
Yarn
Requirements
  • 2+ years of relevant industry experience
  • Advanced working knowledge of SQL, relational databases, and query authoring, ideally in a variety of flavors (our team alone works with MariaDB, HiveQL, CassandraQL, Spark SQL and Presto)
  • Experience with one or more programming languages such as Python, Scala, or Java
  • Experience building data pipelines using tools such as Airflow, Spark, Gobblin, Oozie, Yarn
  • Familiarity with stream processing systems such as Kafka, Spark Streaming and/or Flink
  • Excellent written and verbal communication skills
  • Strong interpersonal and collaboration skills
  • BS or MS degree, preferably in Computer Science, or equivalent work experience
  • Experience with Hadoop
  • Understanding of related disciplines including Machine Learning, Statistics, Privacy and Algorithms
  • Experience working with site reliability engineers
Responsibilities
  • Integrating data from multiple sources to gain insights in areas such as content, traffic, editors, readership and fundraising
  • Building scalable data pipelines in collaboration with other data engineers and with teams across the Foundation, including product analytics, platform engineering, survey, research and machine learning teams
  • Designing the shared data platform that supports use cases for critical aspects of the Wikimedia mission: harassment prevention, image classification, bot detection, DDoS attack flagging and many more
  • Building and maintaining public metrics and datasets
  • Implementing data quality monitoring that alerts the team to possible data issues
  • Implementing a data governance and lineage solution for all Wikimedia data
Wikimedia Foundation

501-1,000 employees

Nonprofit charitable organization
Company Overview
The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally.