Data Engineer
Posted on 3/29/2023
INACTIVE
Locations
San Jose, CA, USA
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
AWS
BigQuery
Data Structures & Algorithms
Google Cloud Platform
Git
Airflow
Pandas
REST APIs
SQL
Python
Categories
Data & Analytics
Requirements
- Bachelor's degree or higher in an engineering or technical field such as Computer Science, Physics, Mathematics, Statistics, Engineering, or Business Administration, or an equivalent combination of education and experience
- 4+ years' experience in a data engineering role supporting production systems
- 1+ years' experience extracting data from REST APIs
- 1+ years' experience managing a codebase in GitHub
- Previous experience developing ETL pipelines using technologies such as Airflow (preferred), Luigi, Oozie, Azkaban, etc.
- Previous experience developing data models to support a data warehouse
- Experience manipulating and de-normalizing data in JSON format for storage in relational databases
- Experience with Google Cloud Platform or AWS cloud services
- (Preferred) Knowledge and experience with Kubernetes and/or Docker
- (Preferred) Advanced knowledge of SQL and experience working with relational databases. BigQuery experience is an extra plus
- Work revolves around objectives, projects, and priorities, not hours; must be able to work weekends, holidays, and occasional overtime as needed
- Must be able to stand, walk, lift, sit, and bend for a majority of their work schedule
- Must be able to travel to other office locations
- Must be able to use a computer and calculator for 8 hours or more
- Must be 21 years of age or older
- Must comply with all legal or company regulations for working in the industry
- Selected candidate will be required to complete a post-offer, pre-employment background check with local law enforcement or the San Jose Police Department
Responsibilities
- Assist with the implementation of new systems and updates to existing systems by leading the data strategy for each, assuring data integrity, value, and access
- Establish best practices in our data engineering practice and strategy
- Develop appropriate data schemas and structures for use in downstream models/reports
- Develop a data management and oversight program spanning dozens of source systems across all departments, creating new ETL pipelines and maintaining existing ones to ensure data richness and quality
- Engineer for capacity and performance, provide forecasting and future planning, and review and evaluate technology trends
- Recommend and develop changes to source data structures/systems based on observations of data within the context of operational use
- Assemble large, complex data models to meet the needs of operational and strategic stakeholders
- Work closely with our in-house analysts to integrate SQL data models into a dependency tree
- Document and maintain our data lineage and data dictionary
- Other duties and responsibilities as assigned by management
Desired Qualifications
- 1+ years' experience manipulating data using Python (experience with Pandas is a plus)