About Cybersyn
Cybersyn is a new data-as-a-service (DaaS) company backed by Sequoia, Coatue, and Snowflake. Our mission is to make the world’s economic data transparent to governments, businesses, and entrepreneurs, enabling a new generation of decision makers. We acquire unique data assets (companies, licenses, data rights, consumer dividends) and build derived products on top of them, focusing on measuring what consumers and businesses spend money on. You can think of Cybersyn as a cross between an investment firm and a technology company focused on data: if we are successful, we will disrupt the traditional market intelligence space, an industry worth hundreds of billions of dollars, and build SimCity for the real world.
We have already released a fair number of public datasets that we have cleaned, restructured, and made joinable on the Snowflake Marketplace.
View our data products here.
Explore the sources we integrate, and the associated data products here.
About the role:
Cybersyn is looking for an experienced engineer to help us refine the technology stack for our data science and product teams and to implement ingestion pipelines for public-domain and private data sources. We are looking for someone who is passionate about the Snowflake Data Cloud, particularly about optimizing costs and workloads. This is the perfect role for someone who loves to tune databases, thinks about cost-compute optimization, and knows their way around a query plan.
What you will do:
Help get data from wherever it is to where we need it (in Snowflake): in practice, this often means writing jobs to extract, download, or transform data as efficiently as possible. You will need to care about compute efficiency while also building context on what the data actually represents.
Implement research and statistical models and pipelines in Snowflake efficiently, meeting SLA requirements while minimizing costs
Tune Snowflake for performance and cost optimization
Provide infrastructure guidance on Snowflake capabilities to accommodate business and technical use cases
Provide production support for data warehouse issues such as data load failures, transformation errors, and query optimization
Take end-to-end ownership of your work and enjoy working with different functions across the company
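To give a flavor of the ingestion work described above, a loader job often boils down to composing Snowflake SQL such as a COPY INTO statement. This is a minimal sketch; the table, stage, and file names are hypothetical placeholders, not an actual Cybersyn schema:

```python
# Sketch of one step in a Snowflake ingestion job: compose the COPY INTO
# statement that loads staged files into a raw table. The names
# (RAW_CENSUS, public_stage, the file pattern) are invented for illustration.

def build_copy_statement(table: str, stage: str, pattern: str) -> str:
    """Build a COPY INTO statement that loads staged CSV files into a table."""
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage}\n"
        f"PATTERN = '{pattern}'\n"
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)\n"
        "ON_ERROR = 'ABORT_STATEMENT';"
    )

sql = build_copy_statement("RAW_CENSUS", "public_stage", ".*acs_2021.*[.]csv")
print(sql)
```

In a real pipeline this statement would be executed through a Snowflake session (e.g., via the Python connector) by whatever orchestrator schedules the job.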
Who you are:
Experience with Snowflake is required
Experience with query optimization is required. You are comfortable in the Snowflake Query Profile. Snowflake micro-partitions, clustering keys, query acceleration, and the search optimization service should all be terms that you are familiar with and ready to discuss.
Experience in Python and SQL is required
Experience working with multiple (external) datasets: cleaning, joining, and munging data; experience working with public data sources (e.g., the US Census, the American Community Survey) is a huge plus
Experience with dbt and orchestrator systems (Dagster, Prefect, Mage, Kestra, or some equivalent) is highly valued
Experience building and operating data pipelines for real customers in production systems
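Much of the cleaning-and-joining experience above reduces to normalizing join keys across datasets before combining them. A toy, stdlib-only sketch; the field names and records are invented for illustration:

```python
# Toy example of munging two small "external" datasets: normalize messy
# join keys, drop nulls, then inner-join. All records are invented.

def norm(key: str) -> str:
    """Normalize a join key: trim, collapse internal whitespace, lowercase."""
    return " ".join(key.split()).lower()

# One source with inconsistent casing/whitespace and a null value.
population = {"King County": 2_252_782, " king  county ": None, "Pierce County": 921_130}
# A second source keyed on the same counties, differently formatted.
spend = [("KING COUNTY", 41.5), ("pierce county", 17.2), ("Kitsap County", 6.8)]

# Build a cleaned lookup, dropping nulls and de-duplicating normalized keys.
pop_by_key = {norm(k): v for k, v in population.items() if v is not None}

# Inner-join spend onto population; keys with no match are dropped.
joined = [
    (norm(county), pop_by_key[norm(county)], dollars)
    for county, dollars in spend
    if norm(county) in pop_by_key
]
print(joined)  # Kitsap County has no population record, so it is dropped.
```

At Cybersyn scale this kind of normalization would live in SQL or dbt models rather than Python dictionaries, but the key-hygiene problem is the same.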
What you get out of it:
Ability to shape Cybersyn’s initial technology decisions
Access to some of the most interesting and largest economic data in the world, including real-time spending, transaction, and clickstream data from both third-party and first-party sources.
Much of our data is not available to any other third parties.
Our system is built with heterogeneous data sources in mind: we are not working on data from a single product or theme, but on data from governments, payment processing systems (think bank records), mobile devices and apps, and SaaS exhaust (think data that B2B SaaS products collect)
Fast-moving culture with lots of responsibility and autonomy from day 1.