About Cybersyn
Cybersyn is a new data-as-a-service (DaaS) company backed by Sequoia, Coatue, and Snowflake. Our mission is to make the world’s economic data transparent to governments, businesses, and entrepreneurs, enabling a new generation of decision makers. We acquire unique data assets (companies, licenses, data rights, consumer dividends) and build derived products on top of them, focusing on measuring what consumers and businesses spend money on. You can think of Cybersyn as a cross between an investment firm and a technology company focused on data: if we are successful, we will disrupt the traditional market intelligence space, an industry worth hundreds of billions of dollars, and build SimCity for the real world.
We have already released a fair number of public datasets that we have cleaned, restructured, and made joinable on the Snowflake Marketplace.
View our data products here.
Explore the sources we integrate, and the associated data products here.
About the role:
Cybersyn is looking for an experienced engineer to help us refine the technology stack for our data science and product teams and to implement ingestion pipelines for public-domain and private data sources. We are looking for someone who is passionate about the Snowflake Data Cloud, particularly about optimizing costs and workloads. This is the perfect role for someone who loves to tune databases, thinks about cost-compute optimization, and knows their way around a query plan.
What you will do:
Help get data from wherever it is to where we need it (in Snowflake): in practice, this often means writing jobs to extract, download, or transform data as efficiently as possible. You will need to care about compute efficiency while also building context on what the data actually represents.
Implement research and statistical models and pipelines in Snowflake efficiently, meeting SLA requirements while minimizing costs
Tune Snowflake for performance and cost optimization
Provide infrastructure guidance on Snowflake capabilities to accommodate business and technical use cases
Provide production support for data warehouse issues such as data load failures, transformation errors, and query optimization
Take end-to-end ownership of your work and enjoy working with different functions across the company
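To give a flavor of the ingestion work described above, a loader job often boils down to composing Snowflake SQL such as a COPY INTO statement. This is a minimal sketch; the table, stage, and file names are hypothetical placeholders, not an actual Cybersyn schema:

```python
# Sketch of one step in a Snowflake ingestion job: compose the COPY INTO
# statement that loads staged files into a raw table. The names
# (RAW_CENSUS, public_stage, the file pattern) are invented for illustration.

def build_copy_statement(table: str, stage: str, pattern: str) -> str:
    """Build a COPY INTO statement that loads staged CSV files into a table."""
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage}\n"
        f"PATTERN = '{pattern}'\n"
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)\n"
        "ON_ERROR = 'ABORT_STATEMENT';"
    )

sql = build_copy_statement("RAW_CENSUS", "public_stage", ".*acs_2021.*[.]csv")
print(sql)
```

In a real pipeline this statement would be executed through a Snowflake session (e.g., via the Python connector) by whatever orchestrator schedules the job.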
Who you are:
Experience with Snowflake is required
Experience with query optimization is required. You are comfortable in the Snowflake Query Profile. Snowflake micro-partitions, clustering keys, query acceleration, and the search optimization service should all be terms that you are familiar with and ready to discuss.
Experience in Python and SQL is required
Experience working with multiple (external) datasets: cleaning, joining, and munging data; experience working with public data sources (e.g., the US Census, the American Community Survey) is a huge plus
Experience with dbt and orchestrator systems (Dagster, Prefect, Mage, Kestra, or some equivalent) is highly valued
Experience building and operating data pipelines for real customers in production systems
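Much of the cleaning-and-joining experience above reduces to normalizing join keys across datasets before combining them. A toy, stdlib-only sketch; the field names and records are invented for illustration:

```python
# Toy example of munging two small "external" datasets: normalize messy
# join keys, drop nulls, then inner-join. All records are invented.

def norm(key: str) -> str:
    """Normalize a join key: trim, collapse internal whitespace, lowercase."""
    return " ".join(key.split()).lower()

# One source with inconsistent casing/whitespace and a null value.
population = {"King County": 2_252_782, " king  county ": None, "Pierce County": 921_130}
# A second source keyed on the same counties, differently formatted.
spend = [("KING COUNTY", 41.5), ("pierce county", 17.2), ("Kitsap County", 6.8)]

# Build a cleaned lookup, dropping nulls and de-duplicating normalized keys.
pop_by_key = {norm(k): v for k, v in population.items() if v is not None}

# Inner-join spend onto population; keys with no match are dropped.
joined = [
    (norm(county), pop_by_key[norm(county)], dollars)
    for county, dollars in spend
    if norm(county) in pop_by_key
]
print(joined)  # Kitsap County has no population record, so it is dropped.
```

At Cybersyn scale this kind of normalization would live in SQL or dbt models rather than Python dictionaries, but the key-hygiene problem is the same.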
What you get out of it:
Ability to shape Cybersyn’s initial technology decisions
Access to some of the most interesting and largest economic data in the world, including real-time spending, transaction, and clickstream data from both third-party and first-party sources.
Much of our data is not available to any other third parties.
Our system is built with heterogeneous data sources in mind: we are not working on data from a single product or theme, but on data from governments, payment processing systems (think bank records), mobile devices and apps, and SaaS exhaust (think data that B2B SaaS products collect)
Fast-moving culture with lots of responsibility and autonomy from day 1.