Distributed Data Systems
Staff Software Engineer
Updated on 11/30/2023
Databricks

5,001-10,000 employees

Unified, open platform for enterprise data
Company Overview
Databricks is on a mission to simplify and democratize data and AI, helping data teams solve the world’s toughest problems. As the world’s first and only lakehouse platform in the cloud, Databricks combines the best of data warehouses and data lakes to offer an open and unified platform for data and AI.
Data & Analytics

Company Stage: Series I

Total Funding: $4.7B

Founded: 2013

Headquarters: San Francisco, California

Growth & Insights
Headcount
6 month growth: 17%
1 year growth: 46%
2 year growth: 119%
Locations
San Francisco, CA, USA
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Apache Spark
AWS
Data Science
Data Structures & Algorithms
Hadoop
Java
Microsoft Azure
Scala
SQL
Categories
AI & Machine Learning
Software Engineering
Requirements
  • BS in Computer Science, a related technical field, or equivalent practical experience.
  • Optional: MS or PhD in databases or distributed systems.
  • Comfortable working towards a multi-year vision with incremental deliverables.
  • Driven by delivering customer value and impact.
  • 5+ years of production-level experience in Java, Scala, or C++.
  • Strong foundation in algorithms and data structures and their real-world use cases.
  • Experience with distributed systems, databases, and big data systems (Spark, Hadoop).
Responsibilities
  • Develop Apache Spark, the de facto open-source standard framework for big data.
  • Deliver reliable, high-performance services and client libraries for storing and accessing massive amounts of data on cloud storage backends such as AWS S3 and Azure Blob Storage (see the illustrative sketch after this list).
  • Build next-generation distributed data storage and processing systems that outperform specialized SQL query engines on relational queries while providing the expressiveness and programming abstractions to support diverse workloads, from ETL to data science.
  • Make it simple to orchestrate and operate tens of thousands of data pipelines: provide a higher-level abstraction for expressing pipelines so that customers can deploy, test, and upgrade them without the operational burden of building and managing them by hand (see the pipeline sketch after this list).
  • Build a next-generation query optimizer and execution engine that is fast, tuning-free, scalable, and robust.
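To give a flavor of the cloud-storage access work described above, here is a minimal Scala sketch that reads Parquet data from an S3 bucket through Spark and runs a relational query over it. It assumes a cluster with the s3a connector and credentials already configured; the bucket path and column name are placeholders, not part of any Databricks API.

```scala
import org.apache.spark.sql.SparkSession

object CloudStorageReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cloud-storage-read-sketch")
      .getOrCreate()

    // Read Parquet files directly from object storage (placeholder bucket;
    // assumes the s3a connector and credentials are configured on the cluster).
    val events = spark.read.parquet("s3a://example-bucket/events/")

    // Run a simple relational aggregation over the data with the DataFrame API.
    events.groupBy("event_type").count().show()

    spark.stop()
  }
}
```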
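As a sketch of the "higher-level abstraction for expressing data pipelines" idea, the snippet below structures a pipeline as plain Scala functions over DataFrames, keeping each stage unit-testable in isolation. Paths and column names are placeholders, and this is not Databricks' actual pipeline API; it only illustrates the general pattern.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object PipelineSketch {
  // Each stage is a pure function over DataFrames, so it can be tested
  // without deploying anything. Column names below are hypothetical.
  def cleanOrders(raw: DataFrame): DataFrame =
    raw.filter(col("amount") > 0)
       .withColumn("order_date", to_date(col("order_ts")))

  def dailyRevenue(orders: DataFrame): DataFrame =
    orders.groupBy("order_date").agg(sum("amount").as("revenue"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pipeline-sketch").getOrCreate()

    // Wire the stages together: read raw data, transform, write results.
    val raw    = spark.read.parquet("s3a://example-bucket/raw_orders/")
    val result = dailyRevenue(cleanOrders(raw))
    result.write.mode("overwrite").parquet("s3a://example-bucket/daily_revenue/")

    spark.stop()
  }
}
```

Keeping business logic in pure functions and leaving deployment, testing, and upgrades to the surrounding platform is one way the operational burden of managing many pipelines can be reduced.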