Full-Time

Data Infrastructure Engineer

US

Updated on 11/16/2024

Onehouse

51-200 employees

Data lakehouse solution for efficient data management

Data & Analytics
Enterprise Software
AI & Machine Learning

Compensation Overview

$215k - $250k Annually

+ Equity Compensation

Junior, Mid

Remote in USA

Remote-friendly company with potential in-person requirements for some roles.

Category
DevOps & Infrastructure
Database Administration
Platform Engineering
Cloud Engineering
Required Skills
Kubernetes
Data Structures & Algorithms
Java
C/C++
Linux/Unix
Data Analysis
Requirements
  • Strong object-oriented design and coding skills (Java and/or C/C++, preferably on a UNIX or Linux platform).
  • Experience with inner workings of distributed (multi-tiered) systems, algorithms, and relational databases.
  • You embrace ambiguous/undefined problems with an ability to think abstractly and articulate technical challenges and solutions.
  • An ability to prioritize across feature development and tech debt with urgency and speed.
  • An ability to solve complex programming/optimization problems.
  • An ability to quickly prototype optimization solutions and analyze large/complex data.
  • Robust and clear communication skills.
Responsibilities
  • As a foundational member of the Data Infrastructure team, you will productionize the next generation of our data tech stack by building the software and data features that actually process all of the data we ingest.
  • Accelerate our open source <> enterprise flywheel by working on the guts of Apache Hudi's transactional engine and optimizing it for diverse Onehouse customer workloads.
  • Act as a subject matter expert (SME) to deepen our teams' expertise on database internals, query engines, storage and/or stream processing.
  • Design new concurrency control and transactional capabilities that maximize throughput for competing writers.
  • Design and implement new indexing schemes, specifically optimized for incremental data processing and analytical query performance.
  • Design systems that help scale and streamline metadata and data access from different query/compute engines.
  • Solve hard optimization problems to improve the efficiency (increase performance and lower cost) of distributed data processing algorithms over a Kubernetes cluster.
  • Leverage data from existing systems to find inefficiencies, and quickly build and validate prototypes.
  • Collaborate with other engineers to implement, deploy, and safely roll out the optimized solutions in production.

Onehouse.ai offers a data lakehouse solution that helps businesses manage and optimize their data efficiently. Its main product is a fully managed service that allows clients to organize various types of data without needing extensive technical resources. The platform supports different table formats like Apache Hudi, Apache Iceberg, and Delta Lake, making it adaptable to various data requirements. Onehouse.ai stands out from competitors with its usage-based pricing model, which can significantly lower data management costs compared to traditional cloud data warehouses. The company's goal is to simplify data management for businesses of all sizes, enabling them to scale their data operations while minimizing expenses.

Company Stage

Series B

Total Funding

$66.1M

Headquarters

San Francisco, California

Founded

2021

Growth & Insights
Headcount

6 month growth

8%

1 year growth

8%

2 year growth

87%

Simplify's Take

What believers are saying

  • Securing $35M in Series B funding and launching new products enhances Onehouse.ai's ability to innovate and expand its market presence.
  • Partnerships with industry giants like Microsoft and Google for the OneTable project highlight Onehouse.ai's influence and potential for reshaping the cloud data lake landscape.
  • Winning the 2023 Digital Innovator Award from Intellyx underscores Onehouse.ai's leadership and recognition in the digital transformation space.

What critics are saying

  • The competitive landscape in data management and cloud computing is intense, with major players like Snowflake and Databricks posing significant challenges.
  • Reliance on open-source technologies may lead to slower adoption rates among enterprises wary of open-source solutions.

What makes Onehouse unique

  • Onehouse.ai's focus on open storage and interoperability with multiple table formats like Apache Hudi, Iceberg, and Delta Lake sets it apart from competitors who may lock clients into proprietary systems.
  • The usage-based pricing model offers a cost-effective alternative to traditional cloud data warehouses, reducing data management costs by 50% or more.
  • Onehouse.ai's automated data management features, such as clustering, compaction, and encryption, provide a seamless and optimized data experience without extensive engineering resources.
