Full-Time

Software Engineer

Machine Learning Infrastructure

Updated on 4/19/2025

DatologyAI

11-50 employees

Automated data curation for AI training

Compensation Overview

$180k - $250k/yr

Senior

Company Does Not Provide H1B Sponsorship

San Carlos, CA, USA

In-office 4 days a week; relocation assistance for employees moving to the Bay Area.

Category
Applied Machine Learning
AI & Machine Learning
Software Engineering
Required Skills
Kubernetes
Python
Apache Spark
Terraform
Linux/Unix
Data Analysis
Requirements
  • 5+ years of experience
  • Have meaningful experience with leading and building production ML infrastructure and platforms that deliver on major product initiatives.
  • Proficiency in Python and in the most commonly used tools in the infrastructure space: Linux, Kubernetes, Terraform / Pulumi, etc.
  • Strong knowledge of hardening cloud-native workloads, especially K8s workloads.
  • Experience maintaining a high-quality bar for design, correctness, and testing.
  • Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.
  • Own problems end-to-end and are willing to pick up whatever knowledge you're missing to get the job done.
  • Experience running data-processing workloads on K8s (e.g., Spark on K8s).
Responsibilities
  • Architect, build, and maintain the infrastructure that ensures highly available GPU workloads for training purposes.
  • Troubleshoot and resolve issues across GPU resources, networking, OS, drivers, and cloud environments, and automate detection and recovery of such issues.
  • Design, build, and maintain the infrastructure that powers our data curation product.
  • Partner with researchers and engineers to bring new features and research capabilities to our customers.
  • Ensure that our infrastructure and systems are reliable, secure, and worthy of our customers' trust.

DatologyAI specializes in automated data curation tools that enhance the training of Generative AI models. Its technology automatically selects high-quality data while removing irrelevant or harmful data points, which improves the accuracy and performance of AI models and reduces training costs. Clients, including tech companies and research institutions, can easily integrate these tools into their existing data systems, allowing for scalable AI capabilities. DatologyAI stands out due to its award-winning technology and support for immigrant founders, backed by expertise from Carnegie Mellon University. The company's goal is to help businesses train better AI models more efficiently and cost-effectively.

Company Size

11-50

Company Stage

Series A

Total Funding

$57.7M

Headquarters

Redwood City, California

Founded

2023

Simplify's Take

What believers are saying

  • Rising demand for data curation tools as AI models grow in complexity.
  • Opportunities in AI ethics and bias reduction align with industry trends.
  • Expansion into non-tech industries increases potential client base.

What critics are saying

  • Competition from established AI companies investing in data curation.
  • Over-reliance on venture capital funding may lead to financial instability.
  • Emerging privacy regulations could limit data curation capabilities.

What makes DatologyAI unique

  • DatologyAI specializes in automated data curation for GenAI model training.
  • Their technology removes redundant and harmful data, enhancing AI model accuracy.
  • Integration with existing infrastructures is seamless, requiring minimal code adjustments.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Company Match

Unlimited Paid Time Off

Annual Wellness Stipend

Annual Learning and Development Stipend

Relocation Assistance

Growth & Insights and Company News

Headcount

6 month growth

-4%

1 year growth

11%

2 year growth

66%
SiliconANGLE
May 9th, 2024
DatologyAI raises $46M to streamline AI model training data diets

Datology AI
Feb 23rd, 2024
Introducing DatologyAI — Making models better through better data, automatically

Models are what they eat. AI models trained on large-scale datasets have demonstrated jaw-dropping abilities and have the power to transform every aspect of our daily lives, from work to play. This massive leap in capabilities has largely been driven by corresponding increases in the amount of data we train models on, shifting from millions of data points several years ago to billions or trillions of data points today. As a result, these models are a reflection of the data on which they’re train

SiliconANGLE
Feb 23rd, 2024
DatologyAI raises $11.65M to automate data curation for more efficient AI training

TechCrunch
Feb 22nd, 2024
DatologyAI is building tech to automatically curate AI training datasets

A new startup, DatologyAI, claims to be able to automatically curate the massive data sets on which AI models train.