Data Engineer
Research
Posted on 9/21/2023 (Inactive)
Stability AI

51-200 employees

Develops open AI tools for collective intelligence applications
Company Overview
Stability AI is a leader in the AI industry, offering open AI tools that leverage collective intelligence and augmented technology to help individuals and businesses reach their potential. The company's culture encourages active involvement in its rapidly growing open-source project, Stable Diffusion, fostering a collaborative environment in which developers can build remarkable applications. Its competitive edge lies in its commitment to open-source development, which accelerates technical progress and industry leadership.

Company Stage: Seed
Total Funding: $123.8M
Founded: 2019
Headquarters: London, United Kingdom

Growth & Insights
Headcount
  • 6 month growth: -23%
  • 1 year growth: 68%
  • 2 year growth: 176%
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
AWS
Computer Vision
PyTorch
Python
Categories
AI & Machine Learning
Data & Analytics
Requirements
  • Proven background in large-scale distributed workloads
  • Experience with large-scale data loading for machine learning training runs
  • Experience with cloud storage and file systems; AWS (S3) is strongly preferred, but other cloud platforms are acceptable
  • Experience with Python and PyTorch, deep learning, and computer vision
  • Experience with multiprocessing and multithreading in Python workloads
  • Experience with parallel dataframe manipulation using PySpark/Ray
  • Proficiency in HPC cluster management tools and technologies
  • Excellent communication skills to collaborate effectively with users, resolve issues, and provide guidance
  • Attention to detail and the ability to document processes and solutions effectively
  • Nice to have: experience with the data loading stack (WebDataset, TorchData, fsspec, AIStore), as sketched after this list
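
For context on that data loading stack, here is a minimal, hypothetical sketch (not Stability AI's actual pipeline) of streaming sharded image-text data from S3 with WebDataset into a PyTorch DataLoader; the bucket name and shard pattern are placeholders.

```python
# Minimal sketch: stream sharded image-text pairs from S3 via WebDataset.
# The bucket and shard pattern are placeholders, not a real dataset.
import torch
import webdataset as wds
from torchvision import transforms

shards = "pipe:aws s3 cp s3://example-bucket/data/shard-{000000..000099}.tar -"

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = (
    wds.WebDataset(shards, shardshuffle=True)  # shuffle shard order
    .shuffle(1000)                             # in-memory sample shuffle buffer
    .decode("pil")                             # decode images to PIL
    .to_tuple("jpg", "txt")                    # (image, caption) pairs
    .map_tuple(preprocess, lambda t: t)        # transform images, keep captions as-is
)

loader = torch.utils.data.DataLoader(dataset, batch_size=64, num_workers=8)

if __name__ == "__main__":
    images, captions = next(iter(loader))
    print(images.shape, len(captions))
```

An fsspec- or AIStore-backed URL could stand in for the `pipe:aws s3 cp` command used for shard access; the overall composition would look the same.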
Responsibilities
  • Clean, normalize, and preprocess data in a scalable, parallelizable way to prepare it for ingestion into our machine learning model training pipelines while ensuring data quality (see the sketch after this list)
  • Build and maintain highly scalable distributed workloads
  • Build data pipelines to ingest and process data (e.g. images and text) for feeding into ML models
  • Manage AWS resources
  • Keep up to date with papers and methods for improving data quality and curating data for images, video, LLMs, etc.
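
As an illustration of the scalable, parallelizable preprocessing described above (a sketch under assumed inputs, not the team's actual pipeline), the following Ray Data snippet cleans caption metadata stored as Parquet on S3; the paths and the caption column name are hypothetical.

```python
# Minimal sketch: parallel cleaning/normalization of caption metadata with Ray Data.
# The S3 paths and the "caption" column are hypothetical placeholders.
import pandas as pd
import ray

def clean_batch(df: pd.DataFrame) -> pd.DataFrame:
    # Drop rows without captions, then normalize whitespace and casing.
    df = df.dropna(subset=["caption"])
    df["caption"] = df["caption"].str.split().str.join(" ").str.lower()
    return df

ds = (
    ray.data.read_parquet("s3://example-bucket/metadata/")  # placeholder input
    .map_batches(clean_batch, batch_format="pandas")        # batches run in parallel across the cluster
)

ds.write_parquet("s3://example-bucket/metadata-clean/")     # placeholder output
```

The same transformation could be expressed as a PySpark job instead; Ray is shown here only because the posting lists it alongside PySpark.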