Full-Time

Senior Data Scientist / Machine Learning Engineer

GenAI & LLM

Posted on 11/15/2024

Databricks

5,001-10,000 employees

Unified data platform for analytics and AI

Data & Analytics
AI & Machine Learning

Compensation Overview

$124.8k - $220.8k Annually

+ Annual Performance Bonus + Equity

Senior

Remote in USA

Strong preference for candidates in the Bay Area, but open to other locations in the U.S.

Category
Applied Machine Learning
Natural Language Processing (NLP)
AI & Machine Learning
Data & Analytics
Required Skills
Microsoft Azure
Data Science
TensorFlow
PyTorch
Apache Spark
AWS
Pandas
LangChain
Databricks
Google Cloud Platform
Requirements
  • Experience building Generative AI applications, including RAG, agents, text2sql, fine-tuning, and deploying LLMs, with tools such as HuggingFace, LangChain, and OpenAI
  • 5+ years of hands-on industry data science experience, leveraging typical machine learning and data science tools including pandas, scikit-learn, and TensorFlow/PyTorch
  • Experience building production-grade machine learning deployments on AWS, Azure, or GCP
  • Graduate degree in a quantitative discipline (Computer Science, Engineering, Statistics, Operations Research, etc.) or equivalent practical experience
  • Experience communicating and/or teaching technical concepts to non-technical and technical audiences alike
  • Passion for collaboration, life-long learning, and driving business value through ML
  • [Preferred] Experience working with Databricks & Apache Spark to process large-scale distributed datasets
Responsibilities
  • Develop LLM solutions on customer data such as RAG architectures on enterprise knowledge repos, querying structured data with natural language, and content generation
  • Build, scale, and optimize customer data science workloads and apply best-in-class MLOps to productionize these workloads across a variety of domains
  • Advise data teams on various data science topics such as architecture, tooling, and best practices
  • Present at conferences such as Data+AI Summit
  • Provide technical mentorship to the larger ML SME community in Databricks
  • Collaborate cross-functionally with the product and engineering teams to define priorities and influence the product roadmap

Databricks provides a platform that combines data lakes and data warehouses into a single architecture known as a lakehouse. This platform allows organizations to efficiently manage, analyze, and gain insights from their data. It caters to a variety of users, including data engineers, data scientists, and business analysts, across industries like finance, healthcare, and technology. The platform features automated ETL processes, secure data sharing, and high-performance analytics, and it also supports machine learning and AI workloads for building and deploying models. Databricks operates on a subscription-based model, generating revenue through client subscriptions and professional services. The company's goal is to streamline data management and analytics, making it easier for organizations to leverage their data effectively.

Company Stage

Growth Equity (Venture Capital)

Total Funding

$3.9B

Headquarters

San Francisco, California

Founded

2013

Growth & Insights
Headcount

6 month growth

8%

1 year growth

26%

2 year growth

78%

Simplify's Take

What believers are saying

  • The $1 billion acquisition of Tabular is likely to enhance Databricks' data management capabilities and market reach.
  • The development and launch of the DBRX generative AI model, with a $10 million investment, underscores Databricks' dedication to leading in AI technology.
  • High-profile investments from figures like Nancy Pelosi indicate strong confidence in Databricks' growth potential.

What critics are saying

  • The integration of Tabular's team and technology could face challenges, potentially disrupting operations.
  • The competitive landscape in AI and data analytics is intense, with major players like Google and Microsoft posing significant threats.

What makes Databricks unique

  • Databricks' acquisition of Tabular, founded by the creators of Apache Iceberg, strengthens its position in the open lakehouse market.
  • The launch of DBRX, an open-source LLM that outperforms GPT-3.5 and Llama 2, showcases Databricks' commitment to cutting-edge AI innovation.
  • Strategic partnerships, such as with AVEVA for industrial AI, highlight Databricks' ability to integrate and enhance diverse technological ecosystems.

Benefits

Extended health care including dental and vision

Life/AD&D and disability coverage

Equity awards

Flexible Vacation

Gym reimbursement

Annual personal development fund

Work headphones reimbursement

Employee Assistance Program (EAP)

Business travel accident insurance

Paid Parental Leave