Full-Time

Senior/Staff ML Engineer

LlamaParse

LlamaIndex

51-200 employees

Connects data sources to large language models

Compensation Overview

$100k - $250k/yr

+ Equity Compensation

Senior

San Francisco, CA, USA

Remote opportunities for exceptional talent across the U.S. or Switzerland.

Category
Natural Language Processing (NLP)
AI & Machine Learning
Required Skills
Python
Keras
Natural Language Processing (NLP)
Requirements
  • 3+ years of experience
  • Deep expertise in Python and at least one ML framework (PyTorch, Keras, etc.)
  • Strong ML engineering background with expertise in building and deploying models in production
  • Experience with curating and building training sets
  • Experience with document processing technologies (e.g., PDF parsing, OCR, layout analysis)
  • Strong understanding of modern AI/ML techniques, particularly in document understanding and NLP
  • Track record of executing with high intensity in fast-paced environments
Responsibilities
  • Develop and optimize machine learning models for document structure understanding, table extraction, and layout analysis
  • Build and maintain robust APIs and infrastructure to support high-volume document processing
  • Collaborate with the broader AI team to improve RAG pipeline integration and document preprocessing
  • Drive technical decisions while balancing speed, quality, and maintainability
  • Contribute to both our open-source framework and enterprise offering
Desired Qualifications
  • Experience with production API design and implementation
  • Software engineering background with expertise in TypeScript
  • Experience building file parsers or OCR systems
  • Experience with computer vision or document understanding models
  • Familiarity with LLM applications, particularly in document processing contexts
  • Contributions to open-source document processing or ML projects
  • Experience with ABS Document
  • Background in technical product development at fast-growing startups
  • Proven track record of shipping production ML systems

LlamaIndex.ai provides a data framework that enables businesses to connect their custom data sources to large language models (LLMs), which are AI systems capable of understanding and generating human-like text. The framework supports various types of data, including structured data from sources like Excel and SQL, semi-structured data from APIs such as Slack and Salesforce, and unstructured data like web pages and images. This versatility allows businesses of all sizes to gain insights from their data. Operating on a business-to-business (B2B) model, LlamaIndex likely generates revenue through a subscription service, offering clients ongoing access to its features. The company's goal is to empower businesses to leverage their data effectively, facilitating data-driven decision-making.

Company Size

51-200

Company Stage

Series A

Total Funding

$27.5M

Headquarters

San Francisco, California

Founded

2023

Simplify Jobs

Simplify's Take

What believers are saying

  • Recent investments from Databricks and KPMG boost development and market reach.
  • AGNTCY collaboration positions LlamaIndex as a leader in AI agent interoperability.
  • $19M Series A funding supports team expansion and LlamaCloud service enhancement.

What critics are saying

  • Reliance on Google Cloud poses risks of service disruptions during outages.
  • Emergence of Google's AlphaEvolve could threaten LlamaIndex's competitive edge.
  • AGNTCY's open-source framework may challenge LlamaIndex's proprietary solutions.

What makes LlamaIndex unique

  • LlamaIndex connects diverse data sources to large language models for unique insights.
  • It supports structured, semi-structured, and unstructured data integration, enhancing versatility.
  • The company offers a flexible B2B subscription model, ensuring steady revenue and scalability.


Benefits

Health Insurance

Dental Insurance

Vision Insurance

Unlimited Paid Time Off

Company Equity

Meal Benefits

Growth & Insights and Company News

Headcount

6 month growth

-9%

1 year growth

-4%

2 year growth

-6%

VentureBeat
Jun 12th, 2025
Cloud Collapse: Replit and LlamaIndex Knocked Offline by Google Cloud Identity Outage

Days after OpenAI and Google Cloud announced a partnership to support the growing use of generative AI platforms, much of the AI-powered web and tools went down due to an outage at one of the leading cloud providers. Google Cloud Platform (GCP) and some Cloudflare services began experiencing issues around 10:00 a.m. PT today, affecting several AI development tools and data storage services, including ChatGPT and Claude, as well as a variety of other AI platforms.

Google Cloud (@googlecloud), June 12, 2025: "We are aware of a service disruption to some Google Cloud services and we are working hard to get you back up and running ASAP. Please view our status dashboard for the latest updates: https://t.co/sT6UxoRK4R"

A GCP spokesperson confirmed the outage to VentureBeat, urging users to check its public status dashboard. GCP said affected services include API Gateway, Agent Assist, Cloud Data Fusion, Contact Center AI Platform, Google App Engine, Google BigQuery, Google Cloud Storage, Identity Platform, Speech-to-Text, Text-to-Speech and Vertex AI Search, among other tools. Google’s mobile development platform, Firebase, also went down.

VentureBeat staffers had trouble accessing Google Meet, but other Google services on Workspace remained online. A Cloudflare spokesperson told VentureBeat only “a limited number of services at Cloudflare use Google Cloud and were impacted. We expect them to come back shortly

VentureBeat
May 17th, 2025
Google’s AlphaEvolve: The AI Agent That Reclaimed 0.7% of Google’s Compute – and How to Copy It

Google’s new AlphaEvolve shows what happens when an AI agent graduates from lab demo to production work, and you’ve got one of the most talented technology companies driving it.

Built by Google’s DeepMind, the system autonomously rewrites critical code and already pays for itself inside Google. It shattered a 56-year-old record in matrix multiplication (the core of many machine learning workloads) and clawed back 0.7% of compute capacity across the company’s global data centers.

Those headline feats matter, but the deeper lesson for enterprise tech leaders is how AlphaEvolve pulls them off. Its architecture – controller, fast-draft models, deep-thinking models, automated evaluators and versioned memory – illustrates the kind of production-grade plumbing that makes autonomous agents safe to deploy at scale.

Google’s AI technology is arguably second to none. So the trick is figuring out how to learn from it, or even using it directly

Gulf Main Magazine
May 1st, 2025
LlamaIndex Gains Investments from Databricks, KPMG

LlamaIndex announced minority equity investments from Databricks and KPMG LLP to boost development and adoption of its LlamaCloud and LlamaParse platforms. These tools aid enterprises in building AI systems using unstructured data. Databricks and KPMG aim to enhance AI accessibility and innovation. Financial terms were not disclosed.

ChannelE2E
May 1st, 2025
LlamaIndex Secures Strategic Investments from Databricks and KPMG to Advance Enterprise AI Workflows

LlamaIndex has announced minority equity investments from Databricks and KPMG LLP, marking a key milestone in its mission to streamline the development of agentic AI solutions built on enterprise data.

Databricks
Apr 15th, 2025
Databricks Invests in LlamaIndex for AI

Databricks Ventures has invested in LlamaIndex to enhance the development of knowledge agents over enterprise data. Many companies struggle to build scalable AI applications because they lack a robust workflow framework for handling large volumes of unstructured data. LlamaIndex, integrated with MLflow and Mosaic AI Vector Search, helps enterprises create production-ready knowledge agents that retrieve, synthesize, and act on complex data within the Databricks Data Intelligence Platform.