Full-Time

AI/Machine Learning Engineer

Public Sector

Posted on 9/5/2025

Unstructured

51-200 employees

Open-source data preprocessing for unstructured data

No salary listed

Remote in USA

Remote

Occasional travel to North Carolina, Florida, and other CONUS locations.

US Top Secret Clearance Required

Category
AI & Machine Learning
Required Skills
Machine Learning
Requirements
  • Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or a related technical field. Master’s or PhD a plus.
  • 4+ years of experience in AI/ML engineering, MLOps, systems architecture, or similar technical roles
  • 2+ years of experience working with government networks and security requirements
  • An understanding of government security frameworks (FedRAMP, NIST 800-53, FISMA, DISA SRG) and how they apply to ML workloads
  • History of leading or delivering high-impact ML initiatives in enterprise or government environments; preference for those with articulable experience assessing performance of alternative models, architectures, and implementation strategies
  • A commitment to meeting the demanding engineering standards required to support national security and defense clients
  • A strong interest in being at the forefront of the AI revolution
  • Active TS clearance and the ability to travel are required for the role
Responsibilities
  • Develop evaluation and assessment tools and frameworks that measure the performance of newly developed models against key metrics across a wide range of tasks and knowledge domains
  • Identify, propose, and implement modifications of existing models and model implementation frameworks to optimize for new tasks
  • Lead conceptualization of both traditional and agentic implementation strategies for cloud and on-premises model deployments within broader system architectures
  • Lead and optimize distributed ML workloads on multiple government cloud and non-cloud infrastructures
  • Align AI/ML deployments with FedRAMP, NIST 800-53, FISMA, and DISA SRG, maintaining strict security standards
  • Create reference architectures and deployment patterns to streamline ML adoption across government agencies
  • Translate mission objectives into ML-focused technical specifications and project plans
  • Apply advanced security controls and zero-trust architectures to protect ML pipelines and data
  • Continuously assess ML workloads for performance, cost, and security improvements, driving ongoing refinement.
Desired Qualifications
  • Master’s or PhD a plus
  • Background in Large Language Models (LLMs)
  • Foundation in computer vision, autonomy, sensor fusion, or core defense technologies, such as signals, electronic warfare, or cyber.

Unstructured.io provides tools for turning raw unstructured data into ML-ready formats. It delivers open-source libraries and APIs that developers and data scientists use to build custom data-preprocessing pipelines for labeling, training, and production workflows. The pipelines handle HTML, PDF, XML, PPTX, and DOCX files as well as CRM data; they can be orchestrated with machine learning models, cleaning scripts, and regular expressions, integrate easily with downstream services, and maintain strong data security. Users can publish their own APIs and format data for ingestion by various ML services, enabling scalable use of unstructured data. The goal is to help organizations extract value from unstructured data at scale by providing flexible, reusable preprocessing tools.

Company Size

51-200

Company Stage

Series B

Total Funding

$65M

Headquarters

San Francisco, California

Founded

2022

Simplify Jobs

Simplify's Take

What believers are saying

  • $2M AFWERX contract builds Air Force multimodal AI data pipelines.
  • Raised $25M from Madrona for LLM data solutions expansion.
  • 30+ connectors standardize multi-source ETL without custom code.

What critics are saying

  • LlamaIndex v0.10 could erode market share within 6-12 months.
  • LangChain 0.3 captures users via agentic workflow integration.
  • Open-source forks commoditize partitioning, slashing subscriptions.

What makes Unstructured unique

  • Unstructured supports 70+ file types with multimodal processing for AI pipelines.
  • FedRAMP High authorization enables secure federal agency deployments.
  • Partners with Teradata for native Enterprise Vector Store integration.


Benefits

Remote Work Options

Unlimited Paid Time Off

Home Office Stipend

Health Insurance

Dental Insurance

Vision Insurance

Professional Development Budget

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

-2%

2 year growth

3%
Feedzai
Mar 24th, 2026
The most innovative data science companies of 2026.

Why Unstructured, Feedzai, Synchron, and Chalk are among Fast Company's Most Innovative Companies in data science for 2026.

Yahoo Finance
Mar 9th, 2026
Unstructured partners with Teradata to embed AI data processing natively in Enterprise Vector Store

Unstructured has partnered with Teradata to embed its data processing platform natively inside Teradata Enterprise Vector Store, enabling enterprises to transform unstructured content into AI-ready data without external tools. The integration will be available to eligible Teradata customers from April 2026. The partnership allows automatic ingestion and processing of documents, PDFs, images, video and audio directly within Teradata Enterprise Vector Store. Unstructured's preprocessing capabilities support over 70 file types, converting them into structured data and embeddings whilst maintaining the same governance and security standards as Teradata's structured analytics. The integration addresses a critical challenge, as roughly 80% of enterprise data exists in formats AI systems cannot natively use. It supports hybrid deployment across AWS, Azure, GCP, on-premises and air-gapped environments, particularly benefiting regulated industries like financial services, healthcare and government.

Business Wire
Feb 18th, 2026
Unstructured wins $2M AFWERX contract to build multimodal AI data pipelines for US Air Force testing

Unstructured has been awarded a $2 million Tactical Funding Increase contract by AFWERX in partnership with the U.S. Air Force Test Center's 96th Test Wing. The contract will develop advanced multimodal data pipelines for generative AI-enabled testing tools and establish test and evaluation frameworks for AI applications across the Air Force. The technology will enable the Air Force to process complex test data formats including charts, diagrams, images, audio, video and telemetry, which current AI tools struggle to access. Unstructured's solution will allow personnel to query and analyse information through AI-powered assistants whilst reducing processing costs and storage requirements. The company will also work with AFTC to develop frameworks measuring accuracy, speed and reliability of AI tools, accelerating test cycles and reducing redundant analysis.

The AI Journal Ltd
Dec 12th, 2025
Unstructured Secures FedRAMP High Authorization to Deliver AI-Ready Data to Federal Agencies and Partners

SACRAMENTO, Calif. - (BUSINESS WIRE) - Unstructured, the leader in AI-ready data orchestration, today announced it has achieved FedRAMP High authorization. This milestone affirms Unstructured's commitment to delivering secure, scalable, and mission-ready solutions to US government agencies and industry partners, including those with the most stringent data security and compliance requirements. With this authorization, Unstructured becomes one of the few AI infrastructure companies authorized to operate at the FedRAMP High baseline.

"FedRAMP High is more than a compliance milestone - it's our gateway to accelerating outcomes and unlocking data preparation cost savings for our public sector customers and partners," said Brian Raymond, Founder and CEO of Unstructured. "With this authorization, government users and industry partners can deploy Unstructured's enterprise-grade solution to get their data AI-ready and focus on delivering production-ready AI applications at scale."

Government and industry partners are no longer just experimenting with GenAI - they're building real systems. But when it is time to move from pilot to production, most efforts hit a wall: brittle GenAI data pipelines, modality-specific workarounds, and fragmented architectures that can't adapt as models, file types, modalities or downstream systems evolve. Rather than rebuilding custom data pipelines for every GenAI use case, agencies and integrators can rely on Unstructured's Platform: a modular, enterprise-grade solution purpose-built to extract, transform, enrich, chunk, embed, and deliver AI-ready data - no matter the source or destination. It supports diverse modalities out of the box, works with any model or data store (vector, relational, etc.), and is now accessible in highly secure environments. Unstructured also helps reduce infrastructure and processing costs by intelligently adapting its transformation pipeline to the characteristics of each file - maximizing performance while minimizing costs where possible. Unstructured delivers the production-ready data layer that every GenAI application needs - so teams can focus on building outcomes, not maintaining open-source data pipelines.

Unstructured's open source is already widely adopted across the federal government, powering tools like NIPRGPT, CamoGPT, and other systems within the military, national security, federal civilian, and even state and local governments. With the FedRAMP High authorized Platform, government users and industry partners can now operationalize these capabilities at enterprise scale - supported by full end-to-end orchestration across ingestion, transformation, enrichment, and delivery.

"Our open-source tools have helped federal teams experiment with LLMs using unstructured data," said Raymond. "Now, with FedRAMP High authorization of our GenAI data orchestration platform, agencies can move beyond experimentation - deploying a secure, production-ready data platform to scale GenAI applications with confidence."

About Unstructured: Unstructured delivers mission-ready data transformation and orchestration solutions that turn unstructured, multimodal content into AI-ready data at scale. Its modular open platform eliminates the brittleness and high costs of traditional data engineering pipelines, enabling government and commercial organizations to rapidly build and deploy GenAI applications. To learn more or deploy Unstructured, contact [email protected].

readmagazine.com
Aug 11th, 2025
Unstructured.io Joins Palantir FedStart to Advance Federal AI Data Solutions

Unstructured, a leading provider of scalable, mission-ready Generative AI (GenAI) solutions powered by advanced data transformation and orchestration, announced it has joined Palantir Technologies' FedStart program.

INACTIVE