Full-Time

ML/AI Research Engineer

Agentic AI Lab, Founding Team

Fabrion

AI-native platform for industrial manufacturing

No salary listed

California, USA

In Person

Category
AI & Machine Learning
Required Skills
LLM
React.js
Machine Learning
LangGraph
Observability
LangChain
Responsibilities
  • Fine-tune and evaluate open-source LLMs (e.g. LLaMA 3, Mistral, Falcon, Mixtral) for enterprise use cases with both structured and unstructured data
  • Build and optimize RAG pipelines using LangChain, LangGraph, LlamaIndex, or Dust — integrated with our vector DBs and internal knowledge graph
  • Train agent architectures (ReAct, AutoGPT, BabyAGI, OpenAgents) using enterprise task data
  • Develop embedding-based memory and retrieval chains with token-efficient chunking strategies
  • Create reinforcement learning pipelines to optimize agent behaviors (e.g. RLHF, DPO, PPO)
  • Establish scalable evaluation harnesses for LLM and agent performance, including synthetic evals, trace capture, and explainability tools
  • Contribute to model observability, drift detection, error classification, and alignment
  • Optimize inference latency and GPU resource utilization across cloud and on-prem environments
Desired Qualifications
  • Deep experience fine-tuning open-source LLMs using HuggingFace Transformers, DeepSpeed, vLLM, FSDP, LoRA/QLoRA
  • Worked with both base and instruction-tuned models; familiar with SFT, RLHF, DPO pipelines
  • Comfortable building and maintaining custom training datasets, filters, and eval splits
  • Understand tradeoffs in batch size, token window, optimizer, precision (FP16, bfloat16), and quantization
  • Experience building enterprise-grade RAG pipelines integrated with real-time or contextual data
  • Familiar with LangChain, LangGraph, LlamaIndex, and open-source vector DBs (Weaviate, Qdrant, FAISS)
  • Experience grounding models with structured data (SQL, graph, metadata) and unstructured sources
  • Bonus: Worked with Neo4j, PuppyGraph, RDF, OWL, or other semantic modeling systems
  • Experience training or customizing agent frameworks with multi-step reasoning and memory
  • Understand common agent loop patterns (e.g. Plan→Act→Reflect), memory recall, and tools
  • Familiar with self-correction, multi-agent communication, and agent ops logging
  • Strong background in token cost optimization, chunking strategies, reranking (e.g. Cohere, Jina), compression, and retrieval latency tuning
  • Experience running models under quantized (int4/int8) or multi-GPU settings with inference tuning (vLLM, TGI)
  • LLM Training & Inference: HuggingFace Transformers, DeepSpeed, vLLM, FlashAttention, FSDP, LoRA
  • Agent Orchestration: LangChain, LangGraph, ReAct, OpenAgents, LlamaIndex
  • Vector DBs: Weaviate, Qdrant, FAISS, Pinecone, Chroma
  • Graph Knowledge Systems: Neo4j, PuppyGraph, RDF, Gremlin, JSON-LD
  • Storage & Access: Iceberg, DuckDB, Postgres, Parquet, Delta Lake
  • Evaluation: OpenLLM Evals, TruLens, Ragas, LangSmith, Weights & Biases
  • Compute: Ray, Kubernetes, TGI, SageMaker, Lambda Labs, Modal
  • Languages: Python (core), optionally Rust (for inference layers) or JS (for UX experimentation)
  • Startup DNA: resourceful, fast-moving, and capable of working in ambiguity
  • Deep curiosity about agent-based architectures and real-world enterprise complexity
  • Comfortable owning model performance end-to-end: from dataset to deployment
  • Strong instincts around explainability, safety, and continuous improvement
  • Enjoy pair-designing with product and UX to shape capabilities, not just APIs

Fabrion builds an AI-native platform for industrial manufacturing to accelerate AI adoption across complex, multi-tier value chains. The product applies artificial intelligence and machine learning to optimize manufacturing operations and supply chains, with the aim of improving speed, resilience, and overall productivity. Unlike generic AI tools, Fabrion targets real-world industrial workflows, so that manufacturers can make AI-driven decisions across their value chains. The company partners with capital backers like 8VC to fund its early-stage development.

Simplify's Take

What believers are saying

  • Telit Cinterion's deviceWISE Suite validates market demand for OT-AI integration at scale.
  • Foundational Industries' autonomous factory buildout creates partnership opportunities for Fabrion's AI stack.
  • Multi-tier value chain optimization addresses $2T+ manufacturing inefficiency across supply chains globally.

What critics are saying

  • Telit Cinterion competes directly, offering a unified data backbone for industrial AI integration.
  • Foundational Industries' hardware-software integration outpaces Fabrion's software-only platform capabilities.
  • Reliance on 8VC as the exclusive funder creates existential risk if that capital is reallocated to competing autonomous operations startups.

What makes Fabrion unique

  • Custom fine-tuned SLMs with RLHF governance differentiate from foundation model wrappers.
  • Bare-metal acceleration for ETL/training enables federated deployments across hyperscalers and edge.
  • Industry-specific knowledge graph connects fragmented data, teams, and suppliers across the production lifecycle.

Benefits

Health Insurance

401(k) Retirement Plan

Remote Work Options