Full-Time

Senior MLOps Engineer

Posted on 10/7/2025

Trase Systems

11-50 employees

Platform for deploying autonomous enterprise AI

Compensation Overview

$200k - $250k/yr

Seattle, WA, USA

More locations: McLean, VA, USA

Remote

Category
AI & Machine Learning
Required Skills
LLM
Kubernetes
Microsoft Azure
Python
Docker
AWS
Jenkins
Terraform
Google Cloud Platform
Requirements
  • 10+ years in software/infrastructure engineering, with 5+ years in a senior/lead MLOps, ML Infrastructure, or Platform role.
  • Expertise in designing and operating scalable, production-grade ML systems on AWS, GCP, or Azure.
  • Mastery of Docker and Kubernetes for managing production ML workloads.
  • Proven experience managing complex infrastructure as code (IaC) with tools like Terraform.
  • Deep experience architecting CI/CD/CT pipelines for complex ML workflows (e.g., GitHub Actions, Jenkins).
  • Strong Python programming skills for infrastructure automation, tooling, and services.
  • Experience architecting solutions across the full ML lifecycle, from experiment tracking to advanced deployment patterns and monitoring.
  • Exceptional communication skills to articulate complex architectural strategy to stakeholders at all levels.
  • Familiarity with modern MLOps tools like MLflow, Kubeflow, SageMaker, or Vertex AI.
  • Experience with the operational challenges of LLMs, including fine-tuning pipelines, RAG systems, and vector databases.
Responsibilities
  • Own the technical vision, strategy, and end-to-end architecture for Trase’s MLOps platform, ensuring scalability, reliability, security, and cost-efficiency.
  • Architect and build a sophisticated CI/CD/CT ecosystem to automate the entire ML lifecycle, from data validation to production monitoring.
  • Lead the design of scalable and resilient ML infrastructure using IaC (Terraform) and container orchestration (Kubernetes) on a major cloud platform.
  • Establish MLOps best practices, including frameworks for version control, experiment tracking, model governance, and responsible AI.
  • Implement a robust monitoring and alerting framework to track model performance, detect drift, and ensure the reliability of production ML services.
  • Serve as the organization's thought leader on MLOps, mentoring engineers and driving cross-functional alignment on platform strategy and best practices.
  • Define the multi-year roadmap for Trase’s MLOps ecosystem in alignment with business and product strategy.
  • Anticipate emerging trends (LLMOps, AutoML, multi-cloud, federated learning) and guide the organization to adopt them proactively.
  • Define patterns for operating large-scale LLMs and multi-modal AI in production with efficiency and compliance.
  • Solve highly ambiguous, large-scale ML deployment challenges where no precedent exists, defining best practices for the org.

Trase Systems provides an enterprise AI platform that enables large organizations to deploy, manage, and optimize autonomous AI agents. It offers an end-to-end, model-agnostic solution with an agent builder, multi-agent orchestration, and full observability, designed to automate complex administrative workflows in regulated sectors such as healthcare, national security, and energy. The platform can run across cloud, on-premises, or air-gapped environments to support sensitive workloads, with SOC 2 and HIPAA compliance. Trase differentiates itself through a shared-savings business model (no upfront costs) and a focus on measurable efficiency gains from AI agents, lowering barriers to ROI while ensuring enterprise-grade governance and security. Its goal is to simplify AI adoption for large organizations by addressing the “last mile” of AI implementation and delivering practical, auditable automation at scale.

Company Size

11-50

Company Stage

N/A

Total Funding

N/A

Headquarters

N/A

Founded

2023

Simplify Jobs

Simplify's Take

What believers are saying

  • US Navy contract validates national security AI applications.
  • Duke Health partnership proves healthcare workflow automation value.
  • Red Cell Partners incubation leverages AI agent frontier advancements.

What critics are saying

  • DoD CDAO bans non-FedRAMP AI, disqualifying air-gapped deployments now.
  • OpenAI Swarm commoditizes orchestration, enabling internal builds immediately.
  • UiPath Pathmind acquisition undercuts healthcare savings with RPA scale.

What makes Trase Systems unique

  • Model-agnostic platform deploys AI agents in air-gapped systems.
  • Shared savings model charges only on efficiency gains achieved.
  • Automates healthcare workflows like prescription refills for Duke Health.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Paid Sick Leave

Parental Leave

Unlimited Paid Time Off

Professional Development Budget

401(k) Retirement Plan

401(k) Company Match

Mental Health Support

Performance Bonus

Flexible Work Hours

Growth & Insights

Headcount

6 month growth

6%

1 year growth

4%

2 year growth

-4%
INACTIVE