Full-Time

Lead Site Reliability Engineer

Site Reliability Engineer

Posted on 9/4/2025

Mattermost

51-200 employees

Secure collaboration platform with customizable workflows

Compensation Overview

$170k - $200k/yr

Remote in USA

Remote-first; applicants must be based in the United States.

Category
DevOps & Infrastructure
Required Skills
Kubernetes
Infrastructure as Code (IaC)
AWS
Terraform
Observability
DevOps
Requirements
  • BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience, with 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles.
  • Proven expertise in container orchestration platforms, ideally Kubernetes.
  • Extensive experience with infrastructure-as-code, ideally Terraform.
  • Strong background in cloud platforms, ideally AWS.
  • Demonstrated experience designing and implementing monitoring, alerting, and performance optimization strategies.
  • Exceptional troubleshooting and incident management skills for distributed systems.
  • Proficiency in at least one scripting or programming language for automation.
  • Excellent communication skills with a track record of influencing cross-functional teams.
  • Experience leading globally distributed teams in a remote-first environment.
  • For candidates residing in the U.S.: This role may require the ability to obtain and maintain a U.S. government security clearance in the future. As such, U.S. applicants must be U.S. citizens and eligible under applicable clearance requirements.
  • Applicants must meet eligibility requirements for access to export-controlled information as defined by U.S. export control laws, including EAR and ITAR.
Responsibilities
  • Define the strategy, architecture, and roadmap for Mattermost’s site reliability engineering function, aligning infrastructure initiatives with product and business goals.
  • Lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD).
  • Establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale.
  • Drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements.
  • Partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements.
  • Champion automation and operational excellence to improve efficiency, reduce risk, and scale operations.
  • Oversee cloud cost management and capacity planning to optimize infrastructure spending while meeting performance targets.
  • Build and maintain a developer platform that enables fast, secure software delivery and improves application stability in production.
  • Mentor and coach SRE team members, fostering a culture of learning, collaboration, and technical excellence.
Desired Qualifications
  • Familiarity with observability stacks such as Grafana and Prometheus.
  • Experience designing high-availability, disaster recovery, and scaling architectures.
  • Exposure to GCP and Azure cloud environments.
  • Leadership experience in highly regulated industries such as defense, finance, or critical infrastructure.
  • Experience with U.S. federal compliance frameworks and authorization processes, including FedRAMP, DoD ATO, NIST 800-53, and related government standards.
  • Experience preparing, delivering, and maintaining software offerings through AWS Marketplace and other cloud provider marketplaces (e.g., Azure Marketplace, Google Cloud Marketplace), including packaging, compliance validation, and ongoing operational support.
  • Open-source contributions in reliability, DevOps, or infrastructure tooling.
  • Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect).

Mattermost provides a secure, customizable collaboration platform designed for technical teams. It offers real-time messaging, file and code snippet sharing with inline syntax highlighting, and workflow automation, all within a platform that can be fully customized and deployed anywhere to meet strict security and data-control needs. The product integrates with essential developer and IT tools like GitHub, GitLab, and ServiceNow, enabling users to run and automate workflows from a single interface. In addition to an open-source version, Mattermost offers premium features such as built-in identity and access controls, granular admin controls, advanced compliance auditing and reporting, and flexible deployment options. Its goal is to help technical teams collaborate more effectively while maintaining strong security and data governance.

Company Size

51-200

Company Stage

Series B

Total Funding

$70.1M

Headquarters

Palo Alto, California

Founded

2016

Simplify's Take

What believers are saying

  • Expands into cyber defense and DevSecOps with secure out-of-band SOC/CERT workflows.
  • 800,000 workspaces and 800+ customers including NASA, Nasdaq, Samsung, SAP, USAF.
  • Monthly MIT-licensed releases, shipped as a single Linux binary built with Go and React, with voice and screen sharing.

What critics are saying

  • Slack's enterprise dominance and AI features lock in customers, eroding market share.
  • Microsoft Teams free tier with government compliance undercuts premium public sector pricing.
  • DoD CMMC 3.0 certification delays exclude Mattermost from $10B+ defense contracts.

What makes Mattermost unique

  • Open-source platform with sovereign cloud deployments across Azure, Oracle, Google, AWS.
  • AI-powered Intelligent Mission Environment for classified networks and tactical edge operations.
  • In-region presence in Australia, Canada, Japan, and US Federal with cleared personnel.

Benefits

Fully remote work

Office setup fund

Coworking space stipend

Internet and mobile phone reimbursement

401k

Unlimited vacation

Family & friends days

Async weeks

Health benefits

Global and regional team meetups

Open source Fridays

Community hackathons and events

Growth & Insights

Headcount

6 month growth

0%

1 year growth

0%

2 year growth

1%
INACTIVE