Senior Site Reliability Engineer
Posted on 11/7/2023
INACTIVE
Wikimedia Foundation

51-200 employees

Nonprofit charitable organization
Company Overview
The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally.
Social Impact

Company Stage: Grant
Total Funding: $144.9M
Founded: 2003
Headquarters: San Francisco, California

Growth & Insights
Headcount
6 month growth: -89%
1 year growth: -88%
2 year growth: -87%
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Development Operations (DevOps)
Docker
Management
Operating Systems
Puppet
Ruby
Terraform
Kubernetes
Python
Ansible
Categories
DevOps & Infrastructure
Software Engineering
Requirements
  • 5+ years experience in an SRE/Operations/DevOps role
  • Experience with operating highly available infrastructure
  • Experience with running applications and services at scale
  • Proficiency with shell scripting and a programming language used in an SRE/Operations engineering context (Python, Go, Ruby, etc.)
  • Comfortable with open source configuration management and orchestration tools (Puppet, Ansible, Terraform, etc.)
  • Ability to communicate clearly in technical English
  • Experience implementing containerization solutions (Docker, Kubernetes)
  • Experience with package management for operating systems (Debian, etc.)
  • We are avid supporters (and users) of open source software; a history of contributing to open source projects is valued
  • Familiarity with RFC 2549
  • Prior participation in the Wikimedia movement
Responsibilities
  • Design, implementation and maintenance of public facing infrastructure and services
  • Use of configuration management and deployment tools
  • Architectural design and operation at scale
  • Monitoring of systems and services, optimization of performance and resource utilization
  • Common operating system level tasks such as logging and backup / restore
  • Cookbook / runbook implementation for common maintenance actions
  • Incident response, diagnosis and follow-up on system outages or alerts
  • Automation and streamlining of tasks as well as identifying process gaps
  • Collaborating with a global and asynchronously communicating team (don't worry if you have never worked remotely, we'll help you get used to it)
  • Mentoring peers in your areas of technical and operational strength