Staff Site Reliability Engineer
Traffic, DevOps
Posted on 2/11/2022
INACTIVE
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Bash
Data Structures & Algorithms
Development Operations (DevOps)
C/C++/C#
Linux/Unix
Management
nginx
Puppet
Ruby
Rust
Kubernetes
Python
Go
Ansible
Requirements
  • 8+ years of experience in an SRE/Operations/DevOps role as part of a team
  • Experience with shell and other scripting languages used in an SRE context (Python, Go, Bash, Ruby, etc.; we primarily use Python), and with configuration management tools (Puppet, Ansible, etc.)
  • Experience with C, C++, Golang or Rust
  • Experience with distributed caching systems, including their underlying algorithms and how to optimize their performance
  • A thorough, protocol-level understanding of TCP/IP, HTTP, and TLS
  • Experience with package management on Linux systems (we use Debian)
  • Comfortable with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack)
  • Good Linux system-level skills
  • History of automating tasks and processes, identifying process gaps, and finding automation opportunities
  • Strong English language skills and the ability to work independently as an effective part of a globally distributed team
  • Experience with the use, maintenance and configuration of monitoring, metrics and logging infrastructure (Prometheus, Grafana, ELK, Icinga/Nagios, etc.)
  • Experience with high-performance HTTP(S) caching proxy software, such as Varnish, Envoy Proxy, Apache Traffic Server, Nginx or HAProxy
  • Experience with Linux kernel tuning for high traffic loads
  • Experience developing or contributing to Free and Open Source software, or being part of an open-source community
  • Experience defining and implementing cross-team SLOs
Responsibilities
  • Performing day-to-day operational/DevOps tasks on Wikimedia's public facing infrastructure (deployment, maintenance, configuration, troubleshooting)
  • Implementing and utilizing configuration management and deployment tools (Puppet, Kubernetes)
  • Leading continuous improvement by automating the installation, configuration and maintenance of services on our platform
  • Assisting in the architectural design of new services and making them operate at scale
  • Assisting in or leading incident response, diagnosis, and follow-up on system outages and alerts across Wikimedia's production infrastructure
  • Sharing our values and working in accordance with them
Wikimedia Foundation

501-1,000 employees

Nonprofit charitable organization
Company Overview
The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally.