Director of Site Reliability Engineering
Wikimedia Foundation

501-1,000 employees

Nonprofit charitable organization
Company Overview
The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free license or in the public domain, and to disseminate it effectively and globally.
Social Impact

Company Stage: Grant
Total Funding: $144.9M
Founded: 2003
Headquarters: San Francisco, California

Growth & Insights
Headcount

6 month growth: -1%
1 year growth: 3%
2 year growth: 19%
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Operating Systems
Categories
DevOps & Infrastructure
Requirements
  • 8+ years experience in site reliability engineering, technical operations, or infrastructure engineering roles
  • 4+ years experience managing infrastructure teams at high traffic websites or online services at scale
  • Track record of managing, inspiring and mentoring multiple managers and engineers, and aligning them across the organization and in the community
  • Experience in managing large-scale projects with technical deep-dives into code, networking and operating systems
  • Experience developing and tracking department and project budgets
  • Experience in globally distributed, multi-site, high-traffic environments, preferably with both on-premises bare-metal and cloud-based infrastructure
  • Familiarity with open source development and community practices. Experience adopting/integrating open source solutions. Track record of upstream contributions (whether personal or through a team) is a huge plus
  • Familiarity with engineering team practices and experience interfacing SRE with other design, product and engineering teams tasked with continuous delivery of functionality
  • Familiarity with large website application architectures, including caching layers, storage scaling concepts, network infrastructure, monitoring systems, etc.
  • Experience with highly geographically distributed teams and follow-the-sun operations is a major plus. Personal cross-cultural experience (having lived or worked internationally) helps as well
  • A track record of modeling and shaping best practices in community, open source and development work
  • Experience with negotiations and RFPs for data center service contracts, equipment purchases, peering agreements, etc.
Responsibilities
  • Your first priority: Lead multiple SRE teams in keeping Wikimedia's sites and services (including Wikipedia) running responsively, reliably and securely, including protection against outages, data loss or breaches, and accommodation and implementation of Wikimedia's Movement Strategy (including “Infrastructure for Open”)
  • Your second priority: Partner with engineering teams at Wikimedia to set direction and build platforms enabling transformative changes to Wikimedia's user experience while ensuring appropriate operational review and support along the way
  • Your foundation: An amazing Site Reliability Engineering team that's taken us to more than half a billion users a month with passion, ingenuity, solid engineering practices and duct tape. Nurturing, growing, trusting and developing this team and its leaders is your path to success in this role
  • Your values: You care about free and open information, and are committed to finding solutions to engineering problems in line with our guiding principles. You share our values and work in accordance with them