Continual Service Improvement Program Manager - Remote
Posted on 3/29/2023
INACTIVE
Splunk

5,001-10,000 employees

Data management & visualization platform
Company Overview
Splunk's mission is to address the challenges and opportunities of managing massive streams of machine-generated big data. Splunk is the leading software platform for machine data that enables customers to gain real-time Operational Intelligence.
AI & Machine Learning
Data & Analytics
Cybersecurity

Company Stage: N/A
Total Funding: $1.4B
Founded: 2003
Headquarters: San Francisco, California

Growth & Insights
Headcount
6 month growth: -1%
1 year growth: 1%
2 year growth: 6%
Locations
Plano, TX, USA • Chicago, IL, USA • St. Louis, MO, USA • Minneapolis, MN...
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Communications
Management
Confluence
Data Analysis
Requirements
  • You have a clear understanding of the ITIL framework
  • You can think outside the box and work on multiple tasks simultaneously while dynamically prioritizing based on changing conditions
  • Ability to work multi-functionally and to influence and execute across geographically dispersed groups
  • You enjoy problem solving and analyzing global-scale distributed systems
  • You have outstanding interpersonal and communication skills
  • You have program/project management experience
  • Experience working in a highly technical environment, with the ability to drive investigations across technical and non-technical teams
  • Strong ability to communicate efficiently and effectively with different teams, from Engineers to Support and Management, including explaining technical issues to non-technical teams
  • Solid understanding of cloud platforms, software deployments, and monitoring tools. Prior experience with Cloud or SaaS companies puts you at the front of the line
  • Exudes Customer Success. Passionate about doing what's right for the customer. Willing to take on the tough projects and challenges to support the growth of the business
  • Ability to work independently with a “make it happen” attitude; can operate and execute in areas of uncertainty and ambiguity; problem solver and quick learner
  • Strives to be seen as a trusted advisor and technical leader who is highly requested by management and peers
Responsibilities
  • Incident / Escalation Avoidance and Lessons-Learned
      • Develop procedures and facilitate lessons-learned actions from Major Incidents and Red / Orange critical accounts. Expand scope over time
      • Build and track an issues register, identify owners, and drive resolution (bring large systemic issues to weekly Production and ELT Reviews, create reports showing top contributing causes, and build scorecards for product areas)
      • Partner with the Cloud Problem Management (CPM) team to drive non-cloud product-related improvements. Participate in CPM-run PIRs (Post-Incident Reviews) for major incidents and Red / Orange critical accounts
      • Drive end-to-end / comprehensive continuous improvement across the customer journey
  • Customer Quality Risk Management
      • Devise a methodology to proactively identify high-impact bugs (high customer pain score) and trends from Incidents and Escalations
      • Insights to Actions - identify, monitor, assess impact, and drive expedient solutions / fixes / patches / releases for the fleet to minimize customer risk and increase product availability
  • Systemic Incident Remediation
      • Program-manage the remediation plan and cross-functional efforts for widespread and long-running systemic / P0 incidents. Collaborate with Engineering, Release Management, Tech Ops, Support, Professional Services, etc., as needed
      • Own internal and external comms and work with the core team (including Legal as needed for ACP incidents)
      • Track remediation progress - how many customers are impacted / potentially impacted, whether they have been communicated with, whether and when they have been remediated, status, etc.
  • Reporting, Analytics, and Process Governance
      • Track and own measures of success and provide relevant reports - weekly, monthly, quarterly as appropriate
      • Assist with weekly report generation such as Production Review, Global Account Technical Health Review (GATHR), weekly ELT escalation / incident review, etc., as guided by leadership
      • Ensure processes are designed using a closed-loop feedback mechanism
      • Build a unified team newsletter, manage content, and send periodically to relevant audiences
  • Unified Response Tooling, Automation, and Process Management
      • Be the liaison for the Incident and Escalation Management teams with the SPURvot development team
      • Assist with the tooling roadmap and prioritization to simplify and create an effective and efficient toolset for the organization
      • Ensure consistent, consolidated, and updated documentation (Confluence pages, process docs, training material, onboarding info, etc.)
Desired Qualifications
  • 5+ years of proven experience in a related or similar position - prior experience in incident and escalation management is a plus