Continual Service Improvement Program Manager - Remote
Posted on 3/29/2023
INACTIVE
Data management & visualization platform
Company Overview
Splunk's mission is to address the challenges and opportunities of managing massive streams of machine-generated big data. Splunk is the leading software platform for machine data that enables customers to gain real-time Operational Intelligence.
AI & Machine Learning
Data & Analytics
Cybersecurity
Company Stage
N/A
Total Funding
$1.4B
Founded
2003
Headquarters
San Francisco, California
Growth & Insights
Headcount
6 month growth
↓ -1%
1 year growth
↑ 1%
2 year growth
↑ 6%
Locations
Plano, TX, USA • Chicago, IL, USA • St. Louis, MO, USA • Minneapolis, MN...
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Communications
Management
Confluence
Data Analysis
Requirements
- Have a clear understanding of the ITIL framework
- You can think outside the box and work on multiple tasks simultaneously while dynamically prioritizing based on changing conditions
- Ability to work cross-functionally and to influence and execute across geographically dispersed groups
- You enjoy problem solving and analyzing global-scale distributed systems
- You have outstanding interpersonal and communication skills
- You have program/project management experience
- Experience working in a highly technical environment, with the ability to drive investigations with technical and non-technical teams
- Strong ability to communicate efficiently and effectively with teams ranging from Engineering to Support and Management, including explaining technical issues to non-technical teams
- Solid understanding of cloud platforms, software deployments, and monitoring tools. Prior experience with Cloud or SaaS companies puts you at the front of the line
- Exudes Customer Success. Passionate about doing what's right for the customer. Willing to take on the tough projects and challenges to support the growth of the business
- Ability to work independently with a “make it happen” attitude; can operate and execute in areas of uncertainty and ambiguity; problem solver and quick learner
- Strives to be seen as a trusted advisor and technical leader who is sought after by management and peers
Responsibilities
- Incident / Escalation Avoidance and Lessons Learned
  - Develop procedures and facilitate lessons-learned actions from Major Incidents and Red / Orange critical accounts, expanding scope over time
  - Build and track an issues register, identify owners, and drive resolution (bring large systemic issues to weekly Production and ELT Reviews, create reports showing top contributing causes, and build scorecards for product areas)
  - Partner with the Cloud Problem Management (CPM) team to drive non-cloud product-related improvements. Participate in CPM-run PIRs (Post-Incident Reviews) for major incidents and Red / Orange critical accounts
  - Drive comprehensive, end-to-end continuous improvement across the customer journey
- Customer Quality Risk Management
  - Devise a methodology to proactively identify high-impact bugs (high customer pain score) and trends from Incidents and Escalations
  - Insights to Actions: identify, monitor, assess impact, and drive expedient solutions / fixes / patches / releases for the fleet to minimize customer risk and increase product availability
- Systemic Incident Remediation
  - Program-manage the remediation plan and cross-functional efforts for widespread and long-running systemic / P0 incidents. Collaborate with Engineering, Release Management, Tech Ops, Support, Professional Services, etc., as needed
  - Own internal and external communications and work with the core team (including Legal as needed for ACP incidents)
  - Track remediation progress: how many customers are impacted or potentially impacted, whether they have been contacted, whether and when they have been remediated, status, etc.
- Reporting, Analytics, and Process Governance
  - Track and own measures of success and provide relevant reports (weekly, monthly, or quarterly as appropriate)
  - Assist with generating weekly reports such as the Production Review, Global Account Technical Health Review (GATHR), and weekly ELT escalation / incident review, etc., as guided by leadership
  - Ensure processes are designed with a closed-loop feedback mechanism
  - Build a unified team newsletter, manage its content, and send it periodically to relevant audiences
- Unified Response Tooling, Automation, and Process Management
  - Be the liaison between the Incident and Escalation Management teams and the SPURvot development team
  - Assist with the tooling roadmap and prioritization to simplify the toolset and make it effective and efficient for the organization
  - Ensure documentation (Confluence pages, process docs, training material, onboarding info, etc.) is consistent, consolidated, and up to date
Desired Qualifications
- 5+ years of proven experience in a related or similar position; prior experience in incident and escalation management is a plus