Sr Incident Commander
Posted on 1/4/2024
INACTIVE
Aya Healthcare

10,001+ employees

Comprehensive healthcare staffing and management software provider
Company Overview
Aya Healthcare, the largest healthcare talent software and staffing company in the U.S., offers a comprehensive suite of labor services and software solutions, providing hospital systems with increased efficiency and superior operating results. The company's unique corporate culture and dedicated employees have earned it recognition as a top workplace by several notable publications. With a focus on simplifying processes for healthcare professionals, Aya Healthcare provides exclusive job opportunities, competitive pay rates, and comprehensive support, making it a preferred choice for clinicians nationwide.
Consulting
Data & Analytics

Company Stage

N/A

Total Funding

N/A

Founded

2001

Headquarters

San Diego, California

Growth & Insights
Headcount

6-month growth

8%

1-year growth

26%

2-year growth

125%
Locations
Remote in USA
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Datadog
Microsoft Azure
Management
Computer Networking
AWS
Linux/Unix
Google Cloud Platform
Categories
Software Engineering
Requirements
  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or an equivalent combination of education, training, and experience
  • 8+ years of experience in high-scale SaaS/PaaS/IaaS environments
  • 5+ years of experience in a team leadership role, acting as a liaison with internal and external customers
  • 8+ years of experience providing IT Operations incident identification, triage, handling, management, and resolution support in a hybrid Windows/Linux environment on either the AWS or Azure cloud
  • Experience supporting engineering applications in a cloud environment at scale
  • Foundational knowledge of Azure, AWS, or Google Cloud infrastructure, services, and support/escalation engagement models
  • Practical experience with IT systems and monitoring tools such as Datadog, New Relic, and AppDynamics
  • Good understanding of Incident and Change Management processes
Responsibilities
  • Provide management coverage and guidance on all P1, P2, and other high-visibility incidents
  • Work closely with development and operations teams, bringing them together to communicate about and resolve technical issues
  • Provide internal and external executive-level updates to all stakeholders
  • Ensure incident response team has an active voice and is driving the troubleshooting
  • Dynamically engage additional resources as needed
  • Create and maintain documentation of incidents, their solutions, and any relevant procedures, making it easier to troubleshoot similar issues in the future
  • Assist with developing and delivering root cause analyses (RCAs) in collaboration with cross-functional teams, ensuring thorough analysis to prevent recurrence and improve overall system performance
  • Schedule and prioritize tasks for the response team, ensuring that incidents are resolved quickly and efficiently with minimal business impact
  • Continuously improve incident response processes, procedures, and tools to speed up response times and maintain high levels of service quality
  • Participate in an on-call rotation with other Incident Commanders
  • Assist Operations Managers with daily management tasks
  • Refine and improve incident processes to drive the team toward its SLO goals and ensure a consistent, repeatable customer experience
  • Drive incident postmortem reviews to provide ongoing feedback to the Incident Management and Engineering peer teams
  • Mentor and coach the Incident Response Team
  • Support and back up other Incident Commanders
  • Develop, monitor, and report on key performance indicators related to incident management, using this data to identify areas for improvement and track progress over time
Desired Qualifications
  • Past Systems, Data, or Network Engineering experience is a plus