Senior Site Reliability Engineer
Posted on 2/22/2023
INACTIVE
Locations
Remote
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Node.js
Java
Linux/Unix
PHP
Scala
Python
Datadog
Requirements
  • 5+ years' experience in a Site Reliability Engineering (SRE) or Software Engineering role
  • Experience mentoring and leading a team in a high-growth organization
  • Experience with SRE topics like SLOs, Error Budgets, resiliency, auto-scaling, self-healing, performance, and more
  • Experience in one or more of the following: Node.js, Java, Linux, Python, Go, PHP, or Scala
  • Understanding of monitoring and alerting tools (Datadog, PagerDuty, Alertmanager, etc.)
  • To find out more about our people and Life At AIQ, be sure to visit our Medium Tech and Life blogs
Responsibilities
  • Modify existing systems to detect and report symptoms in addition to disruptions, so that we are aware of potential problems
  • Design and maintain monitoring, log centralization, and alerting for all services to facilitate observability and incident management
  • Use log analysis to troubleshoot performance problems and system outages
  • Partner with the product development teams to design and enhance software architecture to improve scalability, service reliability, cost, and performance
  • Work with engineers to re-architect and rebuild the core services their teams rely on so that they are more efficient and cost-effective
  • Standardize our approach to observability so it is easy for a developer to do the right thing
  • Proactively identify high-value initiatives that improve system reliability
  • Define and measure SLOs, SLIs, and error budgets, along with actionable alerts and automated responses such as auto-scaling and self-healing
  • Identify and re-engineer or fix faulty code and configurations (technical debt) within applications and services
  • Monitor system health, latency, and availability to maintain services after they are in production
  • Provide assistance with blameless post-mortems and troubleshoot priority incidents
  • Work with your team to monitor the health of the platform, including taking part in a 24/7 on-call rotation, to ensure a great customer experience
ActionIQ

201-500 employees

Data-driven customer experience software
Company Overview
ActionIQ's mission is to help brands make every team member a CX champion. The company's CX Hub empowers everyone to be a CX champion by giving business teams the freedom to explore and take action on customer data, while helping technical teams regain control of where data lives and how it is used.
Benefits
  • Medical, dental & vision
  • 401k
  • FSA
  • Commuter benefits
  • Gym reimbursement
  • Flexible PTO
  • Paid parental leave
  • Office perks
Company Core Values
  • Human
  • Helpful
  • Innovative