Full-Time

Senior Site Reliability Engineer

Posted on 1/29/2025

Zeta Global

1,001-5,000 employees

AI-driven marketing technology for customer engagement

Enterprise Software
AI & Machine Learning

Compensation Overview

$140k - $190k Annually

Senior

Remote in USA

Category
DevOps & Infrastructure
Site Reliability Engineering
Required Skills
Kubernetes
Python
Grafana
Docker
Microservices
AWS
Go
Prometheus
Jenkins
Terraform

Requirements
  • Can code confidently in Python or Golang and solve real-world problems through automation (not only scripting).
  • Have hands-on experience implementing SLIs, SLOs, and distributed tracing in production.
  • Understand Kubernetes and Infrastructure as Code tools such as Terraform.
  • 3+ years of experience as an SRE or in a similar role with hands-on coding.
  • 2+ years of software development experience in Python or Golang, with a focus on building maintainable, production-quality code.
  • Deep understanding of SRE principles, particularly SLIs, SLOs, error budgets, and their real-world application.
  • Hands-on experience conducting postmortems and implementing observability at scale.
  • Expertise in designing and implementing end-to-end observability solutions using tools like OpenTelemetry, Prometheus, Grafana, or Honeycomb.
  • Experience with distributed tracing and handling high-cardinality metrics in production environments.
  • 3+ years of experience with AWS and proficiency in Kubernetes and Infrastructure as Code (IaC) tools such as Terraform.
  • Strong understanding of distributed systems, microservices architectures, and containerization (Docker, Kubernetes).
  • Hands-on experience with CI/CD tooling and practices (GitOps, Jenkins, Argo CD) and building automated pipelines.
  • Familiarity with tools and frameworks for incident management and operational automation.
Responsibilities
  • Design, implement, and manage SLOs, SLIs, and error budgets, ensuring reliability aligns with user expectations and business objectives.
  • Develop production-grade software to enhance system reliability and reduce manual toil through automation.
  • Implement and optimize observability solutions using tools like OpenTelemetry, with a focus on high-cardinality metrics, distributed tracing, and actionable insights.
  • Drive postmortem processes and lead in-depth root cause analyses for incidents, ensuring lessons learned are effectively applied to prevent recurrence.
  • Define and monitor MTTx metrics (MTTA, MTTR, MTTF), using them to guide system improvements and measure reliability progress.
  • Collaborate with engineering teams to design systems with reliability and scalability in mind, incorporating capacity planning, resiliency patterns, and modern deployment strategies (e.g., Canary, Blue-Green).
  • Lead design reviews for alerting strategies, ensuring effective signal-to-noise ratios in monitoring and incident management.
  • Advocate for and implement best practices in incident response and system design to achieve optimal uptime and performance.
Desired Qualifications
  • Knowledge of modern deployment strategies (e.g., Canary, Blue-Green) and resiliency patterns (e.g., circuit breakers, retries).
  • Experience with Kafka or similar distributed messaging systems.
  • Strong analytical skills for statistical analysis of metrics to identify and resolve performance bottlenecks.

Zeta Global focuses on enhancing marketing strategies for brands by using data and artificial intelligence. Its main product, the Zeta Marketing Platform, gives businesses a comprehensive, real-time view of their customers and prospects. The platform uses AI to tailor marketing experiences to individual consumers across channels, making interactions more relevant and effective. What sets Zeta Global apart from its competitors is its ability to aggregate extensive real-time data, including behavioral signals and purchase intent, which helps in understanding customer intent and predicting churn. The company's goal is to help brands not only acquire new customers but also retain them longer and increase their overall value through personalized marketing.

Company Stage

IPO

Total Funding

$301.5M

Headquarters

New York City, New York

Founded

2007

Growth & Insights
Headcount

6 month growth

0%

1 year growth

0%

2 year growth

0%

Simplify's Take

What believers are saying

  • Zeta's acquisition of ArcaMax enhances its data cloud and platform capabilities.
  • The launch of Zeta NEXT showcases new AI technologies for marketing success.
  • Achieving carbon neutrality in 2022 positions Zeta as a leader in sustainability.

What critics are saying

  • Integration challenges from acquiring LiveIntent and ArcaMax may affect operations.
  • Increased competition from emerging AI-driven platforms could erode market share.
  • Reliance on third-party data providers may be impacted by data privacy regulations.

What makes Zeta Global unique

  • Zeta Global leverages AI to create personalized marketing experiences at scale.
  • The company offers a real-time view of prospects using advanced analytics and big data.
  • Zeta's platform integrates AI-driven predictive analytics for optimized marketing strategies.

Benefits

Unlimited Paid Time Off

Health Insurance

Dental Insurance

Vision Insurance

Employee Equity and Stock Purchase Plan

Employee Discounts

Wellness Program

Pet Insurance