Full-Time

Senior Backend Engineer

Distributed Systems

Zettabyte

1-10 employees

High-performance GPU cloud IaaS provider

No salary listed

No H-1B sponsorship

Palo Alto, CA, USA

Hybrid: 3 days in-office per week; Palo Alto, CA residency required.

Category
Software Engineering
Required Skills
gRPC
Kubernetes
Python
GraphQL
Docker
Go
REST APIs
Linux/Unix
Requirements
  • 5+ years backend engineering experience with distributed systems
  • Strong proficiency in Go, Python, or similar backend languages
  • Experience with resource scheduling, orchestration, and API design (REST, GraphQL, gRPC)
  • Understanding of hardware constraints and system optimization
  • Linux systems knowledge and containerization experience (Docker, Kubernetes)
  • Comfortable working with expensive resources where efficiency directly impacts costs
  • Excited about solving novel problems in AI infrastructure (not just another CRUD app)
  • Startup mindset—comfortable with ambiguity and rapid iteration
  • Must be located in Palo Alto, CA
  • Applicants must be authorized to work in the United States without need for visa sponsorship
Responsibilities
  • Design APIs that abstract complex GPU operations into simple developer experiences
  • Build scheduling algorithms that maximize GPU utilization while ensuring SLA compliance
  • Develop resource management systems for GPU lifecycle—provisioning, allocation, scheduling, and release
  • Create usage tracking and billing systems for GPU-hours, memory usage, and compute utilization
  • Implement monitoring for GPU-specific metrics, health checks, and automatic failure recovery
  • Build multi-tenancy systems with resource isolation, quota management, and fair scheduling
  • Optimize cold starts for model serving and implement efficient model loading strategies
  • Collaborate with frontend engineers to expose complex infrastructure through intuitive interfaces
  • Leverage AI-assisted coding tools to boost productivity and code quality
Desired Qualifications
  • GPU or HPC cluster management experience
  • Understanding of ML/AI workload patterns and requirements
  • Experience with high-value resource allocation systems
  • Background in performance optimization for compute-intensive workloads
  • Familiarity with GPU virtualization and sharing technologies
  • Experience building billing or metering systems

Zettabyte provides GPU cloud infrastructure as a service by building and operating AI data centers with on-demand NVIDIA GPUs (H100/A100). Its Zsuite software, including Zware, orchestrates and optimizes AI workloads across multiple GPUs, servers, and data centers, automating resource management and scheduling for AI training and inference. It differentiates itself with turnkey enterprise solutions, energy-efficient liquid-cooled data centers (a first in Taiwan), and partnerships with manufacturers and data-center capacity providers. Its goal is to help customers accelerate AI initiatives worldwide by delivering easy access to powerful compute and efficient workload management.

Company Size

1-10

Company Stage

N/A

Total Funding

N/A

Headquarters

Taipei, Taiwan

Founded

2024

Simplify's Take

What believers are saying

  • Lam Capital, Foxconn, Wistron investments fuel global expansion.
  • Headline Asia funds Japan market entry in 2024.
  • Chief Telecom retrofit proves 45-day AI infra modernization.

What critics are saying

  • NVIDIA DGX Cloud could commoditize Zsuite within 12-24 months.
  • Foxconn and Wistron could vertically integrate and cut into Zettabyte's margins within 18-36 months.
  • Taiwan tensions could block NVIDIA GPU exports within 12-36 months.

What makes Zettabyte unique

  • Zsuite's Zware orchestrates AI workloads across GPUs and data centers.
  • TITAN™ designs end-to-end AI data halls with liquid cooling.
  • Sovereign AI Data Centers enable full ownership via BOTT model.

Benefits

Company Equity

Company News

PR Newswire
Feb 9th, 2026
Zettabyte and LiteOn collaborate on distributed edge AI inferencing at cell towers

Zettabyte and LiteOn have announced a research and development collaboration to evaluate a distributed edge AI inferencing platform called the Ultra Edge Pod, deployed at cell tower locations. The project aims to bring AI inference workloads closer to users through Mobile Edge Compute platforms, reducing latency and enabling AI readiness in countries without mature data centre infrastructure. LiteOn will provide power, cooling and physical infrastructure, whilst Zettabyte will deliver software for GPU scheduling, orchestration and remote operations. The deployment will demonstrate how integrated infrastructure and software can enable reliable, low-cost AI inference in distributed environments with real-world power and thermal constraints. The platform is designed to support low-latency, location-aware AI inference workloads operating closer to mobile users and radio access networks.

PR Newswire
Jan 28th, 2026
Zettabyte Announces Strategic Investment From Headline Asia to Support Japan Expansion

Zettabyte, an AI computing company, today announced a strategic investment from Headline Asia. The investment will support Zettabyte's...

PR Newswire
Aug 1st, 2025
Global AI Data Center Infrastructure Leader Zettabyte Receives Strategic Investment from Lam Capital

PYMNTS
Dec 26th, 2024
Foxconn Invests in AI Data Center Firm Zettabyte to Boost Sustainable Computing

Zettabyte has recently announced additional partnerships with Pegatron and Chief Telecom.

Sivastatz
Dec 24th, 2024
Foxconn Announces Strategic Partnership With Zettabyte to Transform AI Data Centers
