Full-Time

Senior Software Engineer

Compute Capacity

Posted on 5/8/2026

Anthropic

5,001-10,000 employees

Develops reliable, interpretable AI systems

Compensation Overview

$320k - $405k/yr

H1B Sponsorship Available

San Francisco, CA, USA; New York, NY, USA

Hybrid

Office-based hybrid policy: employees must be in one of the offices at least 25% of the time.

Category
Software Engineering
Required Skills
Kubernetes
Microsoft Azure
Python
Grafana
BigQuery
SQL
AWS
Prometheus
Google Cloud Platform
Requirements
  • 5+ years of software engineering experience with a strong track record building and operating production systems.
  • Kubernetes fluency at operational depth: experience operating production Kubernetes at meaningful scale, including scheduling, taints, labels, node management, and debugging cluster-level issues.
  • Data pipeline engineering experience: designing, building, and owning the full lifecycle of production data pipelines, with experience in data warehouses (BigQuery preferred), schema management, streaming ingestion, and SLOs for latency and completeness.
  • Observability tooling experience with Prometheus, PromQL, and Grafana, including writing recording rules, understanding metric semantics, and building monitoring systems that engineering teams rely on.
  • Python and SQL at production quality: most pipeline code is in Python, and the SQL is BigQuery SQL, including table-valued functions and views; code should be idiomatic, well-tested, and maintainable.
  • Familiarity with at least one major cloud provider (AWS, GCP, or Azure) at the infrastructure level, including compute, billing, usage APIs, and cost management tooling; multi-cloud experience is a strong plus.
  • High autonomy and strong cross-team communication: the ability to gather requirements, navigate ambiguity, and work across organizational boundaries; scrappiness and ownership matter more than polish.
Responsibilities
  • Build and operate data pipelines that ingest accelerator occupancy, utilization, and cost data from multiple cloud providers into BigQuery; own data completeness, latency SLOs, gap detection, and backfill automation.
  • Develop and maintain observability infrastructure—Prometheus recording rules, Grafana dashboards, and alerting systems—that surface actionable signals about fleet health, occupancy, and efficiency.
  • Instrument and analyze compute efficiency metrics across training, inference, and eval workloads; build benchmarking infrastructure, establish per-config baselines, and collaborate with system-owning teams to improve utilization.
  • Build internal tooling and platforms that enable capacity planning, workload attribution, and cluster debugging; consumers are internal teams including research engineering, infrastructure, finance, and leadership.
  • Operate Kubernetes-native systems at scale—deploy data collection agents, manage workload labeling infrastructure, and understand how taints, reservations, and scheduling affect capacity.
  • Normalize and reconcile data across heterogeneous sources—including AWS, GCP, and Azure billing exports, vendor-specific telemetry formats, and internal systems with different schemas and billing arrangements.
  • Collaborate across organizational boundaries with research engineering, infrastructure, inference, and finance teams; gather requirements from technical stakeholders, translate them into useful systems, and communicate trade-offs to non-technical audiences.
Desired Qualifications
  • Multi-cloud data ingestion experience—especially working with AWS and GCP APIs, billing exports, or vendor-specific telemetry formats.
  • Accelerator infrastructure familiarity—GPU metrics (DCGM), TPU utilization, Trainium power and utilization metrics, or experience working with ML training/inference systems at the hardware level.
  • Performance engineering and benchmarking experience—building benchmark harnesses, establishing baselines, reasoning about compute efficiency, and working with system teams to diagnose and improve performance.
  • Data-as-product thinking—experience building internal data products with self-service access, schema contracts, API serving, documentation, and discoverability.
  • Experience with capacity planning, resource management, or cost attribution systems at a hyperscaler or large-scale ML environment (FinOps, chargeback systems, cost modeling).
  • Familiarity with ClickHouse, Terraform, or Rust.

Anthropic focuses on AI research to build reliable, interpretable, and steerable AI systems. Its main product, Claude, is an AI assistant designed to handle tasks at any scale for clients across industries, delivered through deployment and licensing along with specialized AI R&D services. Claude works by combining natural language processing, human feedback, reinforcement learning, and interpretability techniques to produce a capable, controllable AI assistant that can assist with a wide range of tasks. The company differentiates itself from competitors by prioritizing safety, transparency, and controllability—emphasizing reliability, interpretability of model behavior, and user-controlled steerability in its AI systems. Anthropic’s goal is to make AI systems that people can trust and efficiently use to improve operations and decision-making across sectors.

Company Size

5,001-10,000

Company Stage

Late Stage VC

Total Funding

$77.3B

Headquarters

San Francisco, California

Founded

2021

Simplify's Take

What believers are saying

  • Japan's megabanks are set to gain access to Claude Mythos for their operations by the end of May 2026.
  • Anthropic launched 12 legal plugins on May 12, 2026, attracting 20,000 professionals.
  • Thomson Reuters is integrating Claude with CoCounsel for 1 million users in summer 2026.

What critics are saying

  • A Japan FSA working group could delay Mythos banking deployments within 3-6 months.
  • Voided Forge and Hiive trades could trigger Delaware litigation within 6-12 months.
  • EU AI Act audits could halt Claude Mythos sales in Europe by Q3 2026.

What makes Anthropic unique

  • Anthropic pioneers Constitutional AI and RLHF for model alignment.
  • Responsible Scaling Policy mandates safety thresholds before deployments.
  • Claude Platform on AWS operates independently outside hyperscaler boundaries.

Benefits

Flexible Work Hours

Paid Vacation

Parental Leave

Hybrid Work Options

Company Equity

Growth & Insights and Company News

Headcount

6 month growth

-3%

1 year growth

-3%

2 year growth

1%

Ars Technica
Apr 21st, 2026
Mozilla: Anthropic's Mythos AI model finds 271 zero-day bugs in Firefox 150

Mozilla has discovered 271 security vulnerabilities in Firefox 150 using early access to Anthropic's Mythos Preview AI model. The findings represent a significant increase from the 22 bugs detected by Anthropic's Opus 4.6 model in Firefox 148 last month. Firefox CTO Bobby Holley said Mythos is "every bit as capable" as the world's best security researchers, whilst eliminating the need to "concentrate many months of costly human effort to find a single bug". He believes AI tools like Mythos tilt the cybersecurity balance towards defenders by making vulnerability discovery cheaper. Anthropic released Mythos Preview to a limited group of industry partners earlier this month. Mozilla CTO Raffi Krikorian argues such tools are particularly crucial for open source projects, which often rely on insufficient volunteer maintenance for security.

Bloomberg L.P.
Apr 21st, 2026
Anthropic's Mythos AI sparks fear and hope over cybersecurity threats to global finance

Anthropic's new AI model Mythos has sparked concern amongst policymakers at International Monetary Fund meetings over its potential to accelerate sophisticated cyberattacks on the global financial system. However, its developers argue the technology could provide banks with their strongest defence yet. What distinguishes Mythos is its ability to chain multiple security weaknesses into coordinated attacks, effectively automating complex cyber intrusions. This capability could significantly expand the pool of potential attackers in the near term. The model's creators emphasise a longer-term benefit: the same technology could enable banks to detect and patch vulnerabilities faster than ever, potentially shifting the balance towards defenders if widely adopted. The dual-use nature of Mythos has created both panic and optimism in boardrooms and governments regarding global financial system security.

Bloomberg L.P.
Apr 17th, 2026
Indian fintechs push Anthropic for early access to 'dangerous' Mythos AI model

Indian fintech companies including One97 Communications, Razorpay Software and Pine Labs are pushing Anthropic for early access to Mythos, the AI model that has raised global concerns about cyberattack risks. The firms want to test Mythos on their own systems to detect vulnerabilities following Anthropic's announcement of a limited rollout. The San Francisco-based AI developer considers the model too dangerous for wider release but major Indian financial technology companies are seeking early access to assess potential security threats to their platforms.

Bloomberg L.P.
Apr 16th, 2026
US government prepares to give federal agencies access to Anthropic's Mythos AI model

The US government is preparing to provide major federal agencies with access to Anthropic's new AI model, Mythos, according to a memo reviewed by Bloomberg News. Gregory Barbaccia, federal chief information officer at the White House Office of Management and Budget, informed Cabinet department officials on Tuesday that OMB is establishing protections to enable agencies to use the closely guarded AI tool. The move comes amid concerns that the powerful model could significantly increase cybersecurity risks. OMB is working to set up appropriate safeguards before rolling out access to the system across government departments.

Bloomberg L.P.
Apr 16th, 2026
Anthropic's Mythos AI model raises cybersecurity alarms for banks and governments

Anthropic's new Mythos AI model is causing concern among banks, tech giants and governments over its potential implications for cybersecurity and the internet's future. The model has prompted a scramble amongst major institutions to understand its capabilities and risks.