Full-Time

Infrastructure Engineer

Posted on 9/15/2025

FAR AI

11-50 employees

Non-profit AI safety research institute

Compensation Overview

$100k - $175k/yr

+ Travel Expenses + Equipment Expenses

Remote in USA

More locations: Berkeley, CA, USA

In Person

Category
DevOps & Infrastructure (2)
Required Skills
Kubernetes
Requirements
  • 3+ years of experience across multiple aspects of cluster administration including scheduling and orchestration, user administration, storage management, monitoring and observability, and networking
  • Deep technical expertise with Kubernetes OR deep technical expertise with more traditional High-Performance Computing scheduling systems such as Slurm or Univa Grid Engine
  • A record of progressively increasing scope and ownership over compute infrastructure
  • Self-directed and comfortable with ambiguous or rapidly evolving requirements
  • Willing to be on-call during waking hours for cluster issues ahead of major deadlines (for a few weeks a quarter)
  • Interest in improving our security posture by identifying, implementing, and administering security policies
Responsibilities
  • Build and maintain a scalable, easy-to-use compute cluster to support impactful research
  • Empower the research team to solve their own day-to-day compute problems, such as debugging simple issues and streamlining recurring tasks (e.g. running batch experiments, launching an interactive development box)
  • Maintain and develop in-cluster services such as backups, experiment tracking, and the in-house LLM-based cluster support bot
  • Maintain adequate cluster stability to avoid interfering with research workloads (currently greater than 95% uptime outside of planned maintenance windows)
  • Maintain situational awareness of the cloud GPU market and assist leadership with vendor comparisons to ensure we use the most effective compute platforms
  • Secure the cluster against insider threats by architecting isolation that protects the confidentiality and integrity of sensitive workloads, and defend against external threats by minimizing attack surface and ensuring security updates are installed promptly
  • Make secure workflows the default, e.g. by streamlining the deployment of internal web dashboards behind an OAuth reverse proxy
  • Champion security across the FAR.AI team, including maintaining and extending our mobile device management system
  • Enable bleeding-edge workloads by architecting the Kubernetes cluster to flexibly support novel workloads
  • Assist projects with bespoke requirements, design and implement effective infrastructure solutions, and share infrastructure wisdom with ML researchers
  • Improve observability over cluster resources and GPU utilization to rapidly diagnose and work around hardware issues or software bugs that may arise on novel workloads
Desired Qualifications
  • Experience administering Kubernetes on bare-metal servers
  • Experience managing research workloads in a High-Performance Computing setting
  • Experience supporting machine learning or artificial intelligence workloads on GPU clusters
  • Prior experience in research environments or startups
  • Willingness to be part of an eventual on-call rotation if required

FAR AI is a non-profit research institute focused on making advanced artificial intelligence safer and more beneficial for society. It conducts in-house research on AI safety topics such as model evaluation, interpretability, and robustness, and it also supports safety-driven research through collaborations and targeted grants. A key activity is red-teaming frontier AI systems to identify vulnerabilities and inform safety standards for developers and governments, while it also builds community through FAR.Labs in Berkeley and global dialogues like the International Dialogue on AI Safety. Its funding comes mainly from philanthropy, with profits from for-profit AI work capped at 10% to preserve independence, and its goal is to steer powerful AI toward minimizing risk and maximizing societal benefit through research, governance discussions, and broad collaboration.

Company Size

11-50

Company Stage

N/A

Total Funding

N/A

Headquarters

Berkeley, California

Founded

2022

Simplify's Take

What believers are saying

  • $30M+ 2025 funding enables rapid research scaling.
  • International dialogues shape global AI safety norms.
  • Grantmaking influences major labs' safety directions.

What critics are saying

  • Coefficient Giving withdraws funding by 2028 over breakthroughs.
  • Anthropic poaches Berkeley researchers in 6-12 months.
  • DWF model cuts org funding in 18-36 months.

What makes FAR AI unique

  • FAR AI red-teams frontier models for OpenAI and EU CBRN risks.
  • FAR.Labs Berkeley co-working fosters AI safety collaborations.
  • FAR AI caps for-profit revenue at 10% for independence.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Retirement Plan

Remote Work Options

Hybrid Work Options

Paid Vacation

Paid Holidays

Sabbatical Leave

Flexible Work Hours

Wellness Program

Mental Health Support

Gym Membership

Phone/Internet Stipend

Home Office Stipend

Professional Development Budget

Conference Attendance Budget

Training Programs

Tuition Reimbursement

Professional Certification Support

Mentorship Program

Stock Options

Company Equity

Relocation Assistance

Adoption Assistance

Childcare Support

Elder Care Support

Parental Leave

Fertility Treatment Support

Family Planning Benefits

Employee Referral Bonus

Meal Benefits

Commuter Benefits

Legal Services

Employee Discounts

Company Social Events

Growth & Insights

Headcount

6 month growth

0%

1 year growth

0%

2 year growth

0%
INACTIVE