Full-Time

HPC Cluster Engineer

Posted on 4/18/2024

Genmo

1-10 employees

AI-powered video creation services


Mid, Senior, Expert

San Francisco, CA, USA

Required Skills
Kubernetes
Docker
Ansible
Linux/Unix
Requirements
  • Proven experience managing and supporting HPC infrastructure, especially in a GPU-intensive environment.
  • Strong familiarity with Linux distributions, container technologies (Singularity, Docker, Kubernetes), and host management tools (Ansible).
  • Experience with HPC job schedulers (Slurm, LSF) and monitoring tools (Prometheus, NVIDIA DCGM).
  • Knowledge of configuring and optimizing RDMA networks and NVMe-backed storage for high-performance computing.
  • Effective problem-solving skills, with the ability to manage incidents and maintenance tasks efficiently.
  • Excellent communication skills, with the capability to provide on-call support and respond to urgent issues.
Responsibilities
  • Cluster Maintenance and Support
  • Incident Response
  • Access Control Management
  • System and Software Updates
  • Monitoring

Genmo specializes in AI-powered video creation and related services, combining artificial intelligence with video production tools.

Growth & Insights
Headcount growth
  • 6 months: 50%
  • 1 year: 100%
  • 2 years: 100%