About This Role

<h3>About the Role</h3><p>We're seeking a Senior Infrastructure Engineer to help build and scale Hyperbolic's GPU Cloud Marketplace by developing a multi-tenant provisioning and virtualization solution. You'll transform raw GPUs from diverse global suppliers into a programmable, orchestrated pool that serves thousands of AI developers and researchers.</p><h3>Requirements</h3><ul><li>Experience with bare-metal provisioning and lifecycle management (e.g., IPMI/Redfish, BMC, PXE, OS deployment)</li> <li>Experience with GPU scheduling and orchestration</li> <li>Experience with infrastructure and DevOps tools (e.g., Terraform or Pulumi, CI/CD, secrets management, configuration management, observability tools)</li> <li>Experience with storage and data infrastructure for AI/ML workloads (e.g., object storage, block storage, distributed file systems)</li> <li>Experience with API design and cloud-init</li> <li>Experience with GPU architecture, CUDA, and GPU compute</li> <li>Experience working with hardware vendors or vendor engineering teams</li> <li>Experience building and scaling cloud infrastructure or distributed systems in production environments</li> </ul><h3>Bonus Skills</h3><ul><li>Familiarity with high-performance networking technologies such as InfiniBand and RoCE</li> <li>Experience with distributed storage systems such as Ceph, Weka, or VAST Data</li> </ul><p>Originally posted on <a href="https://himalayas.app">Himalayas</a></p>

Senior AI Infrastructure Engineer
