Senior AI Infrastructure Engineer

Techchain Talent
Remote - Unknown
$100k - $150k (estimated; 19% below average)
Posted Apr 26, 2026

About the Role

We're seeking a Senior Infrastructure Engineer to help build and scale Hyperbolic's GPU Cloud Marketplace by developing a multi-tenant provisioning and virtualization solution. You'll transform raw GPUs from diverse global suppliers into a programmable, orchestrated pool that serves thousands of AI developers and researchers.

Requirements

  • Experience with bare-metal provisioning and lifecycle management (e.g., IPMI/Redfish, BMC, PXE, OS deployment)
  • Experience with GPU scheduling and orchestration
  • Experience with infrastructure and DevOps tools (e.g., Terraform or Pulumi, CI/CD, secrets management, configuration management, observability tools)
  • Experience with storage and data infrastructure for AI/ML workloads (e.g., object storage, block storage, distributed file systems)
  • Experience with API design and cloud-init
  • Experience with GPU architecture, CUDA, and GPU compute
  • Experience working with hardware vendors or vendor engineering teams
  • Experience building and scaling cloud infrastructure or distributed systems in production environments

Bonus Skills

  • Familiarity with high-performance networking technologies such as InfiniBand and RoCE
  • Experience with distributed storage systems such as Ceph, Weka, or VAST Data

Originally posted on Himalayas
