GPU Systems Engineer

Type: Full time (EOI)
Location: Remote
Schedule: Business hours with on-call rotation
Date: May 23, 2026

About the Role

This is an Expression of Interest, not an active role.

We run GPU clusters on AMD Instinct and Nvidia HGX-class hardware. The systems engineering job covers the whole stack: firmware and the ROCm or CUDA software stacks, the fabric, optics, RDMA and storage beneath them, and everything needed to hand over tenant-ready clusters.

If you have built or operated production GPU systems at meaningful scale, we want to know who you are.

Responsibilities

Bring up new GPU clusters: firmware, BIOS, driver stack, fabric configuration, validation.
Tune and troubleshoot RDMA, RoCE and NCCL or RCCL behavior at the cluster level.
Operate ROCm, CUDA and the supporting library stack across tenants.
Coordinate with platform, network and DC teams on capacity, reliability and hardware swaps.
Write the runbooks the next operator will rely on.

Required Skills and Experience

Hands-on experience with production GPU clusters, AMD Instinct or Nvidia HGX-class.
Strong Linux fundamentals, including kernel and driver-level troubleshooting.
Understanding of RDMA fabric design, NCCL or RCCL tuning, and multi-node training performance.
Comfort with firmware updates, hardware diagnostics and vendor escalations.
Methodical. You isolate the variable rather than swap the part.

About OneQode

OneQode is a global provider of performance digital infrastructure. With a vertically integrated platform that spans cloud compute, low-latency networking and sovereign technology across more than 30 datacenters on 5 continents, it enables enterprises, governments and performance-hungry businesses to run AI and mission-critical workloads at scale, across the globe.

How to Apply

If this sounds like you, we'd love to hear from you.

Click the button below to apply.

Apply for this role
