Job details, May 2026

GPU Systems Engineer

Type: Full time (EOI) | Location: Remote | Schedule: Business hours with on-call rotation | Date: May 23, 2026

About the Role

This is an Expression of Interest, not an active role.

We run GPU clusters on AMD Instinct and Nvidia HGX-class hardware. The systems engineering job covers the whole stack: firmware and the ROCm or CUDA software stacks, fabric, optics, RDMA and storage, all the way up to tenant-ready clusters.

If you have built or operated production GPU systems at meaningful scale, we want to know who you are.

Responsibilities

  • Bring up new GPU clusters: firmware, BIOS, driver stack, fabric configuration, validation.
  • Tune and troubleshoot RDMA, RoCE and NCCL or RCCL behaviour at the cluster level.
  • Operate ROCm, CUDA and the supporting library stack across tenants.
  • Coordinate with platform, network and DC teams on capacity, reliability and hardware swaps.
  • Write the runbooks the next operator will rely on.

Required Skills and Experience

  • Hands-on experience with production GPU clusters, AMD Instinct or Nvidia HGX-class.
  • Strong Linux fundamentals, kernel and driver-level troubleshooting.
  • Understanding of RDMA fabric design, NCCL or RCCL tuning, and multi-node training performance.
  • Comfort with firmware updates, hardware diagnostics and vendor escalations.
  • Methodical. You isolate the variable rather than swap the part.

About OneQode

OneQode is a global provider of performance digital infrastructure. With a vertically integrated platform that spans cloud compute, low-latency networking and sovereign technology across more than 30 datacentres on 5 continents, they enable enterprises, governments and performance-hungry businesses to run AI and mission-critical workloads at scale, across the globe.

How to Apply

If this sounds like you, we'd love to hear from you.

Browse Similar Roles

NOC Engineer

Type: Full time (Contract) | Location: Remote (Malaysia) | Schedule: 24x7 Shift Rotation

Solutions Architect

Type: Full time | Location: Remote (APAC preferred) | Schedule: Standard business hours

Cloud Platform Engineer

Type: Full time | Location: Remote | Schedule: Standard business hours

PR & Marketing Lead

Type: Full time | Location: Remote (APAC time zone) | Schedule: Standard business hours

Enterprise Sales

Type: Full time | Location: US, ASEAN or Europe | Schedule: Aligned to target region

Executive Assistant

Type: Full time | Location: Remote (APAC time zone) | Schedule: Standard business hours

Head of People

Type: Full time | Location: Remote | Schedule: Standard business hours

Legal Counsel

Type: Full time | Location: Remote | Schedule: Standard business hours

Datacentre Operations Engineer

Type: Full time | Location: Bangkok, Thailand | Schedule: On-site with on-call rotation
