Job details (May 2026)

GPU Systems Engineer

Type: Full time (EOI) | Location: Remote | Schedule: Business hours with on-call rotation | Date: May 23, 2026

About the Role

This is an Expression of Interest, not an active role.

We run GPU clusters on AMD Instinct and Nvidia HGX-class hardware. The systems engineering job covers everything from firmware and the ROCm or CUDA software stack, through fabric, optics, RDMA and storage, up to delivering tenant-ready clusters.

If you have built or operated production GPU systems at meaningful scale, we want to know who you are.

Responsibilities

  • Bring up new GPU clusters: firmware, BIOS, driver stack, fabric configuration, validation.
  • Tune and troubleshoot RDMA, RoCE and NCCL or RCCL behavior at the cluster level.
  • Operate ROCm, CUDA and the supporting library stack across tenants.
  • Coordinate with platform, network and DC teams on capacity, reliability and hardware swaps.
  • Write the runbooks the next operator will rely on.

Required Skills and Experience

  • Hands-on experience with production GPU clusters, AMD Instinct or Nvidia HGX-class.
  • Strong Linux fundamentals, including kernel- and driver-level troubleshooting.
  • Understanding of RDMA fabric design, NCCL or RCCL tuning, and multi-node training performance.
  • Comfort with firmware updates, hardware diagnostics and vendor escalations.
  • Methodical. You isolate the variable rather than swap the part.

About OneQode

OneQode is a global provider of performance digital infrastructure. Its vertically integrated platform spans cloud compute, low-latency networking and sovereign technology across more than 30 datacenters on 5 continents, enabling enterprises, governments and performance-hungry businesses to run AI and mission-critical workloads at scale worldwide.

How to Apply

If this sounds like you, we'd love to hear from you.


Browse similar roles
  • NOC Engineer

    Type: Full time (Contract) | Location: Remote (Malaysia) | Shift: 24x7 rotation

  • Solutions Architect

    Type: Full time | Location: Remote (APAC preferred) | Shift: Standard business hours

  • Cloud Platform Engineer

    Type: Full time | Location: Remote | Shift: Standard business hours

  • PR & Narrative Lead

    Type: Full time | Location: Remote (APAC time zone preferred) | Shift: Standard business hours

  • Enterprise Sales

    Type: Full time | Location: US, ASEAN or Europe | Shift: Aligned to target region

  • Executive Assistant

    Type: Full time | Location: Remote (APAC time zone) | Shift: Standard business hours

  • Head of People

    Type: Full time | Location: Remote | Shift: Standard business hours

  • Legal Counsel

    Type: Full time | Location: Remote | Shift: Standard business hours

  • Datacenter Operations Engineer

    Type: Full time | Location: Bangkok, Thailand | Shift: On-site with on-call rotation
