Job details May 2026

GPU Systems Engineer

Type: Full time (EOI); Location: Remote; Hours: Business hours with on-call rotation; Date: May 23, 2026

About the Role

This is an Expression of Interest, not an active role.

We run GPU clusters on AMD Instinct and Nvidia HGX-class hardware. The systems engineering job covers the whole stack: firmware and the ROCm or CUDA software stacks, through fabric, optics, RDMA and storage, up to tenant-ready clusters.

If you have built or operated production GPU systems at meaningful scale, we want to know who you are.

Responsibilities

  • Bring up new GPU clusters: firmware, BIOS, driver stack, fabric configuration, validation.
  • Tune and troubleshoot RDMA, RoCE and NCCL or RCCL behaviour at the cluster level.
  • Operate ROCm, CUDA and the supporting library stack across tenants.
  • Coordinate with platform, network and DC teams on capacity, reliability and hardware swaps.
  • Write the runbooks the next operator will rely on.

Required Skills and Experience

  • Hands-on experience with production GPU clusters, AMD Instinct or Nvidia HGX-class.
  • Strong Linux fundamentals, kernel and driver-level troubleshooting.
  • Understanding of RDMA fabric design, NCCL or RCCL tuning, and multi-node training performance.
  • Comfort with firmware updates, hardware diagnostics and vendor escalations.
  • Methodical. You isolate the variable rather than swap the part.

About OneQode

OneQode is a global provider of performance digital infrastructure. Its vertically integrated platform spans cloud compute, low-latency networking and sovereign technology across more than 30 datacentres on 5 continents, enabling enterprises, governments and performance-hungry businesses to run AI and mission-critical workloads at scale, across the globe.

How to Apply

If this sounds like you, we'd love to hear from you.

Browse Similar Roles

NOC Engineer

Type: Full time (Contract); Location: Remote (Malaysia); Shift: 24x7 shift rotation

Solutions Architect

Type: Full time; Location: Remote (APAC preferred); Shift: Standard business hours

Cloud Platform Engineer

Type: Full time; Location: Remote; Shift: Standard business hours

PR & Marketing Lead

Type: Full time; Location: Remote (APAC time zone); Shift: Standard business hours

Enterprise Sales

Type: Full time; Location: US, ASEAN or Europe; Shift: Aligned to target region

Executive Assistant

Type: Full time; Location: Remote (APAC time zone); Shift: Standard business hours

Head of People

Type: Full time; Location: Remote; Shift: Standard business hours

Legal Counsel

Type: Full time; Location: Remote; Shift: Standard business hours

Datacentre Operations Engineer

Type: Full time; Location: Bangkok, Thailand; Shift: On-site with on-call rotation
