Job Details | May 2026

GPU Systems Engineer

Type: Full time (EOI) | Location: Remote | Schedule: Business hours with on-call rotation | Date: May 23, 2026

About the Role

This is an Expression of Interest, not an active role.

We run GPU clusters on AMD Instinct and Nvidia HGX-class hardware. The systems engineering job covers the whole stack: firmware, the ROCm or CUDA software stack, fabric, optics, RDMA and storage, all the way to tenant-ready clusters.

If you have built or operated production GPU systems at meaningful scale, we want to know who you are.

Responsibilities

  • Bring up new GPU clusters: firmware, BIOS, driver stack, fabric configuration, validation.
  • Tune and troubleshoot RDMA, RoCE and NCCL or RCCL behaviour at the cluster level.
  • Operate ROCm, CUDA and the supporting library stack across tenants.
  • Coordinate with platform, network and DC teams on capacity, reliability and hardware swaps.
  • Write the runbooks the next operator will rely on.

Required Skills and Experience

  • Hands-on experience with production GPU clusters, AMD Instinct or Nvidia HGX-class.
  • Strong Linux fundamentals, kernel and driver-level troubleshooting.
  • Understanding of RDMA fabric design, NCCL or RCCL tuning, and multi-node training performance.
  • Comfort with firmware updates, hardware diagnostics and vendor escalations.
  • Methodical. You isolate the variable rather than swap the part.

About OneQode

OneQode is a global provider of performance digital infrastructure. With a vertically integrated platform that spans cloud compute, low-latency networking and sovereign technology across more than 30 datacentres on 5 continents, it enables enterprises, governments and performance-hungry businesses to run AI and mission-critical workloads at scale, across the globe.

How to Apply

If this sounds like you, we'd love to hear from you.

Click the button below to apply.

Browse Similar Roles

NOC Engineer

Type: Full time (Contract) | Location: Remote (Malaysia) | Shift: 24x7 Shift Rotation

Solutions Architect

Type: Full time | Location: Remote (APAC preferred) | Shift: Standard business hours

Cloud Platform Engineer

Type: Full time | Location: Remote | Shift: Standard business hours

PR & Marketing Lead

Type: Full time | Location: Remote (APAC time zone) | Shift: Standard business hours

Enterprise Sales

Type: Full time | Location: US, ASEAN or Europe | Shift: Aligned to target region

Executive Assistant

Type: Full time | Location: Remote (APAC time zone) | Shift: Standard business hours

Head of People

Type: Full time | Location: Remote | Shift: Standard business hours

Legal Counsel

Type: Full time | Location: Remote | Shift: Standard business hours

Datacentre Operations Engineer

Type: Full time | Location: Bangkok, Thailand | Shift: On-site with on-call rotation
