Intel Extension for PyTorch program does not detect GPU on DevCloud

YuanM · 03-29-2023

Hi,

I am trying to deploy DNN inference/training workloads in PyTorch on the GPUs provided by DevCloud. I tried the "Intel_Extension_For_PyTorch_GettingStarted" tutorial using the following procedure:

qsub -I -l nodes=1:gpu:ppn=2 -d .

export LD_LIBRARY_PATH=/glob/development-tools/versions/oneapi/2023.0.1/oneapi/intelpython/latest/envs/pytorch/lib/python3.9/site-packages/torch/lib/:$LD_LIBRARY_PATH

patch < ./codes_for_ipynb/gpu.patch

./q ./run.sh

The returned error file (run.sh.e) shows the following:

[W OperatorEntry.cpp:150] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: torchvision::nms
    no debug info
  dispatch key: CPU
  previous kernel: registered at /build/intel-pytorch-extension/csrc/cpu/aten/TorchVisionNms.cpp:47
       new kernel: registered at /opt/workspace/vision/torchvision/csrc/ops/cpu/nms_kernel.cpp:112 (function registerKernel)
/glob/development-tools/versions/oneapi/2023.0.1/oneapi/intelpython/latest/envs/pytorch/lib/python3.9/site-packages/intel_extension_for_pytorch/xpu/lazy_init.py:73: UserWarning: DPCPP Device count is zero! (Triggered internally at /build/intel-pytorch-extension/csrc/gpu/runtime/Device.cpp:120.)
  _C._initExtension()
/glob/development-tools/versions/oneapi/2023.0.1/oneapi/intelpython/latest/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py:983: UserWarning: dpcppSetDevice: device_id is out of range (Triggered internally at /build/intel-pytorch-extension/csrc/gpu/runtime/Device.cpp:159.)
  return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
Traceback (most recent call last):
  File "/home/u186670/oneAPI-samples/AI-and-Analytics/Getting-Started-Samples/Intel_Extension_For_PyTorch_GettingStarted/Intel_Extension_For_PyTorch_Hello_World.py", line 123, in <module>
    main()
  File "/home/u186670/oneAPI-samples/AI-and-Analytics/Getting-Started-Samples/Intel_Extension_For_PyTorch_GettingStarted/Intel_Extension_For_PyTorch_Hello_World.py", line 77, in main
    model = model.to("xpu", memory_format=torch.channels_last)
  File "/glob/development-tools/versions/oneapi/2023.0.1/oneapi/intelpython/latest/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 987, in to
    return self._apply(convert)
  File "/glob/development-tools/versions/oneapi/2023.0.1/oneapi/intelpython/latest/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 639, in _apply
    module._apply(fn)
  File "/glob/development-tools/versions/oneapi/2023.0.1/oneapi/intelpython/latest/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 662, in _apply
    param_applied = fn(param)
  File "/glob/development-tools/versions/oneapi/2023.0.1/oneapi/intelpython/latest/envs/pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 983, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
RuntimeError: Number of dpcpp devices should be greater than zero!

Is there any step I am missing for accessing GPU resources on the DevCloud?
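
For reference, a small check like the one below could be run on the compute node before the sample calls model.to("xpu"), to see whether the extension reports any device at all. This is only a sketch: the helper name pick_device is hypothetical, and it assumes that importing intel_extension_for_pytorch makes torch.xpu.is_available() and torch.xpu.device_count() usable, as in the XPU builds of the extension. It does not explain why the device count is zero; it only makes the condition visible without the long traceback.

```python
import importlib


def pick_device():
    """Return "xpu" if the extension reports at least one device, else "cpu".

    Hypothetical helper for illustration; not part of the tutorial sample.
    """
    try:
        torch = importlib.import_module("torch")
        # Assumption: importing the extension is what populates torch.xpu
        # in this environment.
        importlib.import_module("intel_extension_for_pytorch")
    except Exception:
        # Missing packages or library loading errors: fall back to CPU.
        return "cpu"

    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available() and xpu.device_count() > 0:
        return "xpu"
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

If this prints "cpu" inside the job started with ./q ./run.sh, the node the script landed on is not exposing a GPU to the extension, which matches the "DPCPP Device count is zero!" warning above.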