Intel® DevCloud
Help for those needing help starting or connecting to the Intel® DevCloud

How can I allocate a dual or a quad GPU compute node?

brucechen057
Beginner

How can I allocate a dual or a quad Intel® Iris® Xe MAX Graphics compute node? 

 

The command in the guide (https://devcloud.intel.com/oneapi/documentation/job-submission/) suggests something like:

qsub -l nodes=1:iris_xe_max:dual_gpu:ppn=2 job_script.sh 


 

However, this seems no longer supported:

$ qsub -I -l nodes=1:iris_xe_max:dual_gpu:ppn=2
qsub: submit error (Job exceeds queue resource limits MSG=cannot locate feasible nodes (nodes file is empty, all systems are busy, or no nodes have the requested feature))

 

And I can't find that kind of property anymore:

$ pbsnodes | sort | grep properties | grep iris_xe_max | uniq
properties = core,cfl,i9-10920x,ram32gb,net1gbe,gpu,iris_xe_max,gpu
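
To see which feature tags are still advertised alongside iris_xe_max (and so whether dual_gpu or quad_gpu appear at all), the pipeline above can be extended to split the comma-separated list. This is a minimal sketch, assuming pbsnodes prints each list as `properties = a,b,c` as in the output above; the function name `features_alongside` is hypothetical.

```shell
# Print every distinct feature tag found on nodes that advertise a given feature.
# Reads `pbsnodes` output on stdin; the first argument is the feature to look for.
features_alongside() {
    feature="$1"
    grep 'properties = ' \
        | sed 's/.*properties = //' \
        | grep -E "(^|,)${feature}(,|\$)" \
        | tr ',' '\n' \
        | sort -u
}
```

Usage: `pbsnodes | features_alongside iris_xe_max`. If dual_gpu is missing from the result, no iris_xe_max node currently advertises it, which would be consistent with the qsub error above.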

AlekhyaV_Intel
Moderator

Hi,

 

Thank you for posting in Intel Communities. Due to some internal issues, the quad_gpu and dual_gpu nodes are currently not available to the public. This is a known issue, and the admin team is addressing it. It might take some time for those nodes to become available to the public. For now, you can connect to the compute nodes that host a single GPU.

 

Regards,

Alekhya

 

brucechen057
Beginner

Thanks for your response.
Another question regarding multiple GPUs: is there any tool, like nvidia-smi for NVIDIA GPUs, that can show how many GPUs are in a node? How can I check GPU usage and make sure my kernels are actually running on the GPUs?

AlekhyaV_Intel
Moderator

Hi,

 

You can view the details of the GPU, CPU, etc. on a specific node with the command below:

 

sycl-ls
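
To turn that listing into a GPU count, the output can be filtered for one backend only, since the same physical GPU may be listed once under OpenCL and once under Level Zero. This is a minimal sketch, assuming the two line formats that appear in the sycl-ls listings in this thread (`[level_zero:0] GPU : ...` and `[ext_oneapi_level_zero:gpu:0] ...`); the function name `count_level_zero_gpus` is hypothetical, and other oneAPI releases may print a different format.

```shell
# Count the GPU devices that sycl-ls reports through the Level Zero backend.
# Reads sycl-ls output on stdin and prints a single number.
count_level_zero_gpus() {
    # First alternative: newer "[...level_zero:gpu:N]" lines.
    # Second alternative: older "[level_zero:N] GPU" lines.
    # grep -c exits non-zero when the count is 0, so "|| true" keeps callers safe.
    grep -Ec '^\[[a-z_]*level_zero:(gpu:[0-9]+\]|[0-9]+\] GPU)' || true
}
```

Usage: `sycl-ls | count_level_zero_gpus`. This only counts devices that the Level Zero runtime exposes on the node where it runs; it does not report GPU utilization.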

 

 

Regards,

Alekhya

 

 

DanDaMan
New Contributor I

s012-n004 seems broken


s012-n004:~$ sycl-ls --verbose
Abort was called at 400 line in file:
/opt/src/opencl/shared/source/os_interface/linux/drm_neo.cpp
Aborted

vs

s011-n003:~$ sycl-ls
[opencl:0] ACC : Intel(R) FPGA Emulation Platform for OpenCL(TM) 1.2 [2021.13.11.0.23_160000]
[opencl:0] CPU : Intel(R) OpenCL 3.0 [2021.13.11.0.23_160000]
[opencl:0] GPU : Intel(R) OpenCL HD Graphics 3.0 [21.49.21786]
[opencl:1] GPU : Intel(R) OpenCL HD Graphics 3.0 [21.49.21786]
[level_zero:0] GPU : Intel(R) Level-Zero 1.2 [1.2.21786]
[level_zero:1] GPU : Intel(R) Level-Zero 1.2 [1.2.21786]
[host:0] HOST: SYCL host platform 1.2 [1.2]

I tried posting a message about this, but it got marked as spam (Update: now fixed)

AlekhyaV_Intel
Moderator

Hi @DanDaMan ,

 

Thank you for sharing your observations. We were able to reproduce the issue, so there appears to be a problem with that node. We will contact the dev team to get it fixed.

 

Regards,

Alekhya

 

AlekhyaV_Intel
Moderator

Hi @DanDaMan ,

 

We see that you have opened a new thread in the community about the issue with the s012-n004 node in DevCloud, so we will continue tracking the issue in that thread (https://community.intel.com/t5/Intel-DevCloud/s012-n004-Broken-OneAPI-VPL-GPU-iris-xe-max-not-working-but/m-p/1384336#M4747).

 

Regards,

Alekhya

 

AlekhyaV_Intel
Moderator

Hi @brucechen057 ,

 

Can we discontinue monitoring this thread? Please give us an update.

 

Regards,

Alekhya

 

DanDaMan
New Contributor I

Unfortunately, the other thread has been closed, and s012-n004 is still broken but still available for allocation as if it were working:

########################################################################
#      Date:           Tue 17 May 2022 08:02:39 AM PDT
#    Job ID:           1909065.v-qsvr-1.aidevcloud
#      User:           uXXXXXX
# Resources:           neednodes=s012-n004:iris_xe_max:ppn=2,nodes=s012-n004:iris_xe_max:ppn=2,walltime=24:00:00
########################################################################

uXXXXXX@s012-n004:~$ sycl-ls --verbose
Abort was called at 456 line in file:
/opt/src/opencl/shared/source/os_interface/linux/drm_neo.cpp
Aborted

Manually searching for and allocating working nodes is a workaround, but is there a timeline for when this might be fixed, or at least for when the node will be removed from the pool of iris_xe_max/gpu nodes, so that jobs which rely on those features aren't randomly failing?
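
The manual search can be scripted. This is a minimal sketch, assuming Torque-style pbsnodes output in which each node name starts at column 0 and its attributes (including `properties = ...`) are indented beneath it; the function name `nodes_with_feature` is hypothetical.

```shell
# List the nodes that advertise a feature, optionally skipping one known-bad node.
# Reads `pbsnodes` output on stdin.
# Usage: nodes_with_feature <feature> [node-to-skip]
nodes_with_feature() {
    feature="$1"
    skip="${2:-}"
    awk -v feature="$feature" -v skip="$skip" '
        /^[^ \t]/ { node = $1 }              # unindented line: remember the node name
        /^[ \t]+properties = / {
            n = split($3, tags, ",")         # third field is the comma-separated tag list
            for (i = 1; i <= n; i++)
                if (tags[i] == feature && node != skip) { print node; break }
        }
    '
}
```

Usage: `pbsnodes | nodes_with_feature iris_xe_max s012-n004` prints the candidate node names, and one of them can then be requested by name in the same form that appears in the job banner above, for example `qsub -I -l nodes=s011-n003:iris_xe_max:ppn=2`.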

AlekhyaV_Intel
Moderator

Hi @DanDaMan ,

 

We apologize for the inconvenience caused. We have contacted the admin team regarding this issue and they're trying to fix that particular node. It might take some time for that node to be fixed. You could continue your work on other working nodes for now.

 

Regards,

Alekhya

 

AlekhyaV_Intel
Moderator

Hi @DanDaMan,

 

We ran the command below on the s012-n004 node, and the node now seems to be working fine. Could you please check whether the issue still persists?

sycl-ls --verbose

Please update us.

 

Regards,

Alekhya

 

DanDaMan
New Contributor I

It's different, but still broken. Now it just hangs... 

uXXXXXX@s012-n004:~$ sycl-ls
    [...hang]

vs

uXXXXXXX@s001-n012:~$ sycl-ls --verbose
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz 3.0 [2022.13.3.0.16_160000]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

Platforms: 3
Platform [#1]:
    Version  : OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
    Name     : Intel(R) FPGA Emulation Platform for OpenCL(TM)
    Vendor   : Intel(R) Corporation
   [...etc]
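
The two failure modes seen on s012-n004 (an abort and a hang) can be told apart from a job script by running the listing under a time limit. This is a minimal sketch, assuming GNU coreutils `timeout` is available on the node; the function name `probe_device_listing` and the 30-second limit in the usage line are illustrative only.

```shell
# Run a command under a time limit and classify the outcome.
# Usage: probe_device_listing <seconds> <command> [args...]
# Prints one of: hang, failed, gpu-visible, no-gpu
probe_device_listing() {
    limit="$1"; shift
    status=0
    output=$(timeout "$limit" "$@" 2>&1) || status=$?
    if [ "$status" -eq 124 ]; then
        echo "hang"            # timeout exits with 124 when the limit is reached
    elif [ "$status" -ne 0 ]; then
        echo "failed"          # e.g. the "Abort was called" case
    elif printf '%s\n' "$output" | grep -Eq ':gpu:|\] GPU'; then
        echo "gpu-visible"     # at least one GPU line in either listing format
    else
        echo "no-gpu"
    fi
}
```

Usage: `probe_device_listing 30 sycl-ls`. A job could exit early on anything other than `gpu-visible` instead of failing later inside the workload.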

 

DanDaMan
New Contributor I

s012-n004 seems to be working now, although a bunch of other Xe nodes seem to be in various broken states (see https://community.intel.com/t5/Intel-DevCloud/Video-support-removed-from-iris-xe-max-nodes/m-p/1396511#M5254).

s012-n004:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz 3.0 [2022.13.3.0.16_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Iris(R) Xe MAX Graphics [0x4905] 3.0 [22.10.22597]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe MAX Graphics [0x4905] 1.3 [1.3.22597]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

 

DanDaMan
New Contributor I

Sorry, just noticed I used sycl-ls but forgot --verbose.

Can confirm that s012-n004 now seems to work with --verbose but hangs without it, whereas s001-n012 works fine without --verbose.

 

s001-n012:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz 3.0 [2022.13.3.0.16_160000]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

 

 

AlekhyaV_Intel
Moderator

Hi @DanDaMan ,


The issue with the various broken Iris Xe MAX nodes is being handled in this thread (https://community.intel.com/t5/Intel-DevCloud/Video-support-removed-from-iris-xe-max-nodes/m-p/1396511#M5254).


As you've confirmed that the s012-n004 node is working fine, we are closing this thread now. If you need any further assistance, please post a new question, as this thread will no longer be monitored by Intel.


Regards,

Alekhya

