How can I allocate a dual or a quad Intel® Iris® Xe MAX Graphics compute node?
The command in the guide (https://devcloud.intel.com/oneapi/documentation/job-submission/) suggests something like:
qsub -l nodes=1:iris_xe_max:dual_gpu:ppn=2 job_script.sh
However, this seems no longer supported:
$ qsub -I -l nodes=1:iris_xe_max:dual_gpu:ppn=2
qsub: submit error (Job exceeds queue resource limits MSG=cannot locate feasible nodes (nodes file is empty, all systems are busy, or no nodes have the requested feature))
And I can't find that kind of property anymore:
$ pbsnodes | sort | grep properties | grep iris_xe_max | uniq
properties = core,cfl,i9-10920x,ram32gb,net1gbe,gpu,iris_xe_max,gpu
Hi,
Thank you for posting in Intel Communities. Due to some internal issues, the quad_gpu and dual_gpu nodes are currently not available to the public. This is a known issue and the admin team is addressing it. It might take some time for those nodes to become publicly available. For now, you can connect to the compute nodes that host a single GPU.
Regards,
Alekhya
Thanks for your response.
Another question regarding multiple GPUs: is there a tool, like nvidia-smi for NVIDIA GPUs, that shows how many GPUs a node has? How can I check GPU usage and make sure my kernels are running on the GPUs?
Hi,
You can view the details of the GPU, CPU, etc. on a specific node with the command below:
sycl-ls
Regards,
Alekhya
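As a rough way to turn the sycl-ls listing into a GPU count, the sketch below parses its non-verbose output. The function name count_gpus is hypothetical, and the parsing assumes the two line formats that appear in this thread ("[level_zero:0] GPU : ..." and "[ext_oneapi_level_zero:gpu:0] ..."); other releases may print something different. It counts one backend at a time, because the same physical GPU is typically listed once under OpenCL and once under Level Zero.

```shell
# count_gpus [BACKEND]
# Reads non-verbose `sycl-ls` output on stdin and prints how many GPU entries
# the given backend reports (default: level_zero). Counting a single backend
# avoids counting a GPU twice when it shows up under both OpenCL and Level Zero.
count_gpus() {
    awk -v backend="${1:-level_zero}" '
        /^\[/ {
            close_pos = index($0, "]")
            tag  = tolower(substr($0, 1, close_pos))   # e.g. "[opencl:gpu:2]"
            rest = tolower(substr($0, close_pos + 1))  # text after the tag
            if (index(tag, backend) == 0) next         # different backend
            # Newer format: device type inside the tag; older: "GPU :" after it.
            if (index(tag, ":gpu:") > 0 || rest ~ /^ *gpu/) n++
        }
        END { print n + 0 }
    '
}
```

On a node, this could be used as: sycl-ls | count_gpus (or sycl-ls | count_gpus opencl). It only reports how many GPU devices the runtime sees, not how busy they are.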
s012-n004 seems broken
s012-n004:~$ sycl-ls --verbose
Abort was called at 400 line in file:
/opt/src/opencl/shared/source/os_interface/linux/drm_neo.cpp
Aborted
vs
s011-n003:~$ sycl-ls
[opencl:0] ACC : Intel(R) FPGA Emulation Platform for OpenCL(TM) 1.2 [2021.13.11.0.23_160000]
[opencl:0] CPU : Intel(R) OpenCL 3.0 [2021.13.11.0.23_160000]
[opencl:0] GPU : Intel(R) OpenCL HD Graphics 3.0 [21.49.21786]
[opencl:1] GPU : Intel(R) OpenCL HD Graphics 3.0 [21.49.21786]
[level_zero:0] GPU : Intel(R) Level-Zero 1.2 [1.2.21786]
[level_zero:1] GPU : Intel(R) Level-Zero 1.2 [1.2.21786]
[host:0] HOST: SYCL host platform 1.2 [1.2]
I tried posting a message about this, but it got marked as spam (Update: now fixed)
Hi @DanDaMan ,
Thank you for sharing your observations. We could reproduce your issue. There might be some issue with that node. We will connect with the dev team & get it fixed.
Regards,
Alekhya
Hi @DanDaMan ,
We noticed that you opened a new thread in the community regarding the issue with the s012-n004 node in DevCloud, so we will continue tracking your issue in the other thread (https://community.intel.com/t5/Intel-DevCloud/s012-n004-Broken-OneAPI-VPL-GPU-iris-xe-max-not-working-but/m-p/1384336#M4747).
Regards,
Alekhya
Hi @brucechen057 ,
Can we discontinue monitoring this thread? Please give us an update.
Regards,
Alekhya
Unfortunately, the other thread has been closed, and s012-n004 is still broken but still offered as if it were working:
########################################################################
# Date: Tue 17 May 2022 08:02:39 AM PDT
# Job ID: 1909065.v-qsvr-1.aidevcloud
# User: uXXXXXX
# Resources: neednodes=s012-n004:iris_xe_max:ppn=2,nodes=s012-n004:iris_xe_max:ppn=2,walltime=24:00:00
########################################################################
uXXXXXX@s012-n004:~$ sycl-ls --verbose
Abort was called at 456 line in file:
/opt/src/opencl/shared/source/os_interface/linux/drm_neo.cpp
Aborted
While there is a workaround by manually searching for and allocating nodes, is there a timeline for when this might be fixed, or at least removed from the pool of iris_xe_max/gpu nodes so that jobs which rely on those features aren't randomly failing?
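The manual search mentioned above could be sketched roughly as follows. The helper name nodes_with_feature is hypothetical, and the parsing assumes the usual pbsnodes layout: an unindented node name followed by indented "key = value" lines, including a "properties = a,b,c" line like the one shown earlier in this thread. It does not look at node state, so a listed node may still be busy or offline.

```shell
# nodes_with_feature FEATURE [EXCLUDED_NODE...]
# Reads `pbsnodes` output on stdin and prints the name of every node whose
# "properties" line lists FEATURE, skipping any node named in the exclusions.
nodes_with_feature() {
    feature="$1"; shift
    awk -v feature="$feature" -v excluded=" $* " '
        /^[^[:space:]]/ { node = $1 }       # unindented line: start of a node
        $1 == "properties" {
            n = split($3, props, ",")       # $3 is the comma-separated list
            for (i = 1; i <= n; i++)
                if (props[i] == feature && index(excluded, " " node " ") == 0) {
                    print node
                    break                   # print each node once
                }
        }
    '
}
```

For example, pbsnodes | nodes_with_feature iris_xe_max s012-n004 would list candidate nodes other than the broken one. A specific node can then be requested by name in the same form that appears in the job header above, i.e. nodes=<node name>:iris_xe_max:ppn=2.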
Hi @DanDaMan ,
We apologize for the inconvenience caused. We have contacted the admin team regarding this issue and they're trying to fix that particular node. It might take some time for that node to be fixed. You could continue your work on other working nodes for now.
Regards,
Alekhya
Hi @DanDaMan,
We tried the command below on the s012-n004 node, and the node seems to be working fine now. Could you please check whether the issue still persists?
sycl-ls --verbose
Please update us.
Regards,
Alekhya
It's different, but still broken. Now it just hangs...
uXXXXXX@s012-n004:~$ sycl-ls
[...hang]
vs
uXXXXXXX@s001-n012:~$ sycl-ls --verbose
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz 3.0 [2022.13.3.0.16_160000]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
Platforms: 3
Platform [#1]:
Version : OpenCL 1.2 Intel(R) FPGA SDK for OpenCL(TM), Version 20.3
Name : Intel(R) FPGA Emulation Platform for OpenCL(TM)
Vendor : Intel(R) Corporation
[...etc]
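Since the failure on this node has shown up both as an abort and as a hang, a job script could probe the node before doing real work. The following is only a sketch: the function name probe is hypothetical, it relies on GNU coreutils timeout (which exits with status 124 when the time limit is reached), and it assumes the hung process actually terminates when timeout signals it.

```shell
# probe SECONDS COMMAND [ARGS...]
# Runs COMMAND with a time limit, discarding its output, and prints one of:
#   ok      - COMMAND exited with status 0
#   hang    - COMMAND was still running after SECONDS (timeout exits with 124)
#   failed  - any other non-zero status, such as an abort
# The function's return status is the status reported by timeout.
probe() {
    limit="$1"; shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    case "$status" in
        0)   echo ok ;;
        124) echo hang ;;
        *)   echo failed ;;
    esac
    return "$status"
}
```

At the top of a job script, something like probe 30 sycl-ls || exit 1 would let the job fail fast on a node in this state instead of stalling until the walltime runs out. The 30-second limit is an arbitrary choice.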
s012-n004 seems to be working now, although a number of other Xe nodes seem to be in various broken states (see https://community.intel.com/t5/Intel-DevCloud/Video-support-removed-from-iris-xe-max-nodes/m-p/1396511#M5254):
s012-n004:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i9-10920X CPU @ 3.50GHz 3.0 [2022.13.3.0.16_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Iris(R) Xe MAX Graphics [0x4905] 3.0 [22.10.22597]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe MAX Graphics [0x4905] 1.3 [1.3.22597]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
Sorry, I just noticed I used sycl-ls but forgot --verbose.
I can confirm that s012-n004 now seems to work with --verbose but hangs without it, whereas s001-n012 works fine without --verbose:
s001-n012:~$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.13.3.0.16_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz 3.0 [2022.13.3.0.16_160000]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
Hi @DanDaMan ,
The issue regarding the various broken Iris Xe MAX nodes is being handled in this thread (https://community.intel.com/t5/Intel-DevCloud/Video-support-removed-from-iris-xe-max-nodes/m-p/1396511#M5254).
As you've confirmed that s012-n004 node is working fine, we are closing this thread now. If you need any further assistance, please post a new question as this thread will no longer be monitored by Intel.
Regards,
Alekhya