I have no idea what is going on. Please help. I am not a programmer and have no coding background; I just install whatever is required to run ComfyUI with my B580, using Comfy CLI. I loaded a Wan2.1 14B workflow, and these messages always show up, after which the workflow slows down dramatically (almost to a halt). Please see the log below.
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: xpu:0, offload device: cpu, dtype: torch.bfloat16
# 😺dzNodes: LayerStyle -> ImageScaleByAspectRatio V2 Processed 1 image(s).
Requested to load CLIPVisionModelProjection
loaded completely 10030.4796875 1208.09814453125 True
Requested to load WanTEModel
loaded completely 9.5367431640625e+25 10835.4765625 True
CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.float16
model_type FLOW
Requested to load WanTEModel
loaded completely 0.0 10835.4765625 True
loaded completely 0.0 10835.4765625 True
Requested to load WanVAE
0 models unloaded.
loaded completely 0.0 242.02829551696777 True
onednn_verbose,v1,info,oneDNN v3.8.1 (commit df786faad216a0024da083786a5047af6014fe59)
onednn_verbose,v1,info,cpu,runtime:threadpool,nthr:6
onednn_verbose,v1,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,v1,info,gpu,runtime:DPC++
onednn_verbose,v1,info,gpu,engine,sycl gpu device count:1
onednn_verbose,v1,info,gpu,engine,0,backend:Level Zero,name:Intel(R) Arc(TM) B580 Graphics,driver_version:1.6.33511,binary_kernels:enabled
onednn_verbose,v1,info,graph,backend,0:dnnl_backend
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,implementation,backend,exec_time
onednn_verbose,v1,common,error,ocl,Error during the build of OpenCL program. Build log:
1:4014:1: error: no matching function for call to 'block2d_load'
DECLARE_2D_TILE_BLOCK2D_OPS(a_tile_type_dst, DST_DATA_T, SUBGROUP_SIZE,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3483:3: note: expanded from macro 'DECLARE_2D_TILE_BLOCK2D_OPS'
= block2d_load(ptr, m * e, n, ld * e, offset_r + ii * br, \
^~~~~~~~~~~~
1:3081:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m8k16v1, 16, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3082:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m4k32v1, 32, 4)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3083:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(half, ushort, 16, 16, u16_m8k32v1, 32, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3084:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m8k16v1, 16, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3085:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m4k32v1, 32, 4)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3086:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 16, 16, u16_m8k32v1, 32, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:4014:1: error: no matching function for call to 'block2d_store'
DECLARE_2D_TILE_BLOCK2D_OPS(a_tile_type_dst, DST_DATA_T, SUBGROUP_SIZE,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3497:52: note: expanded from macro 'DECLARE_2D_TILE_BLOCK2D_OPS'
_Pragma("unroll") for (int ii = 0; ii < nbr; ii++) block2d_store( \
^~~~~~~~~~~~~
1:3084:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private ushort8' (vector of 8 'ushort' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m8k16v1, 16, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3085:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private ushort8' (vector of 8 'ushort' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m4k32v1, 32, 4)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3086:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private ushort16' (vector of 16 'ushort' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 16, 16, u16_m8k32v1, 32, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3081:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private half8' (vector of 8 'half' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m8k16v1, 16, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3082:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private half8' (vector of 8 'half' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m4k32v1, 32, 4)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3083:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private half16' (vector of 16 'half' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(half, ushort, 16, 16, u16_m8k32v1, 32, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
,src\gpu\intel\ocl\engine.cpp:166
onednn_verbose,v1,primitive,error,ocl,errcode -11,CL_BUILD_PROGRAM_FAILURE,src\gpu\intel\ocl\engine.cpp:270,src\gpu\intel\ocl\engine.cpp:270
onednn_verbose,v1,common,error,ocl,Error during the build of OpenCL program. Build log:
(... the same block2d_load / block2d_store build errors repeat verbatim for a second OpenCL program ...)
onednn_verbose,v1,primitive,error,ocl,errcode -11,CL_BUILD_PROGRAM_FAILURE,src\gpu\intel\ocl\engine.cpp:270,src\gpu\intel\ocl\engine.cpp:270
Hi Jackie999,
Have you tried a smaller model such as Wan2.1 1.3B? Does it work?
I will ask the team to help with this issue. If it is convenient, please also submit the issue to the PyTorch GitHub:
pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Thanks.
P.S. Getting Started on Intel GPU — PyTorch main documentation
Both the Wan2.1 1.3B and 14B models seem to work OK, but slowly. I do not know whether the error is relevant or contributes to the (somewhat) slow inference for my settings.
Other than that, the B580 is a great choice. I love it and will stay with it for a long while. I hope the software side (both the drivers and the environment) matures soon.
Thank you, Intel.
Jack
My startup command for ComfyUI (via oneAPI command line) is:
G:\Comfy_CLI\main.py --disable-xformers --disable-cuda-malloc --preview-method auto --use-quad-cross-attention --normalvram --oneapi-device-selector opencl:gpu;level_zero:gpu
and the screen reads:
[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2025-06-30 10:45:09.853
** Platform: Windows
** Python version: 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)]
** Python executable: H:\pinokio\bin\miniconda\python.exe
** ComfyUI Path: G:\Comfy_CLI
** ComfyUI Base Folder Path: G:\Comfy_CLI
** User directory: G:\Comfy_CLI\user
** ComfyUI-Manager config path: G:\Comfy_CLI\user\default\ComfyUI-Manager\config.ini
** Log path: G:\Comfy_CLI\user\comfyui.log
Prestartup times for custom nodes:
0.0 seconds: G:\Comfy_CLI\custom_nodes\rgthree-comfy
0.0 seconds: G:\Comfy_CLI\custom_nodes\comfyui-easy-use
11.4 seconds: G:\Comfy_CLI\custom_nodes\ComfyUI-Manager
Set oneapi device selector to: opencl:gpu;level_zero:gpu
Checkpoint files will always be loaded safely.
Total VRAM 11874 MB, total RAM 65325 MB
pytorch version: 2.8.0.dev20250619+xpu
Set vram state to: NORMAL_VRAM
Device: xpu
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)]
ComfyUI version: 0.3.42
ComfyUI frontend version: 1.23.4
...
Hi,
I have sent the required env output via email. Do you still need me to submit a bug report to the PyTorch GitHub?
Kindly advise.
Jack
Hi,
Sorry for the late reply!
No need to create the issue again; let me check this issue first.
Thank you!
Hi,
I tried the Wan example on a B570 (driver 25.22.1502.2) on Windows 11.
It passed.
Note: you do not need to enable the oneAPI runtime in CMD (i.e., launch via the oneAPI command line).
PyTorch for XPU and IPEX install the oneAPI runtime during installation; the user does not need to install or enable it separately.
I suspect this issue is caused by an external oneAPI runtime that does not match the version PyTorch needs.
Here is the installation process:
python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu --trusted-host download.pytorch.org
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
python main.py --disable-xformers --disable-cuda-malloc --preview-method auto --use-quad-cross-attention --normalvram --oneapi-device-selector opencl:gpu;level_zero:gpu
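Optionally, before starting ComfyUI you can confirm that the XPU wheel was picked up (just a quick sanity check; torch.xpu is part of the PyTorch XPU builds):
python -c "import torch; print(torch.__version__, torch.xpu.is_available(), torch.xpu.get_device_name(0) if torch.xpu.is_available() else 'no XPU device')"
It should print a version string ending in +xpu, True, and the Arc GPU name.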
I followed the Wan example: https://comfyanonymous.github.io/ComfyUI_examples/wan/
Download the model files and save them to the "models" folder.
Restart ComfyUI.
Open the workflow JSON file https://comfyanonymous.github.io/ComfyUI_examples/wan/text_to_video_wan.json in the web GUI at http://127.0.0.1:8188.
After about 300+ seconds, I got the correct result.
Could you check your installation steps?
If you still have the issue, please share what differs on your side: installation steps, model, and hardware/driver info.
Thank you!
Hi,
Even though this is marked as solved, it actually isn't. Please refer to the attached file (wanvideo_T2V_example_02_wanwrapper). When using Kijai's WanVideoWrapper nodes, the error still occurs, as attached. The workflow comes directly from https://github.com/kijai/ComfyUI-WanVideoWrapper, with no modifications (nothing added; I only deleted some unused nodes). When I tested, I found that the "WanVideo Decode" node caused the problem for me. Once I replaced it with the native VAE decoder, it ran fine.
I am posting in case your team or any other team wants to take a look, as these errors should not occur (I believe). However, if I use the native nodes, I have no problem at all. It would be great if you could identify the problem.
Anyway, thanks for your support.
Jack
Have you installed PyTorch?
Could you share the Python packages in your runtime environment? For example:
pip list
From the oneDNN log, it looks like you are using the OpenCL device instead of the SYCL (Level Zero) device.
Could you check with the following method?
In CMD, via the oneAPI command line, run "sycl-ls".
You could also try the following command to use the SYCL GPU (Level Zero) only:
python main.py --disable-xformers --disable-cuda-malloc --preview-method auto --use-quad-cross-attention --normalvram --oneapi-device-selector level_zero:gpu
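Equivalently, you can set the selector in the environment before launching (your startup log line "Set oneapi device selector to:" suggests the flag simply sets the ONEAPI_DEVICE_SELECTOR variable) and check with sycl-ls that only the Level Zero GPU remains visible:
set ONEAPI_DEVICE_SELECTOR=level_zero:gpu
sycl-ls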
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for the follow-up.
Here is my sycl-ls (in ComfyUI env):
(k:\comfyui_venv_XPU) C:\Program Files (x86)\Intel\oneAPI>sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.33890]
[opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i5-12400F OpenCL 3.0 (Build 0) [2025.19.4.0.18_160000.xmain-hotfix]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) B580 Graphics OpenCL 3.0 NEO [32.0.101.6913]
(k:\comfyui_venv_XPU) C:\Program Files (x86)\Intel\oneAPI>
After I removed the opencl:gpu argument, the error has not appeared again (yet).
If it comes back (which I believe it will not), I will ask for further support. Thanks so much for your help.
Jack.
***** (RESOLVED, thanks to team) *****
Hi,
That's good news!
The opencl:gpu parameter is the root cause.
Intel GPUs support both the OpenCL and Level Zero runtimes.
PyTorch focuses on Level Zero for optimization, while OpenVINO focuses on OpenCL.
In this case, because of the opencl:gpu parameter, PyTorch had to run on the OpenCL code path.
But some kernels are not supported by oneDNN's OpenCL code path, which produced the errors in the log above.
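If you want a quick end-to-end check that the Level Zero path works (just a small sanity test, not an official one), you can run a tiny fp16 matmul on the XPU device:
python -c "import torch; x = torch.randn(256, 256, device='xpu', dtype=torch.float16); print((x @ x).float().abs().mean().item())"
If it prints a number without any oneDNN/OpenCL build errors, the GPU compute path is working.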
Thank you for your cooperation!
Hi,
I understand what you are saying. I do not have this problem anymore, since I do not use IPEX. I'd love to, but it is somehow still not stable.
I actually wanted to help report the issues so that the Intel team can look into them and fix them.
Anyway, thanks for the info, and I hope the whole Intel team will be able to provide better software support for Arc graphics cards in the near future.
If I encounter further error messages, I will post accordingly.
Cheers.
Hi,
That's great!
Thank you for your feedback!
