I have no idea what is going on. Please help. I am not a programmer and have no coding background; I just install whatever is required to run ComfyUI with my B580, using Comfy CLI. I loaded a Wan2.1 14B workflow, and these messages always show up, after which the workflow slows down dramatically (almost to a halt). Please see the log below.
got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: xpu:0, offload device: cpu, dtype: torch.bfloat16
# 😺dzNodes: LayerStyle -> ImageScaleByAspectRatio V2 Processed 1 image(s).
Requested to load CLIPVisionModelProjection
loaded completely 10030.4796875 1208.09814453125 True
Requested to load WanTEModel
loaded completely 9.5367431640625e+25 10835.4765625 True
CLIP/text encoder model load device: cpu, offload device: cpu, current: cpu, dtype: torch.float16
model weight dtype torch.float8_e4m3fn, manual cast: torch.float16
model_type FLOW
Requested to load WanTEModel
loaded completely 0.0 10835.4765625 True
loaded completely 0.0 10835.4765625 True
Requested to load WanVAE
0 models unloaded.
loaded completely 0.0 242.02829551696777 True
onednn_verbose,v1,info,oneDNN v3.8.1 (commit df786faad216a0024da083786a5047af6014fe59)
onednn_verbose,v1,info,cpu,runtime:threadpool,nthr:6
onednn_verbose,v1,info,cpu,isa:Intel AVX2 with Intel DL Boost
onednn_verbose,v1,info,gpu,runtime:DPC++
onednn_verbose,v1,info,gpu,engine,sycl gpu device count:1
onednn_verbose,v1,info,gpu,engine,0,backend:Level Zero,name:Intel(R) Arc(TM) B580 Graphics,driver_version:1.6.33511,binary_kernels:enabled
onednn_verbose,v1,info,graph,backend,0:dnnl_backend
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,implementation,backend,exec_time
onednn_verbose,v1,common,error,ocl,Error during the build of OpenCL program. Build log:
1:4014:1: error: no matching function for call to 'block2d_load'
DECLARE_2D_TILE_BLOCK2D_OPS(a_tile_type_dst, DST_DATA_T, SUBGROUP_SIZE,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3483:3: note: expanded from macro 'DECLARE_2D_TILE_BLOCK2D_OPS'
= block2d_load(ptr, m * e, n, ld * e, offset_r + ii * br, \
^~~~~~~~~~~~
1:3081:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m8k16v1, 16, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3082:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m4k32v1, 32, 4)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3083:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(half, ushort, 16, 16, u16_m8k32v1, 32, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3084:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m8k16v1, 16, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3085:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m4k32v1, 32, 4)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:3086:1: note: candidate disabled: wrong #rows
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 16, 16, u16_m8k32v1, 32, 8)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3050:40: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) type##vl block2d_load(const global type *p, \
^
1:4014:1: error: no matching function for call to 'block2d_store'
DECLARE_2D_TILE_BLOCK2D_OPS(a_tile_type_dst, DST_DATA_T, SUBGROUP_SIZE,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1:3497:52: note: expanded from macro 'DECLARE_2D_TILE_BLOCK2D_OPS'
_Pragma("unroll") for (int ii = 0; ii < nbr; ii++) block2d_store( \
^~~~~~~~~~~~~
1:3084:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private ushort8' (vector of 8 'ushort' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m8k16v1, 16, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3085:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private ushort8' (vector of 8 'ushort' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 8, 16, u16_m4k32v1, 32, 4)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3086:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private ushort16' (vector of 16 'ushort' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(ushort, ushort, 16, 16, u16_m8k32v1, 32, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3081:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private half8' (vector of 8 'half' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m8k16v1, 16, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3082:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private half8' (vector of 8 'half' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(half, ushort, 8, 16, u16_m4k32v1, 32, 4)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
1:3083:1: note: candidate function not viable: no known conversion from '__private _e_a_tile_type_dst' (vector of 32 'ushort' values) to '__private half16' (vector of 16 'half' values) for 1st argument
DEF_BLOCK2D_LOAD_STORE(half, ushort, 16, 16, u16_m8k32v1, 32, 8)
^
1:3065:36: note: expanded from macro 'DEF_BLOCK2D_LOAD_STORE'
__attribute__((overloadable)) void block2d_store(type##vl v, \
^
,src\gpu\intel\ocl\engine.cpp:166
onednn_verbose,v1,primitive,error,ocl,errcode -11,CL_BUILD_PROGRAM_FAILURE,src\gpu\intel\ocl\engine.cpp:270,src\gpu\intel\ocl\engine.cpp:270
onednn_verbose,v1,common,error,ocl,Error during the build of OpenCL program. Build log:
(... the same block2d_load / block2d_store build errors repeat verbatim for a second OpenCL program ...)
onednn_verbose,v1,primitive,error,ocl,errcode -11,CL_BUILD_PROGRAM_FAILURE,src\gpu\intel\ocl\engine.cpp:270,src\gpu\intel\ocl\engine.cpp:270
Hi Jackie999,
Have you tried a smaller model such as Wan2.1 1.3B? Does it work?
I will ask the team to help with this issue. If it is convenient, please also submit the issue to the PyTorch GitHub:
pytorch/pytorch: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Thanks.
P.S. Getting Started on Intel GPU — PyTorch main documentation
Both the Wan2.1 1.3B and 14B models seem to work OK, but slowly. I do not know whether the error is relevant or contributes to the (somewhat) slow inference for my settings.
Other than that, the B580 is a great choice. I love it and will stay with it for a long while. I hope the software side (both the drivers and the environment) matures soon.
Thank you, Intel.
Jack
My startup command for ComfyUI (via oneAPI command line) is:
G:\Comfy_CLI\main.py --disable-xformers --disable-cuda-malloc --preview-method auto --use-quad-cross-attention --normalvram --oneapi-device-selector opencl:gpu;level_zero:gpu
and the screen reads:
[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2025-06-30 10:45:09.853
** Platform: Windows
** Python version: 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)]
** Python executable: H:\pinokio\bin\miniconda\python.exe
** ComfyUI Path: G:\Comfy_CLI
** ComfyUI Base Folder Path: G:\Comfy_CLI
** User directory: G:\Comfy_CLI\user
** ComfyUI-Manager config path: G:\Comfy_CLI\user\default\ComfyUI-Manager\config.ini
** Log path: G:\Comfy_CLI\user\comfyui.log
Prestartup times for custom nodes:
0.0 seconds: G:\Comfy_CLI\custom_nodes\rgthree-comfy
0.0 seconds: G:\Comfy_CLI\custom_nodes\comfyui-easy-use
11.4 seconds: G:\Comfy_CLI\custom_nodes\ComfyUI-Manager
Set oneapi device selector to: opencl:gpu;level_zero:gpu
Checkpoint files will always be loaded safely.
Total VRAM 11874 MB, total RAM 65325 MB
pytorch version: 2.8.0.dev20250619+xpu
Set vram state to: NORMAL_VRAM
Device: xpu
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
Python version: 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)]
ComfyUI version: 0.3.42
ComfyUI frontend version: 1.23.4
...
Hi,
I have sent the required env output via email. Do you still need me to submit a bug report to the PyTorch GitHub?
Kindly advise.
Jack
Hi,
Sorry for the late reply!
No need to create the issue again; let me check this issue first.
Thank you!
Hi,
I tried the Wan example on a B570 (driver 25.22.1502.2) on Windows 11.
It passed.
Note: you do not need to enable the oneAPI runtime in CMD (i.e., launch via the oneAPI command line).
PyTorch for XPU and IPEX install the oneAPI runtime during installation; the user does not need to install or enable it separately.
I suspect this issue is caused by an external oneAPI runtime that does not match the version PyTorch needs.
Here is the installation process:
python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu --trusted-host download.pytorch.org
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt
python main.py --disable-xformers --disable-cuda-malloc --preview-method auto --use-quad-cross-attention --normalvram --oneapi-device-selector opencl:gpu;level_zero:gpu
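Optionally, before starting ComfyUI you can confirm that the XPU wheel was picked up (just a quick sanity check; torch.xpu is part of the PyTorch XPU builds):
python -c "import torch; print(torch.__version__, torch.xpu.is_available(), torch.xpu.get_device_name(0) if torch.xpu.is_available() else 'no XPU device')"
It should print a version string ending in +xpu, True, and the Arc GPU name.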
I followed the Wan example: https://comfyanonymous.github.io/ComfyUI_examples/wan/
Download the model files and save them to the "models" folder.
Restart ComfyUI.
Open the workflow JSON file https://comfyanonymous.github.io/ComfyUI_examples/wan/text_to_video_wan.json in the web GUI at http://127.0.0.1:8188.
After about 300+ seconds, I got the correct result.
Could you check your installation steps?
If you still have the issue, please share what differs on your side: installation steps, model, and hardware/driver info.
Thank you!
Hi,
Even though this is marked as solved, it actually isn't. Please refer to the attached file (wanvideo_T2V_example_02_wanwrapper). When using Kijai's WanVideoWrapper nodes, the error still occurs, as attached. The workflow comes directly from https://github.com/kijai/ComfyUI-WanVideoWrapper, with no modifications (nothing added; I only deleted some unused nodes). When I tested, I found that the "WanVideo Decode" node caused the problem for me. Once I replaced it with the native VAE decoder, it ran fine.
I am posting in case your team or any other team wants to take a look, as these errors should not occur (I believe). However, if I use the native nodes, I have no problem at all. It would be great if you could identify the problem.
Anyway, thanks for your support.
Jack
Have you installed PyTorch?
Could you share the Python packages in your runtime environment? For example:
pip list
From the oneDNN log, it looks like you are using the OpenCL device instead of the SYCL (Level Zero) device.
Could you check with the following method?
In CMD, via the oneAPI command line, run "sycl-ls".
You could also try the following command to use the SYCL GPU (Level Zero) only:
python main.py --disable-xformers --disable-cuda-malloc --preview-method auto --use-quad-cross-attention --normalvram --oneapi-device-selector level_zero:gpu
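Equivalently, you can set the selector in the environment before launching (your startup log line "Set oneapi device selector to:" suggests the flag simply sets the ONEAPI_DEVICE_SELECTOR variable) and check with sycl-ls that only the Level Zero GPU remains visible:
set ONEAPI_DEVICE_SELECTOR=level_zero:gpu
sycl-ls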
Thank you!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for the follow-up.
Here is my sycl-ls (in ComfyUI env):
(k:\comfyui_venv_XPU) C:\Program Files (x86)\Intel\oneAPI>sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) B580 Graphics 20.1.0 [1.6.33890]
[opencl:cpu][opencl:0] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i5-12400F OpenCL 3.0 (Build 0) [2025.19.4.0.18_160000.xmain-hotfix]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) B580 Graphics OpenCL 3.0 NEO [32.0.101.6913]
(k:\comfyui_venv_XPU) C:\Program Files (x86)\Intel\oneAPI>
After I removed the opencl:gpu argument, the error has not appeared again (yet).
If it comes back (which I believe it will not), I will ask for further support. Thanks so much for your help.
Jack.
***** (RESOLVED, thanks to team) *****
Hi,
That's good news!
The opencl:gpu parameter is the root cause.
Intel GPUs support both the OpenCL and Level Zero runtimes.
PyTorch focuses on Level Zero for optimization, while OpenVINO focuses on OpenCL.
In this case, because of the opencl:gpu parameter, PyTorch had to run on the OpenCL code path.
But some kernels are not supported by oneDNN's OpenCL code path, which produced the errors in the log above.
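If you want a quick end-to-end check that the Level Zero path works (just a small sanity test, not an official one), you can run a tiny fp16 matmul on the XPU device:
python -c "import torch; x = torch.randn(256, 256, device='xpu', dtype=torch.float16); print((x @ x).float().abs().mean().item())"
If it prints a number without any oneDNN/OpenCL build errors, the GPU compute path is working.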
Thank you for your cooperation!
Hi,
I understand what you are saying. I do not have this problem anymore, since I do not use IPEX. I'd love to, but it is somehow still not stable.
I actually wanted to help report the issues so that the Intel team can look into them and fix them.
Anyway, thanks for the info, and I hope the whole Intel team will be able to provide better software support for Arc graphics cards in the near future.
If I encounter further error messages, I will post accordingly.
Cheers.
Hi,
That's great!
Thank you for your feedback!
