GPU Compute Software
Ask questions about Intel® Graphics Compute software technologies, such as OpenCL* GPU driver and oneAPI Level Zero

Multi GPU clCreateBuffer failure on single context

shekar
Beginner
6,381 Views

I have two Intel Arc A770 GPUs and I am seeing the following behavior (test code attached).

1. I am trying to allocate memory using clCreateBuffer (1 MB per buffer).

2. I have two GPU devices.

3. The test code creates a context, loads a kernel program, and builds it.

4. The test program creates only one context (it can do that for either 1 device or 2 devices).

5. When I run the code in 1-GPU mode, I can allocate memory without problems (over 10,000 allocations). But when I run the code in 2-GPU mode, clCreateBuffer fails after about 1,000 allocations (around 1 GB of memory) with OpenCL error code -6, which is CL_OUT_OF_HOST_MEMORY.
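
For reference, -6 is the value of CL_OUT_OF_HOST_MEMORY in the OpenCL headers. Below is a minimal sketch of a decoder for the codes that clCreateBuffer can return, so that a failing run logs a name rather than a bare number. The function names cl_error_name and report_alloc_failure are hypothetical and are not part of the attached test code; the numeric values are written as literals so the sketch compiles without the OpenCL headers.

```c
#include <stdio.h>

/* Map the error codes that clCreateBuffer can return to their names.
 * The numeric values mirror the definitions in CL/cl.h; they are
 * written as literals so this sketch builds without OpenCL installed. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -37: return "CL_INVALID_HOST_PTR";
    case -61: return "CL_INVALID_BUFFER_SIZE";
    default:  return "UNKNOWN_CL_ERROR";
    }
}

/* Hypothetical helper: report which allocation failed and why. */
void report_alloc_failure(int index, int code)
{
    fprintf(stderr, "clCreateBuffer #%d failed: %d (%s)\n",
            index, code, cl_error_name(code));
}
```

In an allocation loop, one would call report_alloc_failure(i, err) whenever the errcode_ret value written by clCreateBuffer is non-zero.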

 

The GPUs have 16 GB of memory each and my host has 192 GB of RAM.

Any idea what I am doing wrong?

The command to build the code is:

gcc -D CL_TARGET_OPENCL_VERSION=220 -g -Wall -o OpenCLMulti OpenCLMulti.cpp -lOpenCL -lm

15 Replies
KonstantyMisiak
Employee
6,300 Views

Hi, I'm looking into your issue.
Could you provide more details about your setup, such as the OS distribution, kernel version, and oneAPI/OpenCL driver version?

shekar
Beginner
6,296 Views

Hi @KonstantyMisiak

Here is the output of "uname -a":

Linux gordian-2 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

I am attaching the output of clinfo.

Another interesting thing is that, once in a while, Xorg just crashes and puts me back at the login screen. This started happening after I installed the two A770 GPUs (I used to have a single A750 and everything was working okay, though frankly it was an ordeal to get that up and running).

Anyway, thanks for looking into this issue. I appreciate your help.

shekar
Beginner
6,228 Views

@KonstantyMisiak Do you have everything you need from me to look into the issue? Thanks.

KonstantyMisiak
Employee
6,223 Views

@shekar Yes, this issue is tracked internally, and my team and I are looking into it.

shekar
Beginner
6,090 Views

@KonstantyMisiak Any update on this issue? Thanks.

KonstantyMisiak
Employee
5,978 Views

@shekar We couldn't reproduce your issue on our standard setup. We are working on recreating a setup as close to yours as possible so that we can reproduce it.

shekar
Beginner
5,970 Views

@KonstantyMisiak What is the standard setup you have? Thanks.

shekar
Beginner
5,899 Views

@KonstantyMisiak Can you tell me the standard setup? This issue has become a huge blocker for me and I really need to get this done. Thanks.

KonstantyMisiak
Employee
5,890 Views

The biggest difference seems to be that our standard setup uses a Linux kernel with internal patches, which may affect the issue.

shekar
Beginner
5,888 Views

@KonstantyMisiak You mean kernel patches that only you have access to or something that I can download too? Please provide more clarity on this. If you think this is a "bug" in the driver, and will require an upgrade, can you provide some information on when such an upgrade will be available? Thanks.

KonstantyMisiak
Employee
5,887 Views

Yes, these patches are internal.
I have to kindly ask for your patience while we try to reproduce your issue. If we establish that this is a bug in the driver, we will need to pinpoint which component is at fault so we can direct it to the proper subteam.

shekar
Beginner
5,703 Views

@KonstantyMisiak Can you give me some idea as to when you are going to look at my issue? Just to be clear, I installed the exact driver and Linux kernel recommended in your documentation. So, I am surprised to hear that you don't have a machine lying around with that same configuration. If you don't think this issue can be solved, please say so and I will make contingency plans (like upgrading to NVIDIA GPUs), though I would prefer not to change my hardware.

MaciejPlewka
Employee
5,605 Views

Hi @shekar

I was able to reproduce your issue with OpenCL driver version 23.05.25593.18.
After I updated the driver to version 23.13.26032.26, the issue appears to be gone.

Can you please test with the latest driver?
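
To check which side of the fix an installed driver falls on, a small comparison of the two version strings quoted above may help. This is only a sketch: it assumes the driver reports its version in the same four-field dotted form (for example, the string returned by clGetDeviceInfo with CL_DRIVER_VERSION), and the function names parse_driver_version and driver_at_least are hypothetical.

```c
#include <stdio.h>

/* Parse "A.B.C.D" into four integers.
 * Returns 1 on success, 0 if fewer than four fields were read. */
int parse_driver_version(const char *s, int out[4])
{
    return sscanf(s, "%d.%d.%d.%d",
                  &out[0], &out[1], &out[2], &out[3]) == 4;
}

/* Returns 1 if installed >= required, 0 if installed is older,
 * and -1 if either string does not parse. */
int driver_at_least(const char *installed, const char *required)
{
    int a[4], b[4];
    if (!parse_driver_version(installed, a) ||
        !parse_driver_version(required, b))
        return -1;
    /* Compare field by field, most significant first. */
    for (int i = 0; i < 4; i++) {
        if (a[i] != b[i])
            return a[i] > b[i];
    }
    return 1; /* all four fields are equal */
}
```

With the versions from this thread, driver_at_least("23.05.25593.18", "23.13.26032.26") returns 0, meaning the older driver predates the one where the issue was no longer seen.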

shekar
Beginner
5,589 Views

@MaciejPlewka I upgraded the driver and now clinfo doesn't recognize the platform. I reinstalled Ubuntu 22.04 and, while following the instructions given here: https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-jammy-arc.html, I am getting this error:

sudo apt-get update && sudo apt-get install -y --install-suggests linux-image-5.19.0-35-generic
Hit:1 http://us.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://us.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 https://repositories.intel.com/graphics/ubuntu jammy InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linux-image-5.19.0-35-generic
E: Couldn't find any package by glob 'linux-image-5.19.0-35-generic'
E: Couldn't find any package by regex 'linux-image-5.19.0-35-generic'

What do I do now?

MaciejPlewka
Employee
5,572 Views

I was also following these instructions and ran into the same problem. There is no linux-image-5.19.0-35-generic in the added repositories, but I found that linux-image-5.19.0-37 should be there.
