Multi GPU clCreateBuffer failure on single context

shekar
Beginner

I have two Intel Arc A770 GPUs and I am seeing the following behavior (test code attached).

1. I am trying to allocate memory using clCreateBuffer (1 MB per allocation).

2. I have two GPU devices.

3. The test code creates a context, loads a kernel program, and builds it.

4. The test program creates only one context (for either 1 device or 2 devices).

5. When I run the code in 1-GPU mode, I am able to allocate memory (over 10000 allocations). When I run the code in 2-GPU mode, allocation fails after about 1000 allocations (around 1 GB of memory) with OpenCL error code -6 (out of host memory).
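For reference, the error code -6 reported above is the value that the Khronos CL/cl.h header assigns to CL_OUT_OF_HOST_MEMORY. The following is a minimal C++ sketch of decoding such codes and of the allocation arithmetic, not code from the attached test program. The helper names clErrorName and allocatedGiB are hypothetical (neither is part of OpenCL), only a handful of common codes are listed, and the numeric values are hard-coded so that the snippet builds without the OpenCL headers.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the OpenCL API): maps a few common
// OpenCL status codes to their symbolic names from CL/cl.h. The numeric
// values are hard-coded so this builds without the OpenCL headers.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -30: return "CL_INVALID_VALUE";
        case -34: return "CL_INVALID_CONTEXT";
        case -61: return "CL_INVALID_BUFFER_SIZE";
        default:  return "unknown OpenCL error (" + std::to_string(code) + ")";
    }
}

// Hypothetical helper: total size in GiB of `count` buffers of
// `bytesEach` bytes, e.g. to see how much had been requested
// when the failure occurred.
double allocatedGiB(std::size_t count, std::size_t bytesEach) {
    const double bytesPerGiB = 1024.0 * 1024.0 * 1024.0;
    return static_cast<double>(count) * static_cast<double>(bytesEach) / bytesPerGiB;
}
```

In a test program, the status written through the last argument of clCreateBuffer could be passed to a helper like clErrorName to make the log self-explanatory. With 1000 buffers of 1 MiB each, allocatedGiB returns just under 1 GiB, which matches the "around 1 GB" figure in the report.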

The GPUs have 16 GB of RAM and my host has 192 GB of RAM.

Any idea what I am doing wrong?

The command to build the code is:

gcc -D CL_TARGET_OPENCL_VERSION=220 -g -Wall -o OpenCLMulti OpenCLMulti.cpp -lOpenCL -lm

KonstantyMisiak
Employee

Hi, I'm looking into your issue.
Could you provide more details about your setup, such as the OS distribution, kernel version, and oneAPI/OpenCL driver version?

shekar
Beginner

Hi @KonstantyMisiak

Here is the output of "uname -a":

Linux gordian-2 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

I am attaching the output of clinfo.

Another interesting thing is that once in a while, Xorg crashes and puts me back at the login screen. This started happening after I installed the two A770 GPUs (I used to have a single A750 and everything was working okay, though frankly it was an ordeal to get that up and running).

Anyway, thanks for looking into this issue. I appreciate your help.

shekar
Beginner

@KonstantyMisiak Do you have everything you need from me to look into the issue? Thanks.

KonstantyMisiak
Employee

@shekar Yes, this issue is tracked internally, and my team and I are looking into it.

shekar
Beginner

@KonstantyMisiak Any update on this issue? Thanks.

KonstantyMisiak
Employee

@shekar We couldn't reproduce your issue on our standard setup. We are working on recreating a setup as close to yours as possible so that we can reproduce it.

shekar
Beginner

@KonstantyMisiak What is the standard setup you have? Thanks.

shekar
Beginner

@KonstantyMisiak Can you tell me the standard setup? This issue has become a huge blocker for me and I really need to get this done. Thanks.

KonstantyMisiak
Employee

The biggest difference seems to be that our standard setup uses a Linux kernel with internal patches, which may affect the issue.

shekar
Beginner

@KonstantyMisiak Do you mean kernel patches that only you have access to, or something that I can download too? Please provide more clarity on this. If you think this is a bug in the driver that will require an upgrade, can you provide some information on when such an upgrade will be available? Thanks.

KonstantyMisiak
Employee

Yes, these patches are internal.
I have to kindly ask for your patience while we try to reproduce your issue. If we establish that this is a bug in the driver, we will need to pinpoint which component is at fault so we can direct it to the proper subteam.

shekar
Beginner

@KonstantyMisiak Can you give me some idea as to when you are going to look at my issue? Just to be clear, I installed the exact driver and Linux kernel recommended in your documentation, so I am surprised to hear that you don't have a machine lying around with that same configuration. If you don't think this issue can be solved, please say so and I will make contingency plans (like upgrading to NVIDIA GPUs), though I would prefer not to change my hardware.

MaciejPlewka
Employee

Hi @shekar

I was able to reproduce your issue with OpenCL driver version 23.05.25593.18.
After I updated the driver to version 23.13.26032.26, the issue appears to be gone.

Can you please test with the latest driver?
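One way to check whether an installed driver is at least as new as the version reported to work is to compare the dotted version strings field by field. The following is a minimal C++ sketch of that comparison, not an Intel tool. The helper names parseVersion and isAtLeast are hypothetical, and the code assumes that a version string consists only of dot-separated non-negative integers, as in the two versions quoted above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "23.13.26032.26" into {23, 13, 26032, 26}.
// Assumes every dot-separated field is a non-negative decimal integer;
// std::stol throws if a field is empty or not numeric.
std::vector<long> parseVersion(const std::string& text) {
    std::vector<long> parts;
    std::stringstream stream(text);
    std::string field;
    while (std::getline(stream, field, '.')) {
        parts.push_back(std::stol(field));
    }
    return parts;
}

// Returns true when `installed` is the same as or newer than `required`.
// std::vector's operator>= compares lexicographically, field by field,
// so the fields are compared as numbers rather than as text.
bool isAtLeast(const std::string& installed, const std::string& required) {
    return parseVersion(installed) >= parseVersion(required);
}
```

For example, isAtLeast("23.05.25593.18", "23.13.26032.26") is false, because the second field (5) is smaller than 13. The installed version string would have to come from elsewhere, for instance the "Driver Version" line printed by clinfo or a clGetDeviceInfo query with CL_DRIVER_VERSION. Whether that string has exactly this four-field form on a given driver is an assumption to verify.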

shekar
Beginner

@MaciejPlewka I upgraded the driver and now clinfo doesn't recognize the platform. I reinstalled Ubuntu 22.04, and while following the instructions given here: https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-jammy-arc.html, I am getting this error:

sudo apt-get update && sudo apt-get install -y --install-suggests linux-image-5.19.0-35-generic
Hit:1 http://us.archive.ubuntu.com/ubuntu jammy InRelease
Hit:2 http://us.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:3 http://us.archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:4 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:5 https://repositories.intel.com/graphics/ubuntu jammy InRelease
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package linux-image-5.19.0-35-generic
E: Couldn't find any package by glob 'linux-image-5.19.0-35-generic'
E: Couldn't find any package by regex 'linux-image-5.19.0-35-generic'

What do I do now?

MaciejPlewka
Employee

I was also following these instructions and ran into the same problem. There is no linux-image-5.19.0-35-generic in the added repositories, but I found that linux-image-5.19.0-37 should be there.
