Re: Segfault when run with multigpu

sjunior · ‎05-16-2022

Hi all,
I receive this error when I try to run my code on two distinct GPUs:

Abort was called at 1632 line in file
/opt/src/vpg-compute-neo/level_zero/core/source/cmdlist/cmdlist_hw.inl
Command terminated by signal 6

I am running this code on DevCloud.
When I run it with a CPU and a GPU selected, it runs successfully.

You can see my code in this GitHub repository:
https://github.com/sncimatec/rtm-domain-division

VarshaS_Intel · ‎05-17-2022

Hi,

Thanks for posting in Intel Communities.

>> I try to run my code on two distinct GPUs

Could you please provide us with the two distinct GPU details you are using?

Could you please let us know the Intel Compiler and its version? Could you please provide us with the steps to reproduce the issue at our end?

Thanks & Regards,

Varsha

sjunior · ‎05-17-2022

Hi @VarshaS_Intel ,

>> Could you please provide us with the two distinct GPU details you are using?
I run this code on the s013-n001 node in DevCloud.
When I run "sycl-ls", I see this information:

u134150@s013-n001:~$ sycl-ls
[opencl:0] ACC : Intel(R) FPGA Emulation Platform for OpenCL(TM) 1.2 [2021.13.11.0.23_160000]
[opencl:0] CPU : Intel(R) OpenCL 3.0 [2021.13.11.0.23_160000]
[level_zero:0] GPU : Intel(R) Level-Zero 1.1 [1.1.20495]
[level_zero:1] GPU : Intel(R) Level-Zero 1.1 [1.1.20495]

And when I run clinfo, the device name is: Intel(R) Graphics [0x020a]

>> Could you please let us know the Intel Compiler and its version? Could you please provide us with the steps to reproduce the issue at our end?

The compiler version is:

u134150@s013-n001:~$ dpcpp --version
Intel(R) oneAPI DPC++/C++ Compiler 2022.0.0 (2022.0.0.20211123)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /nda/development-tools/versions/oneapi/2022.1.0.nda/oneapi/compiler/2022.0.1-prerelease/linux/bin-llvm

And to reproduce this issue, you can follow these steps:

git clone https://github.com/sncimatec/rtm-domain-division

cd rtm-domain-division

git checkout gpu

cd lib/cwp

vim install.sh (comment out line 5, because DevCloud doesn't have the X11 package)

sh install.sh

cd ../../build/3lay_mod

sh run.sh

These are all the steps needed to reproduce the error I reported.

Thanks for the help =D

VarshaS_Intel · ‎05-18-2022

Hi,

Thanks for providing the information.

Could you please provide us with the output when you are able to run without any errors?

Thanks & Regards,

Varsha

sjunior · ‎05-18-2022

Hi @VarshaS_Intel,

Before anything else, I need to correct the instructions for running the code, because I forgot one step. The correct flow is:

git clone https://github.com/sncimatec/rtm-domain-division

cd rtm-domain-division

git checkout gpu

cd lib/cwp

vim install.sh (comment out line 5, because DevCloud doesn't have the X11 package)

sh install.sh

cd ../../src/

make

cd ../build/3lay_mod

../mod_main par=input.dat (this is necessary to create the input data so that the next script can run)

sh run.sh

When the code runs to completion, it generates one file inside the folder:

dir.image

This file is 90 KB. It is an image and needs X11 to be displayed.
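
For anyone retracing the corrected flow above, the steps can be collected into one script. This is only a sketch under stated assumptions: the `run` and `reproduce` helpers and the `DRY_RUN` variable are hypothetical names introduced here, the `sed` call (GNU sed) stands in for the manual vim edit of line 5, and the `gpu` branch is checked out from inside the cloned directory. By default the script only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch: the corrected reproduction steps from this thread as one script.
# DRY_RUN defaults to 1, so commands are only printed, not executed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
# Because run is a shell function, "cd" affects the current shell.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

reproduce() {
    run git clone https://github.com/sncimatec/rtm-domain-division
    run cd rtm-domain-division
    run git checkout gpu              # the gpu branch is checked out inside the clone
    run cd lib/cwp
    run sed -i '5s/^/#/' install.sh   # assumption: replaces the manual vim edit of line 5
    run sh install.sh
    run cd ../../src
    run make
    run cd ../build/3lay_mod
    run ../mod_main par=input.dat     # creates the input data needed by run.sh
    run sh run.sh
}

reproduce
```

Reviewing the printed command list first and only then rerunning with `DRY_RUN=0` keeps the edit to install.sh from being applied by accident.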

VarshaS_Intel · ‎06-01-2022

Hi,

Thanks for providing the information.

Could you please let us know how you are trying to run the code on two different GPUs?

Could you please let us know which step/command produces the error mentioned in the original post? Also, could you please confirm whether you face this issue only when running on this particular node, "s013-n001", in NDA DevCloud?

Thanks & Regards,

Varsha
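
One possible way to narrow down the questions above is to run the same script with only one GPU exposed to the SYCL runtime at a time. The sketch below assumes the DPC++ runtime in use honors the `SYCL_DEVICE_FILTER` environment variable, that device numbers 0 and 1 match the `sycl-ls` output posted earlier, and that it is started from build/3lay_mod; `per_gpu_command` and `DRY_RUN` are hypothetical names introduced here. Code that requires two GPUs may fail for a different reason when only one is visible, so the result is a hint rather than a diagnosis.

```shell
#!/bin/sh
# Sketch: print (or run) run.sh once per Level Zero GPU listed by sycl-ls.
# DRY_RUN defaults to 1, so the commands are only printed.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Build the command line for one device number (0 or 1 in this thread).
per_gpu_command() {
    echo "SYCL_DEVICE_FILTER=level_zero:gpu:$1 sh run.sh"
}

for dev in 0 1; do
    echo "+ $(per_gpu_command "$dev")"
    if [ "$DRY_RUN" = "0" ]; then
        # Report a failure on one device without stopping the comparison.
        SYCL_DEVICE_FILTER="level_zero:gpu:$dev" sh run.sh \
            || echo "device $dev: exit status $?"
    fi
done
```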

VarshaS_Intel · ‎06-08-2022

Hi,

We have not heard back from you. Could you please provide us with the details mentioned in the previous reply?

Thanks & Regards,

Varsha

VarshaS_Intel · ‎06-16-2022

Hi,

We have not heard back from you. This thread will no longer be monitored by Intel. If you need additional information, please post a new question.

Thanks & Regards,

Varsha