Solved: Platforms listed twice, different results on different platforms and SIGSEGV.

Dmitry_Savin · ‎12-16-2019

I'm trying to verify that the different available devices work consistently. So instead of using the default selector in the vector addition example, I execute the same function on each device in turn. It first appears to work correctly for FPGA emulation, the Intel CPU and the Intel GPU. Then the Intel GPU is listed a second time and gives an incorrect result (zero), the Nvidia GPU also gives zero, and on the Intel CPU the program crashes. Attached are the source, the program output, the gdb and valgrind output, and the device information listed by the ComputeCpp implementation. The system is an up-to-date Ubuntu Eoan.

Command to build:

$ mkdir build && cd build && cmake .. -DCMAKE_CXX_COMPILER=`which dpcpp` -G Ninja -DCMAKE_BUILD_TYPE=Debug && cmake --build .

AbhishekD_Intel · ‎12-16-2019

Hi,

Thanks for reaching out to us.

We are working on it and will get back to you.

-Abhishek

Dmitry_Savin · ‎12-17-2019

I gained access to the Intel DevCloud and modified the CMake file to work with an older version. The attached output on a node without a GPU is as expected.

On a node with a GPU the program is stuck when submitting the vector addition task, no output until the job is killed with qdel.

On an FPGA the result is incorrect (zero), while FPGA emulation gives the correct result.

AbhishekD_Intel · ‎12-17-2019

Hi,

We ran your code on DevCloud and got the same output as you did.

One thing to note: running code on "Intel(R) FPGA SDK for OpenCL(TM)" requires some additional steps. For more details on FPGA, refer to the oneAPI Programming Guide. For the other devices, the same flow of execution will give you the correct output.

We also got the correct output when running on the iGPU, so try re-executing the code and you should get the correct result.

Currently, NVIDIA GPUs are not supported by our toolkit. You will find more updates on this soon.

The attachment shows the correct output on Intel(R) FPGA SDK for OpenCL(TM). The steps to run on Intel(R) FPGA SDK for OpenCL(TM) are:

$ dpcpp -fintelfpga main.cpp -c -o mainfpga.o
$ dpcpp -fintelfpga mainfpga.o -Xshardware
$ ./a.out

You have to do all of this while on the fpga_runtime node.

Get back to us if you face any issues.

- Abhishek

AbhishekD_Intel · ‎12-17-2019

Hi,

Regarding your first issue, we can see that you have two different drivers installed for the same device (iGPU). One of those drivers isn't supported by our toolkit, which is why that entry doesn't give the correct result.

- Abhishek
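Editor's note: in the run reported above, the unsupported driver returned zeros rather than an error, so one way to notice such a device is to compare its output with a reference computed on the host. The following is a minimal sketch in plain C++, assuming the inputs and the device output have been copied back into host vectors. The helper name `matches_host_reference` and the tolerance are illustrative assumptions, not taken from the attached source.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the attached source): returns true only if
// device_sum[i] equals a[i] + b[i] for every element, within `tol`.
bool matches_host_reference(const std::vector<float>& a,
                            const std::vector<float>& b,
                            const std::vector<float>& device_sum,
                            float tol = 1e-6f) {
    // A size mismatch already means the device output cannot be trusted.
    if (a.size() != b.size() || a.size() != device_sum.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float expected = a[i] + b[i];  // reference computed on the host
        if (std::fabs(device_sum[i] - expected) > tol) return false;
    }
    return true;
}
```

Calling this after each per-device run flags an all-zero result as a mismatch, provided the inputs are not themselves all zero.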

Dmitry_Savin · ‎12-17-2019

Hi Abhishek,

Thank you for the explanation. What is the correct way to replace the default iGPU driver? Nvidia was not a surprise, but what is the correct way to explicitly determine during execution if the binary-driver-device combination is supposed to work?

Will try a different GPU node a bit later.

- Dmitry.

AbhishekD_Intel · ‎12-17-2019

Hi Dmitry,

You can include an asynchronous exception handler in your code to check whether your code is running on a particular device. Also, if you pass default_selector to the queue instead of passing platform IDs at each execution, it will automatically select your default iGPU, which has the highest score. You can embed the following code snippet in your code:

cl::sycl::queue queue(cl::sycl::default_selector{}, exception_handler);

-Abhishek
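Editor's note: the `exception_handler` passed to the queue above is not defined anywhere in the thread. A SYCL asynchronous handler receives a list of `std::exception_ptr` values, so its core logic can be sketched in standard C++ as below. The helper name `describe_async_errors` is hypothetical, and the sketch assumes the runtime's exceptions derive from `std::exception` (anything else falls into the catch-all branch). The SYCL wiring is shown only as a comment and is not compiled here.

```cpp
#include <exception>
#include <string>
#include <vector>

// Hypothetical helper: converts each pending exception into a message.
// Works on any iterable of std::exception_ptr; cl::sycl::exception_list is
// assumed to be one, so this can be called from a SYCL async handler.
template <typename ExceptionList>
std::vector<std::string> describe_async_errors(const ExceptionList& errors) {
    std::vector<std::string> messages;
    for (const std::exception_ptr& e : errors) {
        try {
            std::rethrow_exception(e);  // re-raise to inspect the real type
        } catch (const std::exception& ex) {
            messages.push_back(ex.what());
        } catch (...) {
            messages.push_back("unknown asynchronous error");
        }
    }
    return messages;
}

// Intended use with SYCL (assumption, not compiled in this sketch):
//   auto exception_handler = [](cl::sycl::exception_list errors) {
//       for (const auto& msg : describe_async_errors(errors))
//           std::cerr << "SYCL async error: " << msg << '\n';
//   };
//   cl::sycl::queue queue(cl::sycl::default_selector{}, exception_handler);
//   // ... submit work ...
//   queue.wait_and_throw();  // pending async errors reach the handler here
```

A handler like this only reports errors that the runtime actually raises. It will not by itself detect a device that completes without error but returns wrong data.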

Dmitry_Savin · ‎12-18-2019

The GPU job hangs on the first node chosen when the queue is empty (job 448419.v-qsvr-1 right now), but completes successfully in 4 seconds on another node. Should I report it to the DevCloud-specific forum?

AbhishekD_Intel · ‎12-18-2019

It didn't actually hang. It is waiting for a task to be assigned, and if the queue is empty it will keep waiting until it gets a task to execute. After a threshold time it will be killed automatically, so you don't need to worry about it.

Dmitry_Savin · ‎12-18-2019

Thank you, Abhishek! I managed to remove the runtimes that had been manually installed for the compiler version I built from source earlier. On a side note, the intel-basekit package should check whether a runtime is already installed. Also, a working intel-opencl-icd is available from the Ubuntu eoan/universe repository. I suggest we close this discussion.

AbhishekD_Intel · ‎12-18-2019

Hi Dmitry,

I am closing this thread. We will make a note of your findings.

-Abhishek