Segmentation fault sort function on gpu selector

eug · 04-28-2020

Hello,

I'm trying to understand how to execute different algorithms on different devices.

I have a very simple program divided into two independent parts: the first fills a buffer and then sorts it (on a CPU device); the second just fills a new buffer on a GPU device. Both parts are in the same source file.

The problem is a segmentation fault in the GPU code if I execute the first part on the CPU device and the second on the GPU.

If I execute both on the CPU, everything works (or at least there is no segmentation fault).

If I remove the sort call and execute both on the GPU, it works.

It is a fairly useless example; it is only meant to help me understand how the different pieces work.

I ran it on DevCloud.

#include <iostream>
#include <CL/sycl.hpp>
#include <dpstd/execution>
#include <dpstd/algorithm>
#include <dpstd/iterators.h>
using namespace sycl;

int main(int argc, char **argv) {

   // Part 1: fill a buffer and sort it on the CPU device.
   cl::sycl::queue q(cpu_selector{});

   const int n = 10000;

   buffer<int> vals_buf{n};

   auto vals_begin = dpstd::begin(vals_buf);

   auto counting_begin = dpstd::counting_iterator<int>{0};

   auto policy = dpstd::execution::make_device_policy<class Fill>( q );

   // The lambda needs a capture list to be valid C++.
   std::transform(policy, counting_begin, counting_begin + n, vals_begin, [=](int i) { return n - (i / 2) * 2; });

   std::sort(policy, vals_begin, vals_begin + n);

   std::cout << q.get_device().get_info<info::device::name>() << std::endl;

   // Part 2: fill a second, independent buffer on the GPU device.
   cl::sycl::queue q2(gpu_selector{});

   cl::sycl::buffer<int> buf2 { 1000 };
   auto buf_begin2 = dpstd::begin(buf2);
   auto buf_end2   = dpstd::end(buf2);
   auto policy2 = dpstd::execution::make_device_policy<class Fill2>( q2 );

   std::fill(policy2, buf_begin2, buf_end2, 42);
   std::cout << q2.get_device().get_info<info::device::name>() << std::endl;
}

AbhishekD_Intel · 04-29-2020

Hi Eug,

I tried the code you provided and got the same SEGFAULT when using gpu_selector for the q2 queue. The failure is intermittent: after compiling and running it several times, some runs completed without a SEGFAULT.

The host side of the program executes synchronously, but when you submit work to a queue, the command group is executed asynchronously. If we call q.wait() after submitting the work for the first queue, the SEGFAULT no longer occurs in the following queue. When there are no dependencies such as shared memory objects (buffers) or other kernels, control returns to the host before the first queue's work has finished, and the host moves on to the next queue.

The code below is what I tried; it works with every combination of device selectors:

#include <iostream>
#include <CL/sycl.hpp>
#include <dpstd/execution>
#include <dpstd/algorithm>
#include <dpstd/iterators.h>
using namespace sycl;

int main(int argc, char **argv) {

   cl::sycl::queue q(gpu_selector{});
   cl::sycl::queue q2(cpu_selector{});

   auto policy = dpstd::execution::make_device_policy<class Fill>( q );
   auto policy2 = dpstd::execution::make_device_policy<class Fill2>( q2 );

   const int n = 10000;

   buffer<int> vals_buf{n};
   auto vals_begin = dpstd::begin(vals_buf);
   auto counting_begin = dpstd::counting_iterator<int>{0};
   std::transform(policy, counting_begin, counting_begin + n, vals_begin, [=](int i) { return n - (i / 2) * 2; });

   std::sort(policy,vals_begin, vals_begin + n);
   std::cout<<q.get_device().get_info<info::device::name>() << std::endl;

   q.wait();


   cl::sycl::buffer<int> buf2 { 1000 };
   auto buf_begin2 = dpstd::begin(buf2);
   auto buf_end2   = dpstd::end(buf2);

   std::fill(policy2, buf_begin2, buf_end2, 42);
   std::cout<<q2.get_device().get_info<info::device::name>() << std::endl;
}

Please go through the code and let us know if you still face the same issue.

I have also attached the screenshot of the output for more details.

Warm Regards,

Abhishek

eug · 04-29-2020

Thank you, it works now, but only if I compile without the debug flag.

The same problem still happens if I compile with the "-g" flag.

AbhishekD_Intel · 05-05-2020

Hi,

We also get the same error when using the -g flag, and we have escalated it to the concerned team.

You will get a reply from them soon.

Thank you for your findings.

Warm Regards,

Abhishek

Sravani_K_Intel · 05-06-2020

Hi Eug,

There was a known issue in the GPU driver that caused a SEGFAULT when using the -g flag; it is fixed in the latest driver version.

I can no longer reproduce the issue, even on DevCloud, which now has the latest version of the driver.

Thanks,

Sravani