How can I submit work to two queues in parallel in DPC++?

tanzl_ustc · ‎08-03-2022

Hello, I am new to DPC++. I recently ran into a problem submitting work to two queues in parallel on two devices.

I have two Intel GPUs and want to submit one queue to each of them, so that my task might take roughly half the original time.

Could you share a simple code example of parallel task submission? I cannot post my own code online. Thanks!

SeshaP_Intel · ‎08-05-2022

Hi,

Thanks for posting in Intel communities.

We can use a custom device selector to run queues on multiple devices in parallel.

A custom device selector is a user-defined class derived from the device_selector class. Its operator() returns a score for each available device, and the queue is created on the device with the highest score. A device with a negative score is never selected.

With a custom device selector we can select any device (a CPU or any accelerator).

The code below shows how to run the program on two Intel GPUs, using one custom device selector per GPU.

#include<CL/sycl.hpp>
#include<vector>
#include<iostream>
#include<string>
using namespace cl::sycl;
using namespace std;
static const int N = 4;
class my_selector1 : public device_selector
{
public:
    int operator()(const device &dev) const
    {
        int score = -1;
        // Replace GPU1 with part of the name of your first Intel GPU
        if (dev.is_gpu() && dev.get_info<info::device::name>().find("GPU1") != std::string::npos)
        {
            score += 25;
            std::cout << "my_selector1 = " << dev.get_info<info::device::name>() << "\n";
        }
        return score;
    }
};
class my_selector2 : public device_selector
{
public:
    int operator()(const device &dev) const
    {
        int score = -1;
        // Replace GPU2 with part of the name of your second Intel GPU.
        // The test must be != npos (name contains the substring), not == npos.
        // If both GPUs report the same name, name matching cannot tell them apart.
        if (dev.is_gpu() && dev.get_info<info::device::name>().find("GPU2") != std::string::npos)
        {
            score += 800;
            std::cout << "my_selector2 = " << dev.get_info<info::device::name>() << "\n";
        }
        return score;
    }
};

int main()
{
    // Create both queues up front, one per GPU.
    auto Q1 = queue{ my_selector1{} };
    auto Q2 = queue{ my_selector2{} };
    std::cout << "Selected device: " << Q1.get_device().get_info<info::device::name>() << "\n";
    std::cout << "Selected device: " << Q2.get_device().get_info<info::device::name>() << "\n";

    int *a1 = malloc_shared<int>(N, Q1);
    int *a2 = malloc_shared<int>(N, Q2);
    for (int i = 0; i < N; i++) { a1[i] = i; a2[i] = i; }

    // Submission is asynchronous. Do not call wait() between the two
    // submissions, otherwise the second kernel starts only after the first ends.
    auto e1 = Q1.single_task([=]() {
        for (int i = 0; i < N; i++) a1[i] *= 2;
    });
    auto e2 = Q2.single_task([=]() {
        for (int i = 0; i < N; i++) a2[i] *= 3;
    });

    // Wait only after both kernels have been submitted, so they can overlap.
    e1.wait();
    e2.wait();

    for (int i = 0; i < N; i++) std::cout << a1[i] << "\n";
    for (int i = 0; i < N; i++) std::cout << a2[i] << "\n";
    free(a1, Q1);
    free(a2, Q2);

    return 0;
}
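
To check a name filter before running on real hardware, the score-based selection rule can be modelled in plain C++ without any SYCL headers. This is only a sketch under stated assumptions: score_by_name and pick_device are hypothetical helpers, the device names used with them are made up, and ties are given to the first device here, whereas a real runtime may break ties differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a name-based selector: returns a non-negative
// score when `name` contains `wanted`, and -1 otherwise.
int score_by_name(const std::string& name, const std::string& wanted) {
    // An empty substring would match every device name.
    assert(!wanted.empty());
    return name.find(wanted) != std::string::npos ? 24 : -1;
}

// Model of the selection rule: score every device and keep the highest
// score. Returns the index of the chosen device, or -1 when every device
// scored negative (a real queue constructor would fail in that case).
int pick_device(const std::vector<std::string>& names,
                const std::string& wanted) {
    int best_index = -1;
    int best_score = -1;
    for (std::size_t i = 0; i < names.size(); ++i) {
        const int s = score_by_name(names[i], wanted);
        if (s > best_score) {  // strictly greater: ties keep the first device
            best_score = s;
            best_index = static_cast<int>(i);
        }
    }
    return best_index;
}
```

With two made-up names such as "Example GPU A" and "Example GPU B", the filters "GPU A" and "GPU B" pick index 0 and index 1. If both devices report the identical name, every filter picks the same index in this model, which is why two GPUs of the same type usually need a different way of telling them apart than the device name.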

Thanks and Regards,

Pendyala Sesha Srinivas

SeshaP_Intel · ‎08-11-2022

Hi,

We haven't heard back from you. Could you please provide an update on your issue?

Thanks and Regards,

Pendyala Sesha Srinivas

SeshaP_Intel · ‎08-19-2022

Hi,

We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.

Thanks and Regards,

Pendyala Sesha Srinivas