Hello, I am a beginner with DPC++. Recently I ran into a problem submitting work to two queues in parallel on two devices.
I have two Intel GPUs and want to use one queue per GPU, so that my task might take only about half the original time.
Could you give me a simple example of parallel task submission? I cannot post my code online for some reasons. Thanks!
Hi,
Thanks for posting in Intel communities.
We can use custom device selectors to create one queue per device and run those queues in parallel.
A custom device selector is a user-defined class derived from the device_selector class. Its operator() gives each available device a score; the queue is created on the device with the highest score, and a device with a negative score is never selected.
With a custom device selector we can select any device (a CPU or any accelerator).
Please refer to the code snippet below for more details.
You can run the program on two Intel GPUs through the custom device selectors defined in the code below.
#include <CL/sycl.hpp>
#include <iostream>
#include <string>

using namespace cl::sycl;

static const int N = 4;

// Selects the GPU whose name contains "GPU1".
class my_selector1 : public device_selector
{
public:
    int operator()(const device &dev) const override
    {
        int score = -1; // a negative score means this device is never chosen
        // Replace GPU1 with (part of) the name of your first Intel GPU
        if (dev.is_gpu() && dev.get_info<info::device::name>().find("GPU1") != std::string::npos)
        {
            score += 25;
            std::cout << "my_selector1 = " << dev.get_info<info::device::name>() << "\n";
        }
        return score;
    }
};

// Selects the GPU whose name contains "GPU2".
class my_selector2 : public device_selector
{
public:
    int operator()(const device &dev) const override
    {
        int score = -1; // a negative score means this device is never chosen
        // Replace GPU2 with (part of) the name of your second Intel GPU
        if (dev.is_gpu() && dev.get_info<info::device::name>().find("GPU2") != std::string::npos)
        {
            score += 800;
            std::cout << "my_selector2 = " << dev.get_info<info::device::name>() << "\n";
        }
        return score;
    }
};

int main()
{
    // Create one queue per GPU.
    queue Q1{ my_selector1{} };
    queue Q2{ my_selector2{} };
    std::cout << "Q1 device: " << Q1.get_device().get_info<info::device::name>() << "\n";
    std::cout << "Q2 device: " << Q2.get_device().get_info<info::device::name>() << "\n";

    // Shared (USM) allocations, each tied to its own queue's device.
    int *a1 = malloc_shared<int>(N, Q1);
    int *a2 = malloc_shared<int>(N, Q2);
    for (int i = 0; i < N; i++) { a1[i] = i; a2[i] = i; }

    // Submit both kernels before waiting on either of them. Calling .wait()
    // right after the first submission would make Q2 start only after Q1
    // has finished, so the two GPUs would run one after the other.
    event e1 = Q1.single_task([=]() {
        for (int i = 0; i < N; i++) a1[i] *= 2;
    });
    event e2 = Q2.single_task([=]() {
        for (int i = 0; i < N; i++) a2[i] *= 3;
    });

    // Wait for both GPUs to finish before reading the results on the host.
    e1.wait();
    e2.wait();

    for (int i = 0; i < N; i++) std::cout << a1[i] << std::endl;
    for (int i = 0; i < N; i++) std::cout << a2[i] << std::endl;

    free(a1, Q1);
    free(a2, Q2);
    return 0;
}
Thanks and Regards,
Pendyala Sesha Srinivas
Hi,
We haven't heard back from you. Could you please provide an update on your issue?
Thanks and Regards,
Pendyala Sesha Srinivas
Hi,
We assume that your issue is resolved. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks and Regards,
Pendyala Sesha Srinivas