I have heard of an approach like this for nested loops:
#pragma omp parallel for
for (int i=0;i<100;++i)
{
    // i and j are declared in the loop headers, so they are already private
    #pragma offload target(mic:i)
    #pragma omp parallel for
    for (int j=0;j<100;++j)
    ...
}
But when there is only a single loop, the best I can come up with is this clumsy code:
#pragma omp parallel for
for (int i=0;i<3;++i) // Well, I have 3 mic cards
{
    #pragma offload target(mic:i)
    #pragma omp parallel for
    for (int j=(n/3)*i;j<(n/3)*(i+1);++j) // n is total job count.
    ..... // do the job from (n/3)*i to (n/3)*(i+1)-1
}
It's not only ugly but also silly, yet I can't find a more elegant method. =...=
Hi Xero,
I tried something simple for offloading to three coprocessors:
omp_set_num_threads(128); // The maximum number of host threads doing offload
#pragma omp parallel for private(j)
for (j=0; j<n; j++)
{
    #pragma offload target(mic:j)
    // do the job for each chunk j
}
Also, you can use MPI to divide the total work among the three coprocessors, with each coprocessor using OpenMP to run its share of the workload in parallel.
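One detail worth checking in the chunked loops in this thread: splitting with (n/3)*i silently drops the last n % 3 jobs when n is not a multiple of 3. Below is a minimal sketch in plain C of a remainder-safe split. The names chunk_range, chunk_bounds and run_all are hypothetical helpers invented for illustration, not part of any Intel or OpenMP API, and the offload pragma appears only in a comment so the sketch compiles with any C compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of job indices assigned to one chunk. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_range;

/* Hypothetical helper: split n jobs into `parts` nearly equal chunks and
 * return chunk `idx`. The first (n % parts) chunks get one extra job, so
 * no job is dropped when n is not a multiple of parts. */
static chunk_range chunk_bounds(size_t n, size_t parts, size_t idx)
{
    assert(parts > 0 && idx < parts);
    size_t base  = n / parts;   /* jobs every chunk receives */
    size_t extra = n % parts;   /* leftover jobs, spread over the first chunks */
    chunk_range r;
    r.begin = idx * base + (idx < extra ? idx : extra);
    r.end   = r.begin + base + (idx < extra ? 1 : 0);
    return r;
}

/* Hypothetical driver: one outer iteration per coprocessor. With the Intel
 * compiler, the outer loop would carry "#pragma omp parallel for" and the
 * inner loop would sit under "#pragma offload target(mic:dev)" as in the
 * posts above (assumption: those pragmas are available in your toolchain).
 * Returns the number of jobs visited so the split can be sanity-checked. */
static size_t run_all(size_t n, size_t num_devices)
{
    size_t done = 0;
    for (size_t dev = 0; dev < num_devices; ++dev) {
        chunk_range r = chunk_bounds(n, num_devices, dev);
        for (size_t j = r.begin; j < r.end; ++j)
            ++done;             /* placeholder for the real job j */
    }
    return done;
}
```

With n = 10 and three cards, this gives the ranges [0,4), [4,7) and [7,10), so every job is assigned exactly once. The same helper could also be used to size the per-rank work in the MPI variant.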