
Can I use multiple MIC coprocessors to optimize one loop?

Xero_E_
Beginner

I have heard of an approach such as:

#pragma omp parallel for
for (int i=0;i<100;++i)
{
#pragma offload target(mic:i)
#pragma omp parallel for
for (int j=0;j<100;++j)
...
}

But in the case of a single loop, I can only come up with silly code like this:

#pragma omp parallel for
for (int i=0;i<3;++i) // Well, I have 3 MIC cards
{
#pragma offload target(mic:i)
#pragma omp parallel for
for (int j=(n/3)*i;j<(n/3)*(i+1);++j) // n is the total job count.
..... // do the jobs from (n/3)*i to (n/3)*(i+1)-1
}

It's not only ugly but also silly, yet I can't find an elegant way to do it. =...=

Loc_N_Intel
Employee

Hi Xero,

I tried something simple for offloading to three coprocessors:

omp_set_num_threads(128); // The maximum number of host threads doing offload

#pragma omp parallel for
for (int j=0; j<n; j++)
{
#pragma offload target(mic:j)
// do the job for each chunk j
}
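One detail the chunked loop above leaves open is how each chunk's bounds are chosen when the job count is not a multiple of the number of coprocessors. Below is a minimal sketch in plain C. The helper `chunk_bounds`, the type `chunk_range`, and the example job (summing an array in `sum_in_chunks`) are hypothetical names introduced for illustration and are not from this thread. The offload pragma is left as a comment so the sketch builds with any C compiler; with the Intel compiler it would go where indicated, together with whatever data-transfer clauses the real job needs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: half-open range [begin, end) of one chunk. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_range;

/* Split n items into `parts` contiguous chunks and return chunk k.
   The first (n % parts) chunks get one extra item, so no item is
   dropped when n is not a multiple of parts. */
chunk_range chunk_bounds(size_t n, size_t parts, size_t k)
{
    assert(parts > 0 && k < parts);
    size_t base = n / parts;
    size_t extra = n % parts;
    chunk_range r;
    r.begin = k * base + (k < extra ? k : extra);
    r.end = r.begin + base + (k < extra ? 1 : 0);
    return r;
}

/* Example job (an assumption): sum data[0..n) with one host thread
   per device, each thread handling exactly one chunk. */
long sum_in_chunks(const int *data, size_t n, int num_devices)
{
    assert(num_devices > 0);
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int d = 0; d < num_devices; ++d) {
        chunk_range r = chunk_bounds(n, (size_t)num_devices, (size_t)d);
        long partial = 0;

        /* With the Intel compiler, "#pragma offload target(mic:d)"
           would be placed here so that chunk d runs on coprocessor d. */
        for (size_t j = r.begin; j < r.end; ++j)
            partial += data[j];

        total += partial;
    }
    return total;
}
```

For example, with n = 10 and three chunks, `chunk_bounds` yields the ranges [0, 4), [4, 7), and [7, 10), so the leftover item goes to the first chunk instead of being skipped as it would be with a plain n/3 split.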

Alternatively, you can use MPI to divide the total work among the three coprocessors, with each coprocessor using OpenMP to run its share of the workload in parallel.
