I have heard of an approach like this for nested loops:
#pragma omp parallel for
for (int i=0;i<100;++i)
{
    #pragma offload target(mic:i)
    #pragma omp parallel for
    for (int j=0;j<100;++j)
        ...
}
But when there is only one loop, the best I can come up with is this silly code:
#pragma omp parallel for
for (int i=0;i<3;++i) // Well, I have 3 mic cards
{
    #pragma offload target(mic:i)
    #pragma omp parallel for
    for (int j=(n/3)*i;j<(n/3)*(i+1);++j) // n is total job count.
        ..... // do the job from (n/3)*i to (n/3)*(i+1)-1
}
It's not only ugly but also silly, and I can't find a more elegant method. =...=
1 Reply
Hi Xero,
I tried something simple for offloading to three coprocessors:
omp_set_num_threads(128); // The maximum number of host threads doing offload
#pragma omp parallel for private(j)
for (j=0; j<n; j++)
{
    #pragma offload target(mic:j)
    // do the job for each chunk j
}
Alternatively, you can use MPI to divide the total work among the three coprocessors, with each coprocessor using OpenMP to run its share of the workload in parallel.
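One detail the chunked loops in this thread leave open is what happens when n is not a multiple of the number of cards: with (n/3)*i as the bounds, the last n % 3 jobs are never run. The following is only a minimal sketch of one way to compute the bounds, not something taken from the posts above. The helper names chunk_begin, chunk_end and run_chunks are hypothetical, the job body is a stand-in that just records which chunk handled each index, and the offload pragma is left as a comment so the sketch compiles without the Intel compiler.

```c
#include <assert.h>

/* Hypothetical helper: first job index of chunk k when n jobs are split
   into `parts` contiguous chunks whose sizes differ by at most one. */
static int chunk_begin(int n, int parts, int k)
{
    assert(n >= 0 && parts > 0 && k >= 0 && k <= parts);
    int base  = n / parts;  /* jobs that every chunk gets */
    int extra = n % parts;  /* the first `extra` chunks get one more */
    return k * base + (k < extra ? k : extra);
}

/* Hypothetical helper: one-past-the-last job index of chunk k. */
static int chunk_end(int n, int parts, int k)
{
    return chunk_begin(n, parts, k + 1);
}

/* One host thread per card; each thread handles its own chunk of jobs.
   owner[j] receives the chunk number that processed job j (a stand-in
   for the real work). */
static void run_chunks(int n, int num_cards, int *owner)
{
    #pragma omp parallel for num_threads(num_cards)
    for (int d = 0; d < num_cards; ++d) {
        int begin = chunk_begin(n, num_cards, d);
        int end   = chunk_end(n, num_cards, d);
        /* With the Intel compiler, the thread's
           "#pragma offload target(mic:d)" would go here, together with
           whatever data-transfer clauses the real job needs. */
        for (int j = begin; j < end; ++j)
            owner[j] = d;
    }
}
```

For n = 10 and three cards this gives the ranges [0,4), [4,7) and [7,10), whereas the (n/3)*i bounds from the question would stop after job 8 and skip job 9.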
