Hello,
I am new to programming with MIC cards. I am trying to run a very simple program, but offloading the data to the MIC card appears to take a long time, and the final output seems to be incorrect. Can anyone help me figure out my mistake, please?
#include <iostream>
#include <memory>
#include "omp.h"
#include <malloc.h>
using namespace std;
int main()
{
int xx=100000;
int yy=10000;
unsigned long long size = xx*yy;
cout << " Simulate Data" << endl;
cout << "data size " << size*4 << endl;
int* aa = (int*) malloc(sizeof(int)*size);
for(unsigned long long ii=0; ii < xx*yy; ++ii)
{
aa[ii] =1;
}
cout << " start offload " << endl;
unsigned long long dim = xx*yy;
#pragma offload target(mic:0) \
in(aa:length(dim))
{
#pragma omp parallel for
for (unsigned long long ii; ii < xx*yy; ++ii)
{
aa[ii] *= 2;
}
}
cout << " offload end " << endl;
cout << " Result " << aa[10] <<" " << aa[1000] << endl;
free(aa);
return 0;
}
Thank you
Sincerely,
AM
You want to use
#pragma offload target(mic:0) \
inout(aa:length(dim))
Note, the first offload has the overhead of transferring the code and initializing the MIC's OpenMP thread pool. Try:
for (int I = 0; I < 4; ++I) {
    cout << " start offload " << endl;
    double t0 = omp_get_wtime();
    unsigned long long dim = xx*yy;
    // inout: copy aa to the MIC and copy the modified values back to the host
    #pragma offload target(mic:0) \
    inout(aa:length(dim))
    {
        #pragma omp parallel for
        for (unsigned long long ii = 0; ii < xx*yy; ++ii) // ii must be initialized to 0
        {
            aa[ii] *= 2;
        }
    }
    double t1 = omp_get_wtime();
    cout << " offload end " << t1 - t0 << endl;
    cout << " Result " << aa[10] <<" " << aa[1000] << endl;
} // for
Jim Dempsey
Thank you very much for your prompt reply, Jim. I really appreciate all your help. My program now runs correctly; however, the offload is still too slow.
Please note that the code within your offloaded section is trivial:
read (vector), multiply (vector), write (vector).
That is all it is doing (other than a little loop overhead).
Your offloaded code should perform more work to recover the time spent passing the data into and out of the MIC.
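To put rough numbers on this, here is a minimal sketch that counts the bytes moved and the operations performed per byte. The function names are hypothetical helpers made up for this illustration, and the bandwidth passed to `transfer_seconds` is whatever value you assume or measure, not a property of any particular card.

```cpp
#include <cstdint>

// Bytes that cross the bus when an array is copied to the card and back (inout).
std::uint64_t bytes_moved_inout(std::uint64_t elements, std::uint64_t bytes_per_element) {
    return 2 * elements * bytes_per_element;  // one copy in, one copy out
}

// Arithmetic operations performed per byte transferred.
double ops_per_byte(std::uint64_t ops, std::uint64_t bytes) {
    return static_cast<double>(ops) / static_cast<double>(bytes);
}

// Estimated transfer time for an ASSUMED sustained bandwidth in bytes per second.
double transfer_seconds(std::uint64_t bytes, double assumed_bytes_per_second) {
    return static_cast<double>(bytes) / assumed_bytes_per_second;
}
```

For the array in the original post (100000 * 10000 elements, assuming 4-byte ints and `inout`), this gives 8,000,000,000 bytes moved for 1,000,000,000 multiplies, or 0.125 operations per byte. With an assumed, unmeasured bandwidth of 6e9 bytes/s, the transfer alone would take roughly 1.3 s, no matter how fast the card executes the loop.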
Choose something like a textbook matrix multiply as sample code.
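As a sketch of what such sample code might look like, the following is a plain textbook multiply of two n x n row-major matrices, plus a helper that estimates how much arithmetic is done per byte transferred. The names `matmul` and `matmul_flops_per_byte` are made up for this illustration, and the offload pragma shown in the comment is an untested assumption about how the loop nest could be wrapped, not verified code.

```cpp
#include <cstddef>
#include <vector>

// Textbook multiply of two n x n row-major matrices: c = a * b.
// With the Intel compiler the loop nest could presumably be offloaded with
// something like (untested assumption; pa, pb, pc are raw pointers to the data):
//   #pragma offload target(mic:0) in(pa, pb : length(n*n)) out(pc : length(n*n))
//   #pragma omp parallel for
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b,
                          std::size_t n) {
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}

// Floating-point operations per byte transferred: 2*n^3 flops (multiply + add)
// over three n x n float matrices (two sent in, one sent back).
double matmul_flops_per_byte(std::size_t n) {
    const double flops = 2.0 * n * n * n;
    const double bytes = 3.0 * n * n * sizeof(float);
    return flops / bytes;
}
```

With 4-byte floats the ratio works out to n/6, so it grows with the matrix size, whereas the vector scaling loop stays at a fixed, small ratio however large the array is.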
Jim Dempsey