Software Archive

My code takes a long time to execute and returns an incorrect result

ankit_m_
Beginner

Hello, 

I am new to programming with MIC cards. I am trying to run a very simple program, but it appears to take a long time to offload the data to the MIC card, and the final output also seems to be incorrect. Can anyone please help me find my mistake?

#include <iostream>
#include <memory>
#include "omp.h"
#include <malloc.h>

using namespace std; 

int main()
{
    int xx=100000; 
    int yy=10000;
    
    unsigned long long size = xx*yy; 
    cout << " Simulate Data" << endl; 
    cout << "data size " << size*4 << endl; 
    
    int* aa = (int*) malloc(sizeof(int)*size); 
    for(unsigned long long ii=0; ii < xx*yy; ++ii)
    {
        aa[ii] =1; 
    }
    
    cout << " start offload " << endl; 
    unsigned long long dim = xx*yy; 
    #pragma offload target(mic:0) \
    in(aa:length(dim)) 
    {
        #pragma omp parallel for 
        for (unsigned long long ii; ii < xx*yy; ++ii)
        {
            aa[ii] *= 2; 
        }
    }
    
    cout << " offload end " << endl;
    cout << " Result  " << aa[10] <<"  " << aa[1000] << endl; 
    free(aa);  
    
    return 0;     
}

 

Thank you

Sincerely, 

AM

 

 

jimdempseyatthecove
Honored Contributor III

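You want to use inout rather than in. With in, the array is copied to the MIC but never copied back, so the host never sees the doubled values: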

#pragma offload target(mic:0) \
    inout(aa:length(dim)) 
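For reference, here is a minimal sketch of the corrected offload wrapped in a function. It assumes the Intel compiler's offload extensions; the function name double_on_mic is hypothetical and not from the original post. It also initializes the loop index, which the original loop declares without a starting value. Compilers without offload support will typically ignore the pragma and run the loop on the host, in which case in and inout cannot be told apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post): doubles aa[0..n).
// With Intel's offload extension the block runs on coprocessor 0.
// inout copies aa to the card before the block and back to the host after it.
void double_on_mic(int* aa, std::size_t n)
{
    assert(aa != nullptr || n == 0);

    #pragma offload target(mic:0) inout(aa:length(n))
    {
        #pragma omp parallel for
        for (long long ii = 0; ii < static_cast<long long>(n); ++ii)  // ii starts at 0
        {
            aa[ii] *= 2;
        }
    }
}
```

Calling it on an array filled with 1 should leave every element equal to 2 on the host. Starting with a small array (a few thousand elements) makes it easy to check correctness before scaling up to the 4 GB case.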
   

Note, the first offload has the overhead of transferring the code and initializing the MIC's OpenMP thread pool. Try:

for (int I = 0; I < 4; ++I) {
    // The first iteration includes the one-time offload startup cost
    cout << " start offload " << endl;
    double t0 = omp_get_wtime();
    unsigned long long dim = xx*yy;
    // inout: copy aa to the MIC before the block and back to the host after it
    #pragma offload target(mic:0) \
    inout(aa:length(dim))
    {
        #pragma omp parallel for
        for (unsigned long long ii = 0; ii < xx*yy; ++ii)  // ii must start at 0
        {
            aa[ii] *= 2;
        }
    }
    double t1 = omp_get_wtime();
    cout << " offload end  " << t1 - t0 << endl;
    cout << " Result  " << aa[10] << "  " << aa[1000] << endl;
} // for
 

Jim Dempsey

ankit_m_
Beginner

Thank you very much for your prompt reply, Jim. I really appreciate all your help. My program now runs correctly; however, the offload is still too slow.

 

jimdempseyatthecove
Honored Contributor III

Please note that the code within your offloaded section is trivial.

Read (vector), multiply (vector), write (vector).

That is all it is doing (other than a little loop overhead).

Your offloaded code should perform enough work to recover the time spent passing the data into and out of the MIC.

Choose something like a textbook matrix multiply as sample code.
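To put rough numbers on the point above, the following sketch compares arithmetic operations per byte transferred for the two kernels. It is a back-of-the-envelope model, not a measurement. The function names are hypothetical, and it assumes each array crosses the bus once per direction in which it is needed. The doubling kernel does one multiply per element and moves that element in and out. A textbook n x n matrix multiply on doubles does about 2n^3 operations while moving three n x n matrices (two in, one out).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: operations per byte moved for the in-place doubling
// kernel. One multiply per int, and each int is sent to the card and back.
double doubling_ops_per_byte()
{
    return 1.0 / (2.0 * sizeof(int));
}

// Hypothetical helper: operations per byte moved for a textbook n x n
// matrix multiply on doubles. About 2*n^3 operations (n multiplies and
// n adds per output element); A and B are sent in, C is sent out.
double matmul_ops_per_byte(double n)
{
    assert(n > 0.0);
    const double ops   = 2.0 * n * n * n;
    const double bytes = 3.0 * n * n * sizeof(double);
    return ops / bytes;
}
```

With 4-byte ints the doubling kernel comes out at 0.125 operations per byte, independent of array size, so making the array larger never improves the ratio. With 8-byte doubles the matrix multiply comes out at n/12 operations per byte (about 100 for n = 1200), so the work per transferred byte grows with problem size, which is why it is a better candidate for offload.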

Jim Dempsey
