Software Archive
Read-only legacy content

OpenMP 4.0 target offload Report

Rezaul_R_
Beginner

Hi,

I am trying to compare offload statistics for:

1) Intel compiler-assisted offload vs. 2) the OpenMP 4.0 target construct.

My question: how can I get an OpenMP 4.0 offload report (which environment variable do I need to set)? I used OFFLOAD_REPORT=2 with the Intel compiler offload directive and it worked fine, but I am getting very strange statistics with the OpenMP 4.0 offload (I am using an Intel Xeon Phi as the execution platform).

Here is the code

COMPILER DIRECTIVE OFFLOAD:

// Start time
        gettimeofday(&start, NULL);

        // Run SAXPY 
        #pragma offload target(mic:0) inout(x) out(y)
        {
                        #pragma omp parallel for default (none) shared(a,x,y)
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }                                                        
        } // end of target data

        // end time 
        gettimeofday(&end, NULL);

OPENMP 4.0 TARGET OFFLOAD:

// Start time
        gettimeofday(&start, NULL);

        // Run SAXPY
        #pragma omp target data map(to:x)
        {
                #pragma omp target map(tofrom:y)
                {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
                }
        } // end of target data

 

------

Thanks in advance. (Raju)

 

11 Replies
Ravi_N_Intel
Employee

Change your OpenMP 4.0 code to

#pragma omp target map(to:x) map(tofrom:y)
        {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
        }

Rezaul_R_
Beginner

Hi Ravi,

Thanks for your reply. I executed the updated code; my findings are below:

1. Compared to the Intel compiler directive offload, the OpenMP target offload performs much slower.

2. I used OFFLOAD_REPORT=2 to see the report and it shows:

[Offload] [MIC 0] [File]                    omp_target_SAXPY_only.c
[Offload] [MIC 0] [Line]                    38
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        85.259595(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   8 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        84.532258(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   8 (bytes)

(whereas the total number of bytes offloaded (CPU->MIC) with the compiler directive offload is 2000000008 bytes)

My input arrays: const int n = 500000000; float x[500000000]; float y[500000000];

 

I am wondering whether, with OFFLOAD_REPORT=2, I am seeing what is really happening underneath in OpenMP 4.0, or what else I can use for this?

Thanks in advance again!

Ravi_N_Intel
Employee

Can you show how x and y are declared in the code for both the directive offload and the OpenMP offload?

Rezaul_R_
Beginner

Yes.

In both cases I have declared them globally, as below:

const int n = 500000000;
float x[500000000];
float y[500000000];

Thanking you,

-Raju

Ravi_N_Intel
Employee

The compiler's interpretation of global variables that were not marked with #pragma omp declare target was wrong; this is fixed in the newer compiler.

Try the following test:

#ifdef OPENMP
#pragma omp declare target
#else
#pragma offload_attribute(push, target(mic))
#endif
const int n = 5000;
float x[5000];
float y[5000];
int a = 10;
#ifdef OPENMP
#pragma omp end declare target
#else
#pragma offload_attribute(pop)
#endif

int main()
{
   int i;
#ifndef OPENMP
#pragma offload target(mic:0) in(x: alloc_if(0) free_if(0)) inout(y: alloc_if(0) free_if(0))
        {
                        #pragma omp parallel for default (none) shared(a,x,y)
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
        } // end of target data
        // end time
#else
        #pragma omp target map(always, to:x) map(always, tofrom:y)
                {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
                }
#endif
}

OUTPUT I GET

bash-4.2$ icc -openmp raju.c
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    20
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.430813(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40008 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.243459(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20008 (bytes)

bash-4.2$ icc -openmp raju.c -DOPENMP
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    29
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.412741(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40004 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.234801(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20004 (bytes)

 

Rezaul_R_
Beginner

Thank you very much, Ravi. It worked fine. 

Best Regards// 

Raju

Rezaul_R_
Beginner

Hi Ravi,

I have a few points of confusion:

1. I am using icc (ICC) 15.0.2, and with this version I cannot compile

        #pragma omp target map(always, to:x) map(always, tofrom:y)

It shows:

  error: identifier "to" is undefined
  #pragma omp target map(always, to:x) map(always, tofrom:y)

2. Which compiler version are you using?

Thanks,

- Raju 

Rezaul_R_
Beginner

Thanks in advance. And one more finding:

1. I am trying to execute the #pragma omp target offload with larger array sizes (>50000, up to millions of elements) [using #pragma omp declare target]; in this case it gives me an offload error:

    offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)

I looked into this problem; would you please comment: is it a memory alignment issue, or can I increase the phi_omp_stack?

Your comments and help are appreciated.

// Raju 

Ravi_N_Intel
Employee

I am using the 16.0 compiler. The support for "always" does not exist in 15.0.

You would need to split the pragma into 3 pragmas:

#pragma omp target update to(x,y)
#pragma omp target
{
        /* computation on x and y goes here */
}

#pragma omp target update from(y)

Regarding  signal 11,  if you send me the test case I can investigate the cause of the problem.

Rezaul_R_
Beginner

Hi Ravi,

With #pragma omp target update, I can run omp target offload without any issue. It doesn't create signal 11. 

Thank you so much for your help.

- Raju

Kevin_D_Intel
Employee

The compiler defect that Ravi described in post #6 has been recorded in our internal tracking system (see id noted below). We will update this thread when the fix for this issue is available in an external release.

(Internal tracking id: DPD200376293)
