Software Archive
Read-only legacy content

OpenMP 4.0 target offload Report

Rezaul_R_
Beginner

Hi,

I am trying to compare offload statistics for:

1) Intel compiler-assisted offload vs. 2) the OpenMP 4.0 target construct.

My question: how can I get an OpenMP 4.0 offload report (which environment variable do I need to set)? I used OFFLOAD_REPORT=2 with the Intel compiler offload directive and it worked fine, but I am getting very strange statistics with the OpenMP 4.0 offload (I am using an Intel Xeon Phi as the execution platform).

Here is the code

COMPILER DIRECTIVE OFFLOAD:

// Start time
        gettimeofday(&start, NULL);

        // Run SAXPY 
        #pragma offload target(mic:0) inout(x) out(y)
        {
                        #pragma omp parallel for default (none) shared(a,x,y)
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }                                                        
        } // end of target data

        // end time 
        gettimeofday(&end, NULL);

OPENMP 4.0 TARGET OFFLOAD:

// Start time
        gettimeofday(&start, NULL);

        // Run SAXPY
        #pragma omp target data map(to:x)
        {
                #pragma omp target map(tofrom:y)
                {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
                }
        } // end of target data

 

------

Thanks in advance. (Raju)

 

11 Replies
Ravi_N_Intel
Employee

Change your OpenMP 4.0 code to

#pragma omp target map(to:x) map(tofrom:y)
        {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
        }

Rezaul_R_
Beginner

Hi Ravi,

Thanks for your reply. I executed the updated code; my findings are below:

1. Compared to the Intel compiler directive offload, the OpenMP target offload performs much slower.

2. I used OFFLOAD_REPORT=2 to see the report and it shows:

[Offload] [MIC 0] [File]                    omp_target_SAXPY_only.c
[Offload] [MIC 0] [Line]                    38
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        85.259595(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   8 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        84.532258(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   8 (bytes)

(whereas the total number of bytes offloaded (CPU->MIC) with the compiler directive offload is 2000000008 bytes)

My input arrays: const int n = 500000000; float x[500000000]; float y[500000000];

 

I am wondering whether, with OFFLOAD_REPORT=2, I am seeing what is really happening underneath in OpenMP 4.0, or what else I can use for this?

Thanks in advance again!

Ravi_N_Intel
Employee

Can you show how x and y are declared in the code for both the directive offload and the OpenMP offload?

Rezaul_R_
Beginner

Yes.

In both cases I have declared them globally, as below:

const int n = 500000000;
float x[500000000];
float y[500000000];

Thanking you,

-Raju

Ravi_N_Intel
Employee

The compiler's interpretation of global variables that were not marked with #pragma omp declare target was wrong; this is fixed in the newer compiler.

Try the following test:

#ifdef OPENMP
#pragma omp declare target
#else
#pragma offload_attribute(push, target(mic))
#endif
const int n = 5000;
float x[5000];
float y[5000];
int a = 10;
#ifdef OPENMP
#pragma omp end declare target
#else
#pragma offload_attribute(pop)
#endif

int main()
{
   int i;
#ifndef OPENMP
#pragma offload target(mic:0) in(x: alloc_if(0) free_if(0)) inout(y: alloc_if(0) free_if(0))
        {
                        #pragma omp parallel for default (none) shared(a,x,y)
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
        } // end of target data
        // end time
#else
        #pragma omp target map(always, to:x) map(always, tofrom:y)
                {
                        #pragma omp parallel for
                        for (i = 0; i < n; ++i){
                                y[i] = a*x[i] + y[i];
                        }
                }
#endif
}

OUTPUT I GET

bash-4.2$ icc -openmp raju.c
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    20
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.430813(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40008 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.243459(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20008 (bytes)

bash-4.2$ icc -openmp raju.c -DOPENMP
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    29
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.412741(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40004 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.234801(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20004 (bytes)

 

Rezaul_R_
Beginner

Thank you very much, Ravi. It worked fine. 

Best Regards// 

Raju

Rezaul_R_
Beginner

Hi Ravi,

I have a few points of confusion:

1. I am using icc (ICC) 15.0.2, and with this version I cannot compile

        #pragma omp target map(always, to:x) map(always, tofrom:y)

It shows:

  error: identifier "to" is undefined
  #pragma omp target map(always, to:x) map(always, tofrom:y)

2. Which compiler version are you using?

Thanks,

- Raju 

Rezaul_R_
Beginner

Thanks in advance. And one more finding:

1. I am trying to execute the #pragma omp target offload with larger array sizes (>50000, up to millions of elements) [using #pragma omp declare target]; in this case it gives me an offload error:

    offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)

I looked into this problem; would you please comment: is it a memory alignment issue, or can I increase the phi_omp_stack?

Your comments and help are appreciated.

// Raju 

Ravi_N_Intel
Employee

I am using the 16.0 compiler. The support for "always" does not exist in 15.0.

You would need to split the pragma into 3 pragmas:

#pragma omp target update to(x,y)
#pragma omp target
{
        /* computation on x and y goes here */
}

#pragma omp target update from(y)

Regarding  signal 11,  if you send me the test case I can investigate the cause of the problem.

Rezaul_R_
Beginner

Hi Ravi,

With #pragma omp target update, I can run omp target offload without any issue. It doesn't create signal 11. 

Thank you so much for your help.

- Raju

Kevin_D_Intel
Employee

The compiler defect that Ravi described in post #6 has been recorded in our internal tracking system (see id noted below). We will update this thread when the fix for this issue is available in an external release.

(Internal tracking id: DPD200376293)
