Hi,
I am trying to compare offload statistics for:
1) Intel compiler-assisted offload vs. 2) the OpenMP 4.0 target construct.
My question: how can I get an OpenMP 4.0 offload report (which environment variable do I need to set)? I used OFFLOAD_REPORT=2 with the Intel compiler directive offload and it worked fine, but I am getting very strange statistics with the OpenMP 4.0 offload. (I am using an Intel Xeon Phi as the execution platform.)
Here is the code:
COMPILER DIRECTIVE OFFLOAD:
// Start time
gettimeofday(&start, NULL);
// Run SAXPY
// y is also read in the loop; out(y) does not copy its initial values to the card (inout(y) would)
#pragma offload target(mic:0) inout(x) out(y)
{
        #pragma omp parallel for default(none) shared(a,x,y)
        for (i = 0; i < n; ++i){
                y[i] = a*x[i] + y[i];
        }
} // end of offload region
// End time
gettimeofday(&end, NULL);
OPENMP 4.0 TARGET OFFLOAD:
// Start time
gettimeofday(&start, NULL);
// Run SAXPY
#pragma omp target data map(to:x)
{
        #pragma omp target map(tofrom:y)
        {
                #pragma omp parallel for
                for (i = 0; i < n; ++i){
                        y[i] = a*x[i] + y[i];
                }
        }
} // end of target data
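Both snippets take a start timestamp with gettimeofday, but the elapsed time still has to be computed from the two struct timeval samples (and the OpenMP version needs a matching end call after the target data region). A minimal sketch, assuming start and end are struct timeval as in the code above; the helper name elapsed_seconds is hypothetical:

```c
#include <sys/time.h>   /* struct timeval, gettimeofday */

/* Hypothetical helper: wall-clock seconds between two gettimeofday() samples. */
double elapsed_seconds(const struct timeval *start, const struct timeval *end)
{
    /* Seconds and microseconds are subtracted separately; a negative
       microsecond difference is absorbed when the two terms are added. */
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_usec - start->tv_usec) * 1e-6;
}
```

Typical use: call gettimeofday(&end, NULL) right after the offload region and print elapsed_seconds(&start, &end), so both variants are timed over the same span.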
------
Thanks in advance. (Raju)
------
Change your OpenMP 4.0 code to:
#pragma omp target map(to:x) map(tofrom:y)
{
        #pragma omp parallel for
        for (i = 0; i < n; ++i){
                y[i] = a*x[i] + y[i];
        }
}
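For arrays allocated on the heap (rather than the fixed-size globals used in this thread), the compiler cannot infer their size, so the single-construct form has to state the mapped extent with an OpenMP 4.0 array section. A minimal sketch, assuming the arrays are passed as pointers to a hypothetical function saxpy_target; if no device is available the target region typically falls back to running on the host:

```c
/* SAXPY offloaded with a single OpenMP 4.0 target construct.
   Assumption: x and y are pointers (e.g. from malloc), so the mapped
   extent must be given explicitly as an array section [start:length]. */
void saxpy_target(int n, float a, const float *x, float *y)
{
    /* x is only read on the device (copied in); y is read and written
       (copied in before the region and back out after it). */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Mapping x with to and y with tofrom mirrors the in(x) / inout(y) intent of the compiler-directive version, so only the data the loop actually needs crosses the bus in each direction.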
------
Hi Ravi,
Thanks for your reply. I ran the updated code, and here is what I found:
1. Compared to the compiler (Intel) directive offload, the OpenMP target offload is much slower.
2. I used OFFLOAD_REPORT=2 to see the report, and it shows:
[Offload] [MIC 0] [File]                    omp_target_SAXPY_only.c
[Offload] [MIC 0] [Line]                    38
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        85.259595(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   8 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        84.532258(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   8 (bytes)
By comparison, the total number of bytes offloaded (CPU->MIC) with the compiler offload is 2000000008 bytes.
My input arrays are: const int n = 500000000; float x[500000000]; float y[500000000];
I am wondering whether OFFLOAD_REPORT=2 shows what is really happening underneath with OpenMP 4.0. If not, what else can I use for this?
Thanks in advance again!
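The reported byte counts can be sanity-checked against the array sizes. With n = 500000000 and 4-byte floats, each array is 2,000,000,000 bytes, which matches the compiler-offload CPU->MIC total (x plus 8 bytes) but is far more than the 8 bytes reported for the OpenMP run, so the arrays themselves do not appear in that transfer. A minimal sketch of the arithmetic, assuming sizeof(float) == 4; payload_bytes is a hypothetical helper:

```c
#include <stddef.h>   /* size_t */

/* Hypothetical helper: bytes needed to copy `count` elements of
   `elem_size` bytes each, computed in 64-bit to avoid overflow. */
unsigned long long payload_bytes(unsigned long long count, size_t elem_size)
{
    return count * (unsigned long long)elem_size;
}
```

For the globals in this thread, payload_bytes(500000000ULL, sizeof(float)) gives 2000000000, the size of one of x or y.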
------
Can you show how x and y are declared in the code, for both the directive offload and the OpenMP offload?
------
Yes.
In both cases I declared them globally, as below:
const int n = 500000000;
float x[500000000];
float y[500000000];
Thanking you,
-Raju
------
The compiler's handling of global variables that were not marked with #pragma omp declare target was wrong; this is fixed in the newer compiler.
Try the following test:
#ifdef OPENMP
#pragma omp declare target
#else
#pragma offload_attribute(push, target(mic))
#endif
const int n = 5000;
float x[5000];
float y[5000];
int a = 10;
#ifdef OPENMP
#pragma omp end declare target
#else
#pragma offload_attribute(pop)
#endif

int main(void)
{
        int i;
#ifndef OPENMP
        #pragma offload target(mic:0) in(x: alloc_if(0) free_if(0)) inout(y: alloc_if(0) free_if(0))
        {
                #pragma omp parallel for default(none) shared(a,x,y)
                for (i = 0; i < n; ++i){
                        y[i] = a*x[i] + y[i];
                }
        } // end of offload region

#else
        #pragma omp target map(always, to:x) map(always, tofrom:y)
        {
                #pragma omp parallel for
                for (i = 0; i < n; ++i){
                        y[i] = a*x[i] + y[i];
                }
        }
#endif
        return 0;
}
Output I get:
bash-4.2$ icc -openmp raju.c
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    20
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.430813(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40008 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.243459(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20008 (bytes)
bash-4.2$ icc -openmp raju.c -DOPENMP
bash-4.2$ ./a.out
[Offload] [MIC 0] [File]                    raju.c
[Offload] [MIC 0] [Line]                    29
[Offload] [MIC 0] [Tag]                     Tag 0
[Offload] [HOST]  [Tag 0] [CPU Time]        0.412741(seconds)
[Offload] [MIC 0] [Tag 0] [CPU->MIC Data]   40004 (bytes)
[Offload] [MIC 0] [Tag 0] [MIC Time]        0.234801(seconds)
[Offload] [MIC 0] [Tag 0] [MIC->CPU Data]   20004 (bytes)
------
Thank you very much, Ravi. It worked fine.
Best regards,
Raju
------
Hi Ravi,
I have a couple of questions:
1. I am using icc (ICC) 15.0.2, and with this version I cannot compile
#pragma omp target map(always, to:x) map(always, tofrom:y)
It fails with:
  error: identifier "to" is undefined
  #pragma omp target map(always, to:x) map(always, tofrom:y)
2. Which compiler version are you using?
Thanks,
- Raju
------
Thanks in advance. One more finding:
1. When I run the #pragma omp target offload with larger arrays (more than 50000 elements, e.g. a million) declared with #pragma omp declare target, I get this offload error:
offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)
I looked into this problem. Could you please comment on whether it is a memory alignment issue, or whether I can increase phi_omp_stack?
I would appreciate your comments and help.
// Raju
------
I am using the 16.0 compiler. Support for "always" does not exist in 15.0.
You would need to split the pragma into three pragmas:
#pragma omp target update to(x,y)
#pragma omp target
{
        // parallel for loop as before
}
#pragma omp target update from(y)
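For compilers without the always modifier (15.0 in this thread), the three-pragma pattern can be written out as a complete function. A minimal sketch, assuming declare target globals as in Ravi's test; the names gx, gy, N and saxpy_declared are hypothetical, and when no offload device is present the target update directives should have no effect and the region runs on the host:

```c
#define N 8   /* hypothetical small size for illustration */

/* Globals that exist on both the host and the device (OpenMP 4.0 declare target). */
#pragma omp declare target
float gx[N];
float gy[N];
#pragma omp end declare target

/* SAXPY on the device copies of gx/gy without the 'always' map modifier:
   explicit target update directives move the data instead. */
void saxpy_declared(float a)
{
    /* Refresh the device copies with the current host values. */
    #pragma omp target update to(gx, gy)

    #pragma omp target
    {
        #pragma omp parallel for
        for (int i = 0; i < N; ++i) {
            gy[i] = a * gx[i] + gy[i];
        }
    }

    /* Copy the result back into the host copy of gy. */
    #pragma omp target update from(gy)
}
```

Passing a as a parameter rather than keeping it in the declare target block avoids having to update a device copy of the scalar as well.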
Regarding signal 11, if you send me the test case I can investigate the cause of the problem.
------
Hi Ravi,
With #pragma omp target update, I can run the omp target offload without any issues, and it does not raise signal 11.
Thank you so much for your help.
- Raju
------
The compiler defect that Ravi described in post #6 has been recorded in our internal tracking system (see id noted below). We will update this thread when the fix for this issue is available in an external release.
(Internal tracking id: DPD200376293)