OpenMP 4.0 Fortran -> $omp target map(to:x) Does not copy scalars

Paulius_V_ · 07-30-2015

Hello. Ever since I tried switching over to OpenMP 4.0 from LEO, my code has not been working properly. I finally figured out that during the offload transfer the map clause copies arrays but DOES NOT COPY SCALARS.

The only way to get scalars copied to the device is to call target update to(scalar)

Can someone please explain this behaviour?

Using icc 15.0.1

EDIT: Scalars created in the same file are copied, but scalars imported from a different file using USE do not get copied. Arrays get copied no matter where they originate from.

Also, what's the difference between

!$omp target data map(to:x)
!$omp target
call math
!$omp end target
!$omp end target data

!$omp target map(to:x)
call math
!$omp end target

The problem with the second method is that I cannot call !$omp target update from within the target region, which is the only way that I've found to copy scalars from external files.

Thanks!

Kevin_D_Intel · 07-30-2015

If you have or could create a small reproducer that demonstrates the issue that would be helpful. In the meantime, I will try some tests regarding the global scalars in Fortran.

Regarding the question about the difference between the constructs, it is my understanding as written there really is no difference. The first form allows for inclusion of additional host code (including target update) appearing inside the scope of the target data but outside the target construct.
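The difference described above can be sketched with a small free-form Fortran program. This is only an illustration under assumptions: the array names `x` and `y`, their size, and the arithmetic are made up, and when no device is attached the target regions are expected to fall back to running on the host.

```fortran
program target_forms
  implicit none
  integer, parameter :: n = 8      ! hypothetical problem size
  real :: x(n), y(n)

  x = 1.0
  y = 0.0

  ! Combined form: x and y are mapped for this one target region only.
  !$omp target map(to: x) map(from: y)
  y = 2.0 * x
  !$omp end target
  if (any(y /= 2.0)) error stop "combined form gave an unexpected result"

  ! Split form: the data region keeps x and y on the device across
  ! several target regions, and host code may run in between.
  !$omp target data map(to: x) map(from: y)
    !$omp target
    y = 2.0 * x
    !$omp end target

    print *, "host code running between the two offloads"

    !$omp target
    y = y + x                      ! reuses the device copy of y
    !$omp end target
  !$omp end target data            ! y is copied back here

  if (any(y /= 3.0)) error stop "split form gave an unexpected result"
  print *, "both forms completed"
end program target_forms
```

With a single target region the two forms move the same data; the split form only pays off once there is more than one offload or some host work inside the data region.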

Paulius_V_ · 07-30-2015

Hi Kevin. I am using ifort, sorry for the confusion. I will make a test case for you within the next 4 hours or so.

Regarding the use of target data directives: Say I have two main loops I want to offload. They both share a lot of the same data and I want to implement data persistence. I should have a target data around both of those loops, target directives to offload the parallel sections, and a target update directive in between the two loops so that the second offload would have freshly computed values, correct? Then, at the end of the second offload I will be exiting the offload target region and the values will be copied back to the host according to the mapping rules, correct?

If so, then theoretically I could have one massive target data region around my entire code and use update calls to synchronize the data after each target call? Would that create less overhead than having multiple data regions throughout the code?

Ravi_N_Intel · 07-30-2015

Regarding the use of target data directives: Say I have two main loops I want to offload. They both share a lot of the same data and I want to implement data persistence. I should have a target data around both of those loops, target directives to offload the parallel sections, and a target update directive in between the two loops so that the second offload would have freshly computed values, correct? Then, at the end of the second offload I will be exiting the offload target region and the values will be copied back to the host according to the mapping rules, correct?
- Yes

If so, then theoretically I could have one massive target data region around my entire code and use update calls to synchronize the data after each target call?
- Yes. You could also use omp declare target to declare the variables you want persistent for the entire program and use update to synchronize, instead of having one massive target data region.

Would that create less overhead than having multiple data regions throughout the code?
- Yes. Allocating/deallocating is an expensive process.
On the flip side, you are holding on to that much memory even when you are not using it. That may cause some other program to run out of memory, or your own program if it does an offload that does not use that memory but needs memory allocated for a different variable. If you think you are not going to encounter this issue, then yes, the recommended method is to allocate/deallocate memory the fewest times.

Paulius_V_ · 07-30-2015

Ravi, I already use the omp declare target for marking arrays/scalars imported from an external file as they won't compile to an offload region otherwise. How come I don't have to mark arrays created in the same file with the declare target?

What's the LEO parallel here? attribute push/pop?

Thanks

Ravi_N_Intel · 07-30-2015

Can you provide the test case with the problem? I am having difficulty trying to picture the scenario.

push/pop equivalent is

#pragma omp declare target new-line
arrays/scalars
#pragma omp end declare target new-line
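Since the thread is about Fortran, here is a sketch of how the same idea might look there, using the list form of the directive inside a module. The module name `sim_params` and the variables `nsteps` and `field` are hypothetical. The example also shows the explicit `target update to` that the original post found necessary for module scalars.

```fortran
module sim_params                  ! hypothetical module name
  implicit none
  integer :: nsteps = 0
  real    :: field(4) = 0.0
  ! Give both variables a persistent copy on the device.
  !$omp declare target (nsteps, field)
end module sim_params

program declare_target_demo
  use sim_params
  implicit none
  integer :: seen

  nsteps = 10
  field  = 1.5

  ! The device copies already exist, so a map clause would not refresh
  ! them; push the current host values explicitly instead.
  !$omp target update to(nsteps, field)

  seen = -1
  !$omp target map(from: seen)
  seen = nsteps + int(sum(field))  ! reads the device copies
  !$omp end target

  if (seen /= 16) error stop "device did not see the updated values"
  print *, "device saw nsteps + sum(field) =", seen
end program declare_target_demo
```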

Paulius_V_ · 07-31-2015

Sorry for the delay Kevin. I've been trying to replicate the error but it seems that everything works if I call a program which uses a module in which I have a scalar and an array with offload declare target attributes.

In my code, however there's a few layers.

My main program uses a module from which it calls a subroutine

which calls another subroutine which is located in another file

which calls another subroutine located in the same file

which uses a module which contains the variables in question. These variables are then offloaded but the scalars won't pull the value from the host unless you explicitly do so using target update.

jimdempseyatthecove · 07-31-2015

>> These variables are then offloaded but the scalars won't pull the value from the host unless you explicitly do so using target update.

Have you considered the implications of automatically doing this should you want to have persistent data in the MIC between offloads?
(IOW then this would require an explicit NOCOPY on those variables at each offload).

IF (stress "if") you relate offloads to OpenMP parallel regions, then your module data, visible to both domains are in fact in different memory domains (one on host and one on MIC) and therefore are equivalent to PRIVATE (not SHARED), and thus require the equivalent to FIRSTPRIVATE and/or LASTPRIVATE on the directive that enters the other region.

Jim Dempsey

Paulius_V_ · 08-01-2015

Jim,

I did have data persistence in the LEO implementation and I will have it in openMP implementation as well. I also have multithreading and everything works perfectly. The only problem I'm experiencing is concerned with offload with openMP. I'm just trying to understand why it won't copy the scalars. I plan on updating everything manually in the final implementation.

I'm not talking about any multithreading right now at all as the test case I was using was a simple write of an array value and a scalar value to stdout from the host, followed by same thing from the MIC.

Perhaps I'm misunderstanding your post?

Thanks

jimdempseyatthecove · 08-01-2015

Paulius,

code
some directive (OpenMP and/or Offload)
some enclosed region in the other domain
end of scope of directive
code

When "some enclosed region in the other domain" is an OpenMP parallel region on the same CPU(s) of where the "code" lives, then it is possible to have "shared" variables actually share the same memory locations. Note, "same CPU(s)" means both domains are on Host .OR. both domains are on Xeon Phi.

However, when "some enclosed region in the other domain" is an offload region on different CPU(s) from where the "code" lives, then it is not possible to have "shared" variables actually share the same memory locations. This will require a directive clause to copy the data before and/or after the scope of the enclosed region (possibly excepting persistent data that lives in the other domain). Note, "different CPU(s)" means .NOT.(both domains are on Host .OR. both domains are on Xeon Phi). IOW the other domain is on remote "CPU(s)" from the perspective of the enclosing scope.

The compiler can make an assumption that stack local variables that are declared and initialized in the outer scope domain and are used within "the other domain" prior to initialization in "the other domain" implicitly require a copy operation (before and after the other domain). This is because locally defined variables are known not to be used outside the enclosing scope.

In the case of module variables, these variables are persistent, and may have (require) different values in the different domains.

Now, let's give you a hack (a somewhat elegant solution) that may make life easier for you should you have a large number of scalars to transport back and forth between regions.

One way is to use a UNION and MAP to map an array to the list of scalars required to be copied. You can then specify this array to be copied in/out on the offload. Use two such unions if what you copy in is a different collection of scalars than what you copy out.
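UNION and MAP are a DEC/Intel extension, so the sketch below swaps in a plainer technique to show the same idea in standard Fortran: the scalars are packed by hand into one array, and only that array is mapped. All names (`dt`, `visc`, `tol`, `params`, and the index constants) are hypothetical.

```fortran
program packed_scalars
  implicit none
  ! Named slots in the transfer array (hypothetical layout).
  integer, parameter :: i_dt = 1, i_visc = 2, i_tol = 3
  real :: dt, visc, tol
  real :: params(3)
  real :: res

  dt   = 0.5
  visc = 2.0
  tol  = 0.25

  ! Pack the scalars so that a single array transfer carries all of them.
  params(i_dt)   = dt
  params(i_visc) = visc
  params(i_tol)  = tol

  res = 0.0
  !$omp target map(to: params) map(from: res)
  res = params(i_dt) * params(i_visc) + params(i_tol)
  !$omp end target

  if (res /= 1.25) error stop "packed scalars did not arrive as expected"
  print *, "result computed from packed scalars:", res
end program packed_scalars
```

Unlike a real UNION, this copies the scalars twice (into the array, then to the device), so it trades a little host work for a single transfer.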

Jim Dempsey

Paulius_V_ · 08-02-2015

Jim, I removed all multithreading to diagnose this offload issue. I will look into your suggestions once I have single-core offload working with omp. Thanks for the info.

Here's some more weird stuff: my first offload region works with the explicit update. My second, bigger offload region is causing a seg fault after exiting. I tried copying the main array to and from the device and it seems to copy to the device no matter which direction (to/from) I pick. Attached is the code snippet and the offload report.

Any ideas?

EDIT: So I just manually mapped this array to just be allocated on the mic and it seems to work now. Now I'm just very confused about when the target data and target directives automatically allocate and copy data for you when nothing is specified.

I read in the OpenMP 4.0 spec http://openmp.org/mp-documents/OpenMP-4.0-Fortran.pdf that tofrom is the default behaviour. However, I noticed that is not the case, as the compiler seems to be smart enough to set read-only variables to be "to" only and others to "out". From this I assumed that in any case a target data region will allocate all of the variables it encounters within the region. Now it seems that I have to manually allocate every single variable inside the target data region?

So perhaps it's only the target region, not target data regions, that automatically allocate and copy any unspecified vars. The target data directive does nothing for you other than initialize the mic environment?

Ravi_N_Intel · 08-03-2015

The OpenMP model is different from LEO in the sense that if a scalar/array already exists on the device, either through omp declare target or omp target data at the outer scope, using a to/from clause on a target pragma for these variables has no effect. You need to use update to send/receive data. OpenMP4.1 has introduced a new qualifier "always" to be used with to/from, which forces data motion.
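A minimal sketch of the "always" qualifier, assuming a compiler that implements OpenMP 4.5, where the `always` map-type modifier was standardized; a 4.0-only compiler is assumed to reject it. The names `scale` and `a` are hypothetical.

```fortran
program always_modifier
  implicit none
  real :: scale, a(4)

  scale = 1.0
  a     = 1.0

  ! scale and a get device copies for the whole data region.
  !$omp target data map(to: scale) map(tofrom: a)
    scale = 2.0   ! host-side change made after the device copy exists

    ! A plain map(to: scale) here would find scale already present and
    ! move nothing. The always modifier forces the copy anyway.
    !$omp target map(always, to: scale)
    a = a * scale
    !$omp end target
  !$omp end target data

  if (any(a /= 2.0)) error stop "device used a stale value of scale"
  print *, "a after offload:", a
end program always_modifier
```

On a compiler without `always`, placing `!$omp target update to(scale)` before the target region should give the same result.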

Paulius_V_ · 08-03-2015

Ravi, using the 2016 beta compiler I noticed that it is no longer necessary to mark variables created in a module with a declare target directive. The code compiles and seems to move most of the data except for scalars again, just like 2015 version.

Could you link me to the changelog for the 2016 Fortran compiler? I'm not seeing the change that removes the omp declare target requirement documented anywhere.

Ravi_N_Intel · 08-03-2015

Not easy to get the changelog.
If a variable is marked with omp declare target then the data is persistent at the global scope. If you don't, then it is allocated and freed at every outermost omp target it is specified in. It would be best if you provided us with an example.
If you cannot share the code here, maybe you could share the code privately with Kevin who has responded in this mail chain.

Paulius_V_ · 08-04-2015

Hi all,

I was able to resolve all of my issues.

Note: the 2016 beta Fortran compiler does not give you warnings when you don't mark vars with !$omp declare target(var)

It is important to make sure all the vars inside the !$omp target data region are marked with that attribute. I would recommend doing everything manually: allocation, and movement of all the data using target update directives.

!$omp target data
!$omp& map(alloc:var1,var2)

!$omp target update to(var1)

!$omp target
var2 = math1(var1) !on mic
!$omp end target

!$omp target update from(var2)
var2 = math2(var2) !on host
!$omp target update to(var2)

!$omp target
var2 = math3(var2) !on mic
!$omp end target

!$omp target update from(var2)

!$omp end target data
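For anyone who wants to try the pattern above, here is a self-contained free-form sketch. The arithmetic is only a stand-in for math1, math2 and math3 (which are not shown in the thread), and the array size is an assumption.

```fortran
program manual_updates
  implicit none
  integer, parameter :: n = 4     ! hypothetical array size
  real :: var1(n), var2(n)

  var1 = 1.0
  var2 = 0.0

  ! Only allocate on the device; every transfer below is explicit.
  !$omp target data map(alloc: var1, var2)
    !$omp target update to(var1)

    !$omp target
    var2 = var1 + 1.0             ! stand-in for math1, on the device
    !$omp end target

    !$omp target update from(var2)
    var2 = 2.0 * var2             ! stand-in for math2, on the host
    !$omp target update to(var2)

    !$omp target
    var2 = var2 - 1.0             ! stand-in for math3, on the device
    !$omp end target

    !$omp target update from(var2)
  !$omp end target data

  if (any(var2 /= 3.0)) error stop "unexpected result after the updates"
  print *, "var2 after both offloads:", var2
end program manual_updates
```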

Thanks for all your help.

OpenMP 4.0 Fortran -> $omp target map(to:x) Does not copy scalars