KNC OMP4.0 fortran offload - subroutine opt args, data serialization, transfer costs?

Paulius_V_ · ‎10-01-2016

Hello all. I'm doing some testing on OpenMP target offload model and have a few questions:

In the attached code, what's causing the serialization warnings regarding f128 data types?

I'm trying to set up optional arguments for a subroutine that's declared to be offloaded. I'm not passing any arguments yet, and if I check whether the optional argument is present, I get a seg fault. I'm guessing this has something to do with the argument not being allocated on the MIC? Has anyone experienced anything similar?

I'm also trying to measure the offload cost relative to the total execution time. Before I put all of my math into a subroutine, I used omp_get_wtime(). If I try to do the same from a subroutine that's offloaded, the compiler (2017) complains that omp_get_wtime is not declared with !$omp declare target. Can I get the same information from OFFLOAD_REPORT=3? That is, does anyone have any insight into exactly what's measured under host time and under MIC time?

If anyone ends up looking at my test code any review or comments would be greatly appreciated.

Many thanks.

Kevin_D_Intel · ‎10-03-2016

I was unable to find more info on the serialization messages so I’ll inquire with Developers.

There does appear to be an issue with the optional arguments. I reproduced the seg-fault when calling calc() without those; runs successfully with them. I’ll inquire with Developers about this.
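For anyone following along without the attachment, here is a minimal sketch of the pattern in question. It is not the original test code: the module calc_mod, the routines calc_opt and calc_flag, and the scale argument are all hypothetical names made up for illustration. calc_opt tests an optional dummy with present(); calc_flag is one possible workaround that always passes the argument together with an explicit logical flag, which is in line with the observation that the call runs when the arguments are supplied.

```fortran
module calc_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Hypothetical routine using an optional dummy argument.
  ! Omitting "scale" in an offloaded call is the case reported to seg-fault.
  subroutine calc_opt(a, n, scale)
    !$omp declare target
    integer, intent(in) :: n
    real(dp), intent(inout) :: a(n)
    real(dp), intent(in), optional :: scale
    if (present(scale)) then
      a = a * scale
    else
      a = a + 1.0_dp
    end if
  end subroutine calc_opt

  ! Possible workaround (an assumption, not a confirmed fix):
  ! no optional dummies, the caller states explicitly whether "scale" is used.
  subroutine calc_flag(a, n, scale, use_scale)
    !$omp declare target
    integer, intent(in) :: n
    real(dp), intent(inout) :: a(n)
    real(dp), intent(in) :: scale
    logical, intent(in) :: use_scale
    if (use_scale) then
      a = a * scale
    else
      a = a + 1.0_dp
    end if
  end subroutine calc_flag

end module calc_mod

program optional_demo
  use calc_mod
  implicit none
  integer, parameter :: n = 8
  real(dp) :: a(n), b(n)

  a = 1.0_dp
  b = 1.0_dp

  ! Both calls pass every argument, the variant that ran successfully.
  !$omp target map(tofrom: a, b)
  call calc_opt(a, n, 2.0_dp)
  call calc_flag(b, n, 2.0_dp, .true.)
  !$omp end target

  print *, 'results agree: ', all(a == b)
end program optional_demo
```

Without offload support enabled, the target region simply runs on the host, so the sketch can be checked there first.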

I can call omp_get_wtime() inside calc() when adding USE OMP_LIB ahead of the !$omp declare target inside calc(). In the absence of this, I see the warning below, which I believe is what you’re indicating you experienced.

warning #8694: *MIC* A procedure called by a procedure with the DECLARE TARGET attribute must have the DECLARE TARGET attribute. [OMP_GET_WTIME]
t1=omp_get_wtime()
---^
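A minimal sketch of that arrangement, assuming a simplified calc() (the argument list, the loop body, and the module name kernels are made up for illustration and are not the original test code). The USE OMP_LIB statement sits ahead of the declare target directive inside the routine, and the host also times the whole target region so the two numbers can be compared as a rough estimate of transfer and launch overhead.

```fortran
module kernels
  implicit none
contains
  subroutine calc(a, n, elapsed)
    use omp_lib              ! provides the interface for omp_get_wtime
    !$omp declare target
    integer, intent(in) :: n
    double precision, intent(inout) :: a(n)
    double precision, intent(out) :: elapsed
    double precision :: t0
    integer :: i

    t0 = omp_get_wtime()     ! timer read where the routine executes
    do i = 1, n
      a(i) = 2.0d0 * a(i) + 1.0d0
    end do
    elapsed = omp_get_wtime() - t0
  end subroutine calc
end module kernels

program time_offload
  use omp_lib
  use kernels
  implicit none
  integer, parameter :: n = 1000000
  double precision :: a(n), t_device, t_start, t_total

  a = 1.0d0

  t_start = omp_get_wtime()  ! host timer around the whole offload
  !$omp target map(tofrom: a) map(from: t_device)
  call calc(a, n, t_device)
  !$omp end target
  t_total = omp_get_wtime() - t_start

  print *, 'time inside calc (s):            ', t_device
  print *, 'time for whole target region (s):', t_total
  print *, 'difference, approx. overhead (s):', t_total - t_device
end program time_offload
```

The first offload typically also includes device initialization, so timing a second, identical target region may give a more representative overhead figure.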

Offload report and the times are described here, https://software.intel.com/en-us/node/680162. There’s a similar discussion in the earlier thread https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/675751.

Paulius_V_ · ‎10-03-2016

Not sure if this is worth its own thread, but another issue I've encountered is this code getting killed on the MIC side with signal 11 when using OFFLOAD_INIT=on_startup. With on_offload it runs fine, however. Should I start a separate discussion, or do you have any insight into this issue?

thank you very much!

Paulius_V_ · ‎10-18-2016

Just an update: I rewrote this in LEO and ALL of my problems have disappeared. It seems that OpenMP 4.0 offload for Fortran is still very buggy.

Kevin_D_Intel · ‎10-19-2016

Thank you for the update. I received the following guidance regarding warning #15519: A part of code was serialized due to operation on real128

ifort uses the binary128 format to represent the real(16) datatype: https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format

In general, this is handled by calls to routines of the form __addq(…) or __mulq(…) to do a simple addition or multiplication involving variables declared with that datatype. (You can check the generated asm; it will contain such calls, which are probably handled by the Fortran RTL.)

Because of this specialized handling, a lot of “regular” optimizations get skipped, including native vectorization of the real(16) datatype, even with simd directives. The compiler may report that it vectorized a loop marked with “simd”, but it is essentially serializing the computation.

Use of real128 datatype is the reason - as suggested by the remarks. Not sure if the user really requires the extra precision for their apps (instead of just real(8)).
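To make the trade-off concrete, here is a small sketch (not taken from the attached code; the program name, array names, and loop are invented for illustration) that runs the same simd reduction once with real128 and once with real64. Compiling it with the optimization report enabled should show the serialization remark only for the real128 loop, and the precision() intrinsic shows how many decimal digits each kind provides.

```fortran
program kinds_demo
  use, intrinsic :: iso_fortran_env, only: real64, real128
  implicit none
  integer, parameter :: n = 1024
  real(real128) :: xq(n), sq   ! quad precision, same as real(16) in ifort
  real(real64)  :: xd(n), sd   ! double precision, same as real(8) in ifort
  integer :: i

  do i = 1, n
    xq(i) = 1.0_real128 / real(i, real128)
    xd(i) = 1.0_real64  / real(i, real64)
  end do

  ! Quad-precision arithmetic goes through library calls,
  ! so this loop is the one expected to be serialized.
  sq = 0.0_real128
  !$omp simd reduction(+:sq)
  do i = 1, n
    sq = sq + xq(i) * xq(i)
  end do

  ! The same loop in double precision can use native vector instructions.
  sd = 0.0_real64
  !$omp simd reduction(+:sd)
  do i = 1, n
    sd = sd + xd(i) * xd(i)
  end do

  print *, 'decimal digits, real128:', precision(sq)
  print *, 'decimal digits, real64: ', precision(sd)
  print *, 'sum (real128):', sq
  print *, 'sum (real64): ', sd
end program kinds_demo
```

If the two sums agree to the accuracy the application needs, switching the kind to real64 is the simplest way to get rid of the warning.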

I’m still pursuing the optional argument and warning #8694 and will let you know what I learn.