target map vs. target update (Fortran)

TimP
Honored Contributor III
598 Views

I've resumed experimenting on how these 2 methods of copying data to and from MIC differ, using ifort 14.0.1 linux.

!$omp target update appears to work correctly in all my examples, with one exception. When an if() clause is not satisfied and the target region includes parallel do reduction(max:...) lastprivate..., it fails to give correct results. The same case is correct when the if() is satisfied (same if condition on the target update and target sections) and when no target directives are present. This is the only case of 13 where the if clause appears to cause trouble, and the problem doesn't occur when using target map.

 

Although target update has no apparent difficulty copying data to and from a common block marked with !$omp declare target (/cdata/), target map doesn't transfer data in a common block correctly, though it is fine otherwise.  I have 3 different cases using data from common; all exhibit this behavior.

Are these behaviors according to standard or design, and are they documented?  Or do they fall in the category of things which may be considered for implementation next year?

I expected more differences in performance between target update and target map than I have been able to demonstrate.  I was also surprised that their failures don't occur on the same cases (as I might expect due to my lack of familiarity with OpenMP 4).

In the area of building a procedure to be called according to !$omp declare target (subname), I've only succeeded in provoking an internal compiler error.  I didn't find any documented examples.  I filed a bug ticket on this, since the internal error report hints that this is a suitable action.

The behaviors were identical on different hardware platforms (Westmere and Ivytown host, KNC B0 and C0, mpss 2.1 and 3.1).

16 Replies
Kevin_D_Intel
Employee

What you described sounds like functional/implementation defects, but I'm still learning about the new OpenMP 4.0 features so I could easily be wrong.

Could you provide reproducers of the issues you outlined so that Development and I can dig deeper into them?

We currently only have the OpenMP 4.0 directives/pragmas documented. You probably already read in this thread about the OpenMP Examples .pdf.

TimP
Honored Contributor III

I attempted to submit via the IPS "business portal" interface but got an error code after uploading the files, so I don't know whether it took.

Bernard
Valued Contributor I

@Tim Prince

Did you reset your account?

TimP
Honored Contributor III

I'm retiring from Intel so had to resurrect my old account.  Dmitri O. has submitted tickets to have my entries transferred, but it seems the site isn't set up to support that, although there have been cases where outside customer accounts were combined.   Dmitri has assured me he will take care of continuing my Black Belt status, but this has been more difficult than expected.  Apparently, I was correct in thinking a head start would be required. 

Bug report submissions seem easier with the outside customer interface than the one which was unexpectedly given us internal to Intel while everyone was otherwise occupied last August, but it's evidently not trouble-free.  At least it supports submission from a linux platform and from outside Intel firewall, both facilities having been taken away from employees prior to my retirement.  The issue I submitted yesterday on the present subject doesn't appear in my account view.

TimP
Honored Contributor III

IPS 6000037150

Xiaoping_D_Intel
Employee

The behavior is correct according to the OpenMP 4.0 standard. The description of the "map" clause (2.14.5:17) says:

"If a corresponding list item of the original list item is in the enclosing device data environment, the new device data environment uses the corresponding list item from the enclosing device data environment. No additional storage is allocated in the new device data environment and neither initialization nor assignment is performed, regardless of the map-type that is specified."

Global data marked declare target are treated as if enclosed in a target data region spanning the whole program, so they are ignored in the map list.

Thanks,

Xiaoping

TimP
Honored Contributor III

I suppose I'm not a competent language lawyer.  I can't read into this that target update is the only method to synchronize data in common, even if that was the intention of this paragraph.  I think OpenMP has gone beyond the past claim that the standards doc should be all that is needed by developers.

jimdempseyatthecove
Honored Contributor III

With respect to this area of the standards, I think it would have been beneficial had they included a code example together with context and dataflow diagrams.

Jim Dempsey
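
As a rough illustration of the rule quoted from section 2.14.5, here is a minimal sketch in C++ rather than the Fortran discussed in this thread. The file-scope array cdata, the size kN, and the function scale_cdata are hypothetical stand-ins for a common block marked declare target. Because such data already has a corresponding item in the device data environment for the whole program, a map clause naming it would neither allocate nor copy under the OpenMP 4.0 wording, so the sketch moves the values explicitly with target update. Built without OpenMP support, or with no device present, the pragmas are ignored or fall back to the host and the loop simply runs there.

```cpp
#include <cstddef>

constexpr std::size_t kN = 8;   // hypothetical size, for illustration only

// Stand-in for a Fortran common block marked "declare target": the array
// is placed in the device data environment for the whole program.
#pragma omp declare target
double cdata[kN];
#pragma omp end declare target

// Scale cdata on the device (or on the host when no device is available).
void scale_cdata(double factor) {
    // A map(tofrom: cdata) clause on the target region would not copy
    // anything here, because cdata is already present on the device.
    // Push the host values across explicitly instead.
    #pragma omp target update to(cdata)

    #pragma omp target
    for (std::size_t i = 0; i < kN; ++i) {
        cdata[i] *= factor;
    }

    // Bring the device copy back to the host.
    #pragma omp target update from(cdata)
}
```

A caller would fill cdata on the host, call scale_cdata(2.0), and then read the doubled values back from cdata.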

TimP
Honored Contributor III

The issue was closed without addressing the question of why the if clause fails with target update plus reduction when it works with target map.

Some of us would like to develop more detailed discussions of OpenMP 4 without waiting for a gcc implementation of omp target which could be compared with the ifort and icc ones.

pbkenned1
Employee

Hi Tim,

I think the s3110 loop should correctly execute on the host for the short vector length (n .eq. 10), so this looks like an ifort defect, perhaps a combination defect with the max reduction:

!$omp target update to( aa,n,max_,xindex,yindex) if(n>990)
      do nl= 1,ntimes/n
          max_= aa(1,1)
!$omp target if(n>990)
!$omp parallel do reduction(max: max_)
!$omp& lastprivate(xindex,yindex)
!$omp& firstprivate(xindex,yindex)
          do j=1,n
              ml= maxloc(aa(:n,j),dim=1)
              if(aa(ml,j)>max_ .or. aa(ml,j)==max_ .and. j<yindex)then
                  xindex= ml
                  yindex= j
                  max_=aa(ml,j)
              endif
          enddo
!$omp end target

I extracted the s3110 loop, and as you say, it works fine for the 'long' vector lengths:

$ ifort -openmp maind_s3110.F loopdoff2_s3110.F -o loopdoff2_s3110.x
$ ./loopdoff2_s3110.x

 Loop    VL     Seconds     Checksum      PreComputed  Residual(1.e-10)   No.
 s3110  100     0.220412     2.000000000E+00    2.0200E+02   1.0000E+02          80
 s3110 1000     9.863861    2.0020E+03    2.0020E+03                       80
 s3110 2000     0.119138    4.0020E+03    4.0020E+03                       80

 

I constructed a simple test case with 'target update to() from() if(something)' logic, and it executed correctly on my 32-thread IVB host:

...

    a = 2    !HOST does this
    on_target = .false.
    i = 0

!$omp target update from(a) to(in2) if (i .ne. 0)
!$omp target if (i .ne. 0)
!$omp parallel
    a = 3
!$omp master
   in2 = omp_get_num_threads()

   if(in2 .gt. 200) then
      print *,' a = 3 executed on target using',in2,' threads'
      on_target = .true.
   else
      print *,' a = 3 executed on host using',in2,' threads'
      on_target = .false.
   endif
!$omp end master
!$omp end parallel
!$omp end target

if(a(1) .ne. 3 .or. a(100) .ne. 3) then
   print *,' Failed'
else
   print *,' Passed'
endif

 

$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.1.106 Build 20131008
Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

$ ifort -openmp TgtUpdIf.f90 -o TgtUpdIf.x
$ ./TgtUpdIf.x
  a = 1 executed on target using         224  threads
  Passed
  a = 3 executed on host using          32  threads
  Passed
$

Cheers,

Patrick

TimP
Honored Contributor III

I'm also trying to make a C++ version of it; it's slow going, in part because I don't find any working examples of omp target for C or C++.  I'm not getting much help from run-time error messages.

So far, I've been able to take the baby step of getting individual cases running with g++-4.9 (in which target==host).  Even with host execution, adding a target region slows it down significantly (with results checked).

Apparently, it may be possible to "map" a std::vector by defining a pointer to the address of the first element, which can then be used in the CEAN-like array-section context, but then the original vector would appear to be hidden from the target region.
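
One way the pointer approach described above might look, as a minimal sketch rather than a tested offload recipe. The function name scale_vector and the scaling loop are made up for illustration, and the threshold merely echoes the if(n>990) condition from the Fortran case in this thread. With no OpenMP support or no device, the loop just runs on the host.

```cpp
#include <cstddef>
#include <vector>

// Scale every element of v in place, requesting offload only for large
// vectors. The 990 threshold is borrowed from the thread's if(n>990).
void scale_vector(std::vector<double>& v, double factor) {
    if (v.empty()) return;              // nothing to map

    double* p = v.data();               // address of the first element
    const std::size_t n = v.size();

    // Only the raw pointer and its array section p[0:n] are mapped; the
    // std::vector object itself is not referenced inside the target region.
    #pragma omp target map(tofrom: p[0:n]) if(n > 990)
    for (std::size_t i = 0; i < n; ++i) {
        p[i] *= factor;
    }
}
```

Because the region touches only p, any vector member functions (size(), push_back(), and so on) have to be called on the host before or after the region, which matches the observation that the vector itself is hidden from the target code.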

pbkenned1
Employee

Hi Tim,

I'll be interested in seeing the C++ versions when you have something.  Meantime, I've filed a problem report with the developers regarding the output being incorrect when the condition is not satisfied:

!$omp target update to( aa,n,max_,xindex,yindex) if(n>990)

I'll keep this thread updated with news.

Tracking defect # DPD200251078

Regards,

Patrick

pbkenned1
Employee

Defect #DPD200251078 is now planned to be fixed in Composer XE 2013 SP1 update #2, aka 14.0.2.  The compiler should be available within a few weeks.

A related defect mentioned by Tim in the opening description is also planned to be fixed in 14.0.2:

>>>Although target update has no apparent difficulty copying data to and from a common block marked with !$omp declare target (/cdata/), target map doesn't transfer data correctly in a common block but it is fine otherwise.

Patrick

TimP
Honored Contributor III

I did make the comparative versions with the same offloads in C++ as in Fortran.  In the C++ case, if I increase the length specifier for the larger case (more than 64MB of offloaded data), it quits with the overlap message.  Even though I cheat by setting the same length specification as in the shorter case, H_TRACE=1 shows more data transferred in the larger case, and the results "appear" to be OK.  The question was raised that my older platform may not be supported in this mode.

The defect referred to here is that one case failed where it shouldn't be attempting to use the coprocessor, due to not satisfying the if clause on the omp target map.

Ravi_N_Intel
Employee

The issue with the "if" clause evaluating to false has been addressed in the compiler, and the fix will be available in the next compiler release.

pbkenned1
Employee

The issue with '!$omp target update to( aa,n,max_,xindex,yindex) if(n>990)' failing to work correctly when the 'if' condition is not satisfied (so that the computation falls back to the host) has been fixed in ifort 14.0.2.

 

$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.2.144 Build 20140120
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

$ ifort -openmp maind_s3110.F loopdoff2_s3110.F -o loopdoff2_s3110.x
$ ./loopdoff2_s3110.x

 Loop    VL     Seconds     Checksum      PreComputed  Residual(1.e-10)   No.
 s3110  100     0.258538    2.0200E+02    2.0200E+02                       80
 s3110 1000    10.349399    2.0020E+03    2.0020E+03                       80
 s3110 2000     0.128378    4.0020E+03    4.0020E+03                       80

 

Patrick

 
