Hi all!
I have problems using OpenMP together with offload directives. The following reduced code gives the correct result (1 2 3 4 5 0 0 0 0 0) when it is compiled without OpenMP ("ifort test.f -o test"), and a wrong result (1 2 3 4 5 6 7 8 9 10) when it is compiled with OpenMP ("ifort -openmp test.f -o test").
      PROGRAM test
      integer c(10)
      c=0
!DIR$ OFFLOAD_TRANSFER target(mic:0)
!DIR$&  nocopy(c: length(10) alloc_if(.true.) free_if(.false.))
!DIR$ OFFLOAD begin target(mic:0) nocopy(c)
      do i=1,10
        c(i)=i
      enddo
!DIR$ end OFFLOAD
!DIR$ OFFLOAD_TRANSFER target(mic:0)
!DIR$&  out(c(1:5): alloc_if(.false.) free_if(.false.) into(c(1:5)))
!DIR$ OFFLOAD_TRANSFER target(mic:0)
!DIR$&  nocopy(c: alloc_if(.false.) free_if(.true.))
      WRITE(*,'(10(1X,I2))') c
      END PROGRAM
OFFLOAD_REPORT output when the result is wrong (excerpt):
[Offload] [MIC 0] [File] test.f
[Offload] [MIC 0] [Line] 9
[Offload] [MIC 0] [Tag] Tag 1
[Offload] [HOST] [Tag 1] [State] Start Offload
[Offload] [HOST] [Tag 1] [State] Initialize function __offload_entry_test_f_9MAIN__ifort1104196052Lk2WXB
[Offload] [HOST] [Tag 1] [State] Send pointer data
[Offload] [HOST] [Tag 1] [State] CPU->MIC pointer data 0
[Offload] [HOST] [Tag 1] [State] Gather copyin data
[Offload] [HOST] [Tag 1] [State] CPU->MIC copyin data 0
[Offload] [HOST] [Tag 1] [State] Compute task on MIC
[Offload] [HOST] [Tag 1] [State] Receive pointer data
[Offload] [HOST] [Tag 1] [State] MIC->CPU pointer data 0
[Offload] [MIC 0] [Tag 1] [State] Start target function __offload_entry_test_f_9MAIN__ifort1104196052Lk2WXB
[Offload] [HOST] [Tag 1] [State] Scatter copyout data
[Offload] [HOST] [Tag 1] [CPU Time] 0.001025(seconds)
[Offload] [MIC 0] [Tag 1] [CPU->MIC Data] 0 (bytes)
[Offload] [MIC 0] [Tag 1] [MIC Time] 0.000210(seconds)
[Offload] [MIC 0] [Tag 1] [MIC->CPU Data] 40 (bytes)
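The offload region at line 9 (the OFFLOAD begin ... end OFFLOAD block) transfers data back to the host (40 bytes MIC->CPU, the size of all ten elements of c) even though nocopy is set.
(Composer XE 2013 SP1.2.144; ifort version 14.0.2)
(This is a reduced test case that reproduces the error. My full code contains OpenMP directives.)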
Can you include the OpenMP directives in your simple example?
In particular, is your parallel region between lines 5 and 20, or is it within the second offload as a parallel do?
Jim Dempsey
I submitted the issue to Development (see the internal tracking ID below) to determine whether a fix is possible for the Composer XE 2013 SP1 (14.0 compiler) release.
I do not know whether any of these workarounds are usable in your original code, but for the test case, any of the following avoids the incorrect results with -openmp: making c an allocatable array, adding SAVE to its declaration, or using other means to force array c into static storage.
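A minimal sketch of the SAVE variant, applied to the reduced test case and rewritten in free form (e.g. a .f90 file) so each directive fits on one line. Apart from the SAVE attribute and the explicit declarations under IMPLICIT NONE, the directives are the ones from the original post. It has not been re-run on a coprocessor here, so treat it as an illustration of the workaround rather than a verified fix.

```fortran
! Reduced test case with the SAVE workaround applied.
! SAVE forces c into static storage, which the reply above reports
! avoids the wrong result when compiling with -openmp.
program test
  implicit none
  integer, save :: c(10)
  integer :: i

  c = 0
  ! Allocate a persistent copy of c on mic:0 without transferring data.
!DIR$ OFFLOAD_TRANSFER target(mic:0) nocopy(c: length(10) alloc_if(.true.) free_if(.false.))

  ! Fill c on the coprocessor; nocopy means nothing should be copied back.
!DIR$ OFFLOAD begin target(mic:0) nocopy(c)
  do i = 1, 10
     c(i) = i
  end do
!DIR$ end OFFLOAD

  ! Copy only the first five elements back to the host.
!DIR$ OFFLOAD_TRANSFER target(mic:0) out(c(1:5): alloc_if(.false.) free_if(.false.) into(c(1:5)))

  ! Release the coprocessor copy of c.
!DIR$ OFFLOAD_TRANSFER target(mic:0) nocopy(c: alloc_if(.false.) free_if(.true.))

  ! Expected on a coprocessor, per the original post: 1 2 3 4 5 0 0 0 0 0
  write(*, '(10(1X,I2))') c
end program test
```

Compilers without offload support treat the !DIR$ lines as comments, so there the loop simply runs on the host and all ten elements are set.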
If you are interested, the Beta program has been announced here: Invitation to join the Intel® Software Development Tools 2015 Beta program
(Internal tracking id: DPD200255733)
Thanks for the replies!
I would like to use multiple MIC devices with OpenMP, so the MIC offload regions are inside an OpenMP parallel region. Originally these were in the same subroutine, but after separating them, I could compile the MIC code without the -openmp compiler option. That works, but I would like to compile all of the code with the same options, if possible.
Zoltan,
Re Kevin's reply: "I do not know whether any of these is usable in your original code, but for the test case, using either an allocatable array, or adding SAVE, or using other means to force array C into static storage with -openmp avoids the incorrect results."
You can also use AUTOMATIC, an Intel-specific attribute. Since -openmp should (will) already place local arrays on the stack, AUTOMATIC may be redundant for the purpose of allocation, but it may have the side effect of avoiding the compiler bug. Note that for large single-instance arrays you can also use ALLOCATABLE, SAVE (which places the array descriptor in the save area).
Be aware that when multiple host OpenMP threads enter an offload region on the same MIC, this is somewhat equivalent to nested parallel regions. There is nothing wrong with doing this, provided each host-to-MIC entry is programmed to use a subset of the available threads on the MIC. Failure to do so (so that the combined teams oversubscribe the MIC) may yield unacceptable results.
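A sketch of that pattern, assuming free-form source compiled with -openmp. NHOST, NDEV and MIC_THREADS are hypothetical values chosen for illustration (one coprocessor with 240 hardware threads); on real hardware they would need to be queried or tuned. Each host thread picks a device with mod(ih-1, ndev) and caps its offloaded team with NUM_THREADS at its share of that device's threads, so the concurrent offloads together stay within the device.

```fortran
! Sketch: NHOST host OpenMP threads each enter an offload region, and each
! region caps its own coprocessor team with NUM_THREADS so that the teams
! together do not oversubscribe the device.
! ASSUMPTIONS (hypothetical values): one coprocessor (NDEV = 1) with
! MIC_THREADS = 240 hardware threads; query or tune these on real hardware.
program share_mic
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: n = 1000
  integer, parameter :: nhost = 4          ! host threads issuing offloads
  integer, parameter :: ndev = 1           ! assumed number of coprocessors
  integer, parameter :: mic_threads = 240  ! assumed threads per coprocessor
  real(real64) :: partial(nhost), s
  integer :: ih, j, idev, nthr

  ! Threads per offload = device threads / host threads sharing that device.
  nthr = max(1, (mic_threads * ndev) / nhost)

!$omp parallel do num_threads(nhost) private(s, j, idev)
  do ih = 1, nhost
     idev = mod(ih - 1, ndev)              ! spread host threads over devices
     s = 0.0_real64
!DIR$ OFFLOAD begin target(mic:idev) in(ih, nthr) inout(s)
!$omp parallel do num_threads(nthr) reduction(+:s)
     do j = 1, n
        s = s + real(j * ih, real64)
     end do
!$omp end parallel do
!DIR$ end OFFLOAD
     partial(ih) = s
  end do
!$omp end parallel do

  print '(4F12.1)', partial                ! format assumes nhost = 4
  ! Self-check: sum of j*ih over j = 1..n is ih * n*(n+1)/2 = ih * 500500.
  if (any(partial /= [(500500.0_real64 * ih, ih = 1, nhost)])) &
       error stop 'unexpected partial sums'
end program share_mic
```

Compilers without offload support read the !DIR$ lines as comments, so the same source also runs on the host; the inner region is then a nested parallel region, and whether or not it is active, the printed sums are the same.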
Jim Dempsey
Zoltan, it sounds like you have been able to program around this issue. Our developers confirmed there is a fix in the release scheduled for later this year, and they have asked whether a fix is also required in a future update to the current Composer XE 2013 SP1 release.
Please let me know whether your workaround is sustainable until our release later this year.
Thank you

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page