Hi,
System: CentOS 7.0; compiler: parallel_studio_xe_2015; MPSS 3.4.2.
I have a Fortran code. Compiled in native mode, it runs in about 2.4 seconds. Compiled in offload mode, it runs in about 5.2 seconds. In offload mode, about 144 MB is transferred back to the CPU, which I estimate takes about 0.024 seconds at 6 GB/s over PCI Express. The data sent from the CPU to the Xeon Phi is about 3.6 MB. My question is: why is the offload code so much slower than the native code? To compile the offload code I run
source /opt/intel/composer_xe_2015/bin/compilervars.sh intel64
and then use the ifort command.
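For reference, the transfer estimate above can be written out as a short calculation. It assumes a sustained 6 GB/s in each direction and ignores latency and setup cost, so it is only a rough bound:

$$
t_{\text{to CPU}} \approx \frac{144\ \text{MB}}{6\ \text{GB/s}} = 0.024\ \text{s},
\qquad
t_{\text{to Phi}} \approx \frac{3.6\ \text{MB}}{6\ \text{GB/s}} = 0.0006\ \text{s}
$$

$$
t_{\text{offload}} - t_{\text{native}} \approx 5.2\ \text{s} - 2.4\ \text{s} = 2.8\ \text{s}
$$

Under these assumptions the raw transfers account for roughly 0.025 s, which is about 1% of the 2.8 s gap, so bandwidth alone does not seem to explain the slowdown.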
I have other questions regarding offload in Fortran:
1.1) Using data in a module:
MODULE shared_data
  REAL global_x(1000)
END MODULE shared_data

subroutine fun1(y)
  use shared_data
  real y(10)
  ! access global_x here
  ! however, global_x = 0 here?
end subroutine

subroutine fun2(y)
  real y(10)
  call fun1(y)
end subroutine

program MAIN
  use shared_data
  real y(10)
  global_x = 1.0
  call fun2(y)
end
I am trying to use this module in subroutines that are offloaded to the Xeon Phi. I have tried various Intel-specific directives and OpenMP 4.0 directives, but I could not get the value of global_x transferred from the CPU to the Xeon Phi. Could anyone give a simple example of how this can be done? Note that the code works fine in native mode.
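To make the request concrete, here is a minimal sketch of the kind of example being asked for, using only standard OpenMP 4.0 directives. It is not a confirmed fix for the setup above. It assumes the compiler honours `declare target` for module variables, and, unlike the code above, it moves fun1 and fun2 into the module so that their `declare target` attribute is visible to the caller. The assignment inside fun1 and the final check are hypothetical additions for illustration.

```fortran
module shared_data
  implicit none
  real :: global_x(1000)
  ! Ask for a device copy of global_x in addition to the host copy.
  !$omp declare target (global_x)
contains
  subroutine fun1(y)
    ! Build a device version of fun1 as well as the host version.
    !$omp declare target
    real, intent(out) :: y(10)
    y = global_x(1:10)
  end subroutine fun1

  subroutine fun2(y)
    !$omp declare target
    real, intent(out) :: y(10)
    call fun1(y)
  end subroutine fun2
end module shared_data

program main
  use shared_data
  implicit none
  real :: y(10)

  global_x = 1.0
  y = 0.0

  ! Copy the current host values of global_x into the device copy.
  !$omp target update to(global_x)

  ! Run fun2 on the device and bring y back to the host.
  !$omp target map(from: y)
  call fun2(y)
  !$omp end target

  ! If the update worked, y holds the values set on the host.
  if (any(y /= 1.0)) error stop "global_x did not reach the device"
  print *, 'y(1) =', y(1)
end program main
```

The key step in this sketch is the `target update to(global_x)` before the target region: a `declare target` variable has a separate device copy, and assigning to it on the host does not change that copy. When no device is available, an OpenMP implementation normally runs the target region on the host, so the program can still be checked without a card.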
1.2) Transferring a subarray of a 2-D/N-D array to the Xeon Phi in offload mode:
I have the array
real x(1000,5)
I would like to distribute the work across 4 Xeon Phi cards by offloading x(:,i), i = 1..4, one column per card. However, when I put x(:,i) in the offload data directives, the compiler reports a non-contiguous array error. How can this be done?
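One possible workaround, sketched below under stated assumptions, is to copy each column into a contiguous 1-D temporary and map that temporary instead of the array section. The sketch uses the OpenMP 4.0 `device` clause; the names col, ndev and dev and the scaling by 2.0 are hypothetical placeholders for the real work, and whether this avoids the reported compiler error in the setup above is untested.

```fortran
program distribute_columns
  use omp_lib
  implicit none
  real :: x(1000, 5)
  real :: col(1000)          ! contiguous temporary for one column
  integer :: i, ndev, dev

  x = 1.0
  ndev = omp_get_num_devices()

  do i = 1, 4
     ! Device numbers start at 0; with no device present this picks 0,
     ! which then refers to the host.
     dev = mod(i - 1, max(ndev, 1))

     ! Copy column i into the temporary, offload it, then copy it back.
     col = x(:, i)
     !$omp target device(dev) map(tofrom: col)
     col = 2.0 * col
     !$omp end target
     x(:, i) = col
  end do

  ! Columns 1-4 were processed; column 5 was left untouched.
  if (any(x(:, 1:4) /= 2.0) .or. any(x(:, 5) /= 1.0)) error stop "unexpected result"
  print *, 'columns 1-4 processed using', max(ndev, 1), 'target(s)'
end program distribute_columns
```

As written, the host waits for each target region to finish, so the four cards would work one after another. Running them concurrently would need something extra, for example one host thread per card, which this sketch does not attempt.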
Many thanks in advance for your help.
M