interdependent modules and IPO

Frederic_C_ · ‎07-12-2016

Hi all,

here a simple example of interdependent derived types:

MODULE m_common

   USE, intrinsic :: iso_fortran_env

   implicit none

   integer, parameter :: ip = INT32
   integer, parameter :: rp = REAL64

   integer(ip), parameter :: n = 80400 , maxit = 1000

   integer(8) :: count , count_scale , count_max

   integer(ip) :: i , j , k

   real(rp) :: time , rate

   TYPE vec2d

      real(rp) :: x = 0._rp
      real(rp) :: y = 0._rp

   CONTAINS

      procedure, pass(self) :: equal_vec2d

      generic :: assignment(=) => equal_vec2d

   END TYPE vec2d

CONTAINS

   PURE ELEMENTAL SUBROUTINE equal_vec2d( self , from )

      class(vec2d), intent(inout) :: self
      class(vec2d), intent(in   ) :: from

      self%x = from%x
      self%y = from%y

   END SUBROUTINE equal_vec2d

END MODULE m_common

MODULE m_data

   USE m_common

   implicit none

   TYPE subdata

      type(vec2d) :: vector

   END TYPE subdata

   TYPE somedata

      type(subdata), allocatable :: sub_array(:)

   END TYPE somedata

END MODULE m_data

PROGRAM test

   USE m_common
   USE m_data

   type(somedata) :: totest

   type(vec2d) :: vector

   call system_clock( count , count_scale , count_max )

   rate = real( count_scale , 8 )

   allocate( totest%sub_array( n ) )

   time = real( count , 8 )

   do i = 1,maxit

      do k = 1,n

         vector = totest%sub_array( k )%vector

      end do

   end do

   call system_clock( count , count_scale , count_max )

   write(6,'(ES10.2)') ( real( count , 8 ) - time ) / rate

END PROGRAM test

If I compile this program with ifort and gfortran, I obtain these time computation results:

ifort -O3 test_all.f90 -o exe ; ./exe
6.95E-04

gfortran -O3 test_all.f90 -o exe ; ./exe
6.76E-04

At this time, no problem. Now, if I split in 3 programs test1.f90 test2.f90 and test3.f90, I obtain these results:

ifort -O3 test1.f90 test2.f90 test3.f90 -o exe ; ./exe
1.04E+00

ifort -O3 -ipo test1.f90 test2.f90 test3.f90 -o exe ; ./exe
7.45E-04

gfortran -O3 test1.f90 test2.f90 test3.f90 -o exe ; ./exe
3.17E-01

gfortran -O3 -flto test1.f90 test2.f90 test3.f90 -o exe ; ./exe
1.01E-01

So, time computation differences are very very big, and I do not understand what can really explain such differences, just accessing in memory to the main derived type array values with a very short stride

If interprocedural optimization with ifort gives me back the same performance than with only one file program, it is not the case with gfortran even if an effect is found

There is a way to compile separately my modules in order to use them efficiently in a very large program calling them a lot without ipo flag ??? maybe compiling a static or shared library with my modules and then link it to my main program ???

Fred

Kevin_D_Intel · ‎07-12-2016

Cross posted. Refer to https://software.intel.com/en-us/node/661253 for further comments.