Hello, I have some original code which I have reworked to open up more OpenMP parallelization opportunities.
In the original program I have something like the following:
    module Y
      complex*16 array(1000)
    end module

    subroutine process_array()
      use Y
      ! do some stuff with array
    end
and the program is compiled with the /Qsave switch.
In the new program, I use default storage and I have reworked the code in the following manner:
    module process_array
      type process_array_data
        complex*16, pointer :: array(:)
      end type
    contains
      subroutine process_array(this, .... )
        ! do some stuff with array
      end subroutine
    end module
and the array is allocated in the main program before process_array() is executed.
I noticed that the first version runs about 0.1 s faster than the second (Intel i5 processor):
- The first case execution time is about 0.06s
- The second case execution time is about 0.16s
Is there any reason for this? I measure only the subroutine's execution time and exclude the code that allocates the array.
Without seeing a complete, reproducible and minimal test case, it's impossible to say. My experience with questions such as yours is that the test program often isn't timing what the programmer thinks it is. This is not to say that the difference isn't real, but we can't know that from pseudocode. My experience is also that pseudocode rarely is an accurate representation of the real code.
Please spend a few minutes and work up a complete test case, and be very sure that the results of whatever you are timing actually get used, or else the compiler is likely to "evaporate" the code under test.
In addition to what Steve says, try to build a program with a meaningful computing time, not just a single execution, and report how you are measuring that computing time too.
As Dops indicates, structure your test code to run for a few seconds to warm things up.
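The advice above could be put into practice with something like the following. This is only a minimal sketch, not the original poster's code: the kernel routine, the repeat count nrep and the checksum are hypothetical stand-ins. It runs one untimed warm-up call, repeats the timed call many times, and prints a value derived from the results so the compiler cannot discard the work.

```fortran
program time_kernel
  use, intrinsic :: iso_fortran_env, only: dp => real64, int64
  implicit none
  integer, parameter :: n = 1000        ! array size taken from the original post
  integer, parameter :: nrep = 100000   ! hypothetical repeat count; tune it for a run of a few seconds
  complex(dp), allocatable :: array(:)
  complex(dp) :: checksum
  integer(int64) :: t0, t1, rate
  integer :: i, rep

  ! Allocation and initialization happen outside the timed region
  allocate(array(n))
  array = [(cmplx(i, -i, kind=dp), i = 1, n)]

  ! Warm-up call, excluded from the timing
  call kernel(array)

  checksum = (0.0_dp, 0.0_dp)
  call system_clock(t0, rate)
  do rep = 1, nrep
    call kernel(array)
    ! Consume one element per pass so the work cannot be optimized away
    checksum = checksum + array(1 + mod(rep, n))
  end do
  call system_clock(t1)

  print *, 'seconds per call:', real(t1 - t0, dp) / real(rate, dp) / nrep
  print *, 'checksum:', checksum

contains

  ! Hypothetical stand-in for "do some stuff with array"
  subroutine kernel(a)
    complex(dp), intent(inout) :: a(:)
    a = a * (0.999_dp, 0.001_dp) + (1.0e-3_dp, 0.0_dp)
  end subroutine kernel

end program time_kernel
```

Replacing kernel with the real routine, once for the module-array version and once for the pointer version, would give two per-call times measured the same way.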
complex*16,pointer :: array(:) vs complex*16,allocatable :: array(:)
While both can be allocated, a pointer array can be strided and/or aliased, whereas an allocatable array is assured to be contiguous and not aliased. These differences affect optimization opportunities.
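To illustrate the difference, here is a small sketch with hypothetical names (array_holders, pointer_holder, alloc_holder, scale_ptr, scale_alloc); none of them come from the original code. The pointer component is legally associated with a strided section, so the compiler generally has to allow for that possibility inside scale_ptr, while the allocatable component can only ever hold contiguous storage.

```fortran
module array_holders
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none

  type :: pointer_holder
    complex(dp), pointer :: array(:) => null()   ! may be strided or aliased
  end type pointer_holder

  type :: alloc_holder
    complex(dp), allocatable :: array(:)         ! always contiguous
  end type alloc_holder

contains

  subroutine scale_ptr(this, s)
    type(pointer_holder), intent(inout) :: this
    complex(dp), intent(in) :: s
    this%array = s * this%array
  end subroutine scale_ptr

  subroutine scale_alloc(this, s)
    type(alloc_holder), intent(inout) :: this
    complex(dp), intent(in) :: s
    this%array = s * this%array
  end subroutine scale_alloc

end module array_holders

program demo
  use array_holders
  implicit none
  complex(dp), target :: storage(2000)
  type(pointer_holder) :: p
  type(alloc_holder) :: a

  storage = (1.0_dp, 0.0_dp)
  p%array => storage(1:2000:2)    ! legal: a strided view of 1000 elements

  allocate(a%array(1000))
  a%array = (1.0_dp, 0.0_dp)

  call scale_ptr(p, (2.0_dp, 0.0_dp))
  call scale_alloc(a, (2.0_dp, 0.0_dp))

  ! is_contiguous is a Fortran 2008 intrinsic
  print *, 'pointer contiguous:    ', is_contiguous(p%array)
  print *, 'allocatable contiguous:', is_contiguous(a%array)
end program demo
```

If the pointer is kept, Fortran 2008 also allows declaring it as complex(dp), pointer, contiguous :: array(:), which tells the compiler the target is never strided. Whether either change recovers the 0.1 s reported above is something only a real test case can show.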