If I have a type-bound procedure and, within the same module, I do:
    !$omp simd
    do i = 1, N
       (...)
       call this%AN_ELEMENTAL_INLINE_ROUTINE(..)
    enddo
My optimisation report says that it was unable to inline indirect call list.
1) I would understand this complaint if my type-bound procedure could be overridden. However, I tried explicitly marking it `non_overridable`, and that did not help either.
2) If I have a procedure that is deferred or overridden, I obviously cannot ask the compiler to inline it. However, does anybody here know whether such loops can still benefit from vectorisation? I initially thought of putting "!$OMP DECLARE SIMD(ROUTINE_NAME)" on every routine that could potentially override the type-bound procedure.
    type test
    contains
       procedure :: init => init_1
    end type

    type, extends(test) :: test_2
    contains
       procedure :: init => init_2
    end type

    !$omp declare simd(init_2)
    subroutine init_2(....)
    end subroutine

    !$omp declare simd(init_1)
    subroutine init_1(....)
    end subroutine
Can you show what you are attempting to SIMD?
IOW, what are the argument declarations of init_1, init_2, ..., and how are they expected to be used?
On a subroutine with pass (e.g. this), declaring elemental (and/or simd) would imply the "this" is elemental (inclusive of the additional arguments if any).
In the case where a member variable is an array, and where that particular array is to be manipulated in a SIMD/elemental manner, I suggest you declare a private subroutine that does not take the this argument; instead, you pass in this%array as a traditional array reference.
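As a minimal sketch of that suggestion (the names particles_mod, particles_t, x, scale_all and scale_kernel are made up for illustration and do not come from the original code), the type-bound procedure can do nothing but forward the member array to a private helper that never sees "this":

```fortran
module particles_mod
   implicit none
   private
   public :: particles_t

   type :: particles_t
      real, allocatable :: x(:)   ! hypothetical member array
   contains
      procedure :: scale_all
   end type particles_t

contains

   ! Type-bound entry point: the passed-object handling happens once, here,
   ! outside of any loop.
   subroutine scale_all(this, factor)
      class(particles_t), intent(inout) :: this
      real, intent(in) :: factor
      call scale_kernel(this%x, factor)
   end subroutine scale_all

   ! Private helper without a "this" argument: it only sees a plain REAL array,
   ! so the loop body contains no type-bound call at all.
   subroutine scale_kernel(x, factor)
      real, intent(inout) :: x(:)
      real, intent(in) :: factor
      integer :: i
      !$omp simd
      do i = 1, size(x)
         x(i) = x(i) * factor
      end do
   end subroutine scale_kernel

end module particles_mod
```

A caller would still write call p%scale_all(2.0), so the routine stays part of the type from the user's point of view. Whether the loop is actually vectorized should be confirmed in the optimisation report.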
I think my concern is not limited to elemental routines. I am generally concerned with how vectorisation works for function/subroutine calls when these are either dynamically polymorphic or type-bound but not polymorphic, i.e. whether or not the compiler knows the actual routine before run time.
For the case with a static type-bound procedure, I don't understand why the optimisation report says it cannot inline that routine because of an indirect call.
For the second case, if my procedure is dynamically polymorphic, would the loop still vectorise if I have declared all the routines that could possibly be called with an !$omp declare simd(routine_name)?
If these two cases are not clear, I am happy to make an example for what I mean.
In your type declaration, declare a nopass subroutine, iow one where the this pointer is .NOT. passed.
    !$omp simd
    do I=1, N
       ...
       CALL AN_INLINE_ELEMENTAL_ROUTINE(this%memberArray(I), ...)
       ...
    end do
Vector instructions (SSE, AVX...) generally function with arrays of fundamental types (INTEGER, REAL, COMPLEX of 1, 2, 4, 8 bytes), but not with arrays of user defined types. While your post #1 is not showing an array of user defined type, it has a subroutine dispatch (the call) based on the user defined type. The technique you need is to lift the type bound dispatch outside the loop.
You may need to expand on the above in the event that memberArray has a different type for each of the different UDTs. IOW this may require a SELECT/CASE and optionally use of ASSOCIATE array=>this%memberArray
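A hedged sketch of that expansion follows. It assumes SELECT TYPE as the construct for telling the extended types apart, and every name in it (shapes_mod, base_t, ext_t, a, b, update) is hypothetical. The type is resolved once, and each branch then runs a loop over a plain REAL array:

```fortran
module shapes_mod
   implicit none
   private
   public :: base_t, ext_t, update

   type :: base_t
      real, allocatable :: a(:)   ! array used by the base type
   end type base_t

   type, extends(base_t) :: ext_t
      real, allocatable :: b(:)   ! array used by the extended type
   end type ext_t

contains

   subroutine update(obj)
      class(base_t), intent(inout) :: obj
      integer :: i

      ! The dynamic type is resolved once, outside the loops.
      select type (obj)
      type is (base_t)
         associate (arr => obj%a)
            !$omp simd
            do i = 1, size(arr)
               arr(i) = arr(i)**2
            end do
         end associate
      type is (ext_t)
         associate (arr => obj%b)
            !$omp simd
            do i = 1, size(arr)
               arr(i) = arr(i) / 2.0
            end do
         end associate
      end select
   end subroutine update

end module shapes_mod
```

Inside each branch the compiler sees a fixed type and a fixed loop body, which is the unambiguous form it needs. The cost is one loop per type, written out by hand.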
Thanks for the reply.
Is there any specific reason why the routine cannot stay part of the type, i.e. doing the "call this% AN_INLINE_ELEMENTAL_ROUTINE"?
Extending this further: what about the case where the "CALL" has to be a type-bound procedure because that procedure is polymorphic, i.e. the procedure is only determined at run time? How does this cope with vectorisation?
In computer programming, there are two conceptual entities called a vector. One is an abstract mathematical concept, generally a Fortran array of something, which can be a polymorphic type. The other is a CPU entity built from CPU intrinsic types (8-bit, 16-bit, 32-bit, 64-bit signed/unsigned integer or 32-bit, 64-bit floating point), in which the intrinsic type is replicated into a CPU Small Vector of contiguous 8, 16, 32 or 64-bit elements that fill or partially fill the Small Vector, whose width is 128, 256 or 512 bits. An example is 16 REAL(4)'s on a CPU with AVX512 support.
The code generated by the compiler (IOW a specific instruction such as "vector add packed single precision floating point") is not dependent on the data type. That is the CPU instruction itself does not look at the type, the compiler does and chooses which instruction to insert into the binary output. The compiler must generate a type dispatch. In C++-speak this would be a vtable dispatch. This would be a small section of code, which would be conditionally executed with tests and branches and such, making it all but impossible to use a collection of undisclosed types that are required to be adjacent for the single SIMD instruction to act upon.
While your loop in the application may reference only one of your types (across all iterations of a loop instance), the compiler cannot generate code under this assumption. To resolve this (attain vectorization), you must code in an unambiguous manner. For example, when the compiler can unambiguously know it is iterating across an array of REAL(4)'s, the loop can potentially be vectorized. Whether it can or cannot be vectorized will depend on the other statements in the loop.
While in C++ you can facilitate this using templates, Fortran unfortunately does not have templates. You will have to write individual SELECT/CASE/DO/ENDDO sections of code and/or specific instances of GENERIC procedures, which can be hacked together using the FPP and #define for your specific types.
If you can present a complete, and simple case that can be compiled by others, you might get a recommendation sooner.
Thanks for the reply. Are you essentially saying that the compiler must know which function/subroutine we are calling within a SIMD loop before it can vectorise, and hence that no polymorphic call is possible? So far I thought you could do that as long as you declared those routines with a declare simd(Function_name).
If this is correct (that it is not possible), can you then advise me on how to deal with dynamically changing conditions within a SIMD loop?
    do i = 1,no_part
       (...)
       if( st% ip(i) == 1) then
          call ROUTINE1
       else
          call ROUTINE2
       endif
    enddo
I know you once showed me a nice example using the FPP and #define to attain vectorisation, which works very nicely for me when the outcome of the ifs does not change over the course of the loop itself.
However, in the above example, you might have i = 1 calling ROUTINE1 and i = 2 calling ROUTINE2.
I think I know the answer by now, but please confirm this for me. If I want to SIMD a loop, the compiler must execute the same code for every iteration i, no matter what. Hence, there is no way around accepting the overhead of letting it execute all branches, after which it automatically merges the results for me depending on my condition.
A loop with a conditional section as above is normally not capable of being vectorized, even if ROUTINE1 and ROUTINE2 are inlined.
Only in very restrictive cases might the compiler figure out how to vectorize.
In order to vectorize, typically both the ROUTINE1 and ROUTINE2 paths will be executed, and then the conditional test will be made to select what/where to store the result. For example, should the inlining of the two routines produce something like
    do i = 1,no_part
       if( st% ip(i) == 1) then
          st%foo(i) = st%foo(i)**2
       else
          st%foo(i) = st%foo(i) / 2
       endif
    enddo
the compiler will produce something like this:
    do i = 1,no_part
       tempA = st%foo(i)**2
       tempB = st%foo(i) / 2
       if( st% ip(i) == 1) then
          st%foo(i) = tempA
       else
          st%foo(i) = tempB
       endif
    enddo
where the IF(...) branches are replaced with conditional moves. IOW the loop can then run on each element of the array without branching.
The cost is increased by the unnecessary computation, but also reduced by the fact the loop can be vectorized.
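If you prefer to make the "compute both, then select" form explicit in the source rather than rely on the compiler to derive it, one option is the MERGE intrinsic. This is only a sketch: the module name blend_mod and the routine blend are hypothetical, and foo and ip stand in for st%foo and st%ip.

```fortran
module blend_mod
   implicit none
   private
   public :: blend

contains

   ! Branch-free form of the IF/ELSE loop: both candidate values appear in
   ! the expression and the mask picks one of them per element.
   subroutine blend(foo, ip)
      real, intent(inout) :: foo(:)
      integer, intent(in) :: ip(:)   ! assumed to have the same size as foo
      integer :: i
      !$omp simd
      do i = 1, size(foo)
         foo(i) = merge(foo(i)**2, foo(i) / 2.0, ip(i) == 1)
      end do
   end subroutine blend

end module blend_mod
```

This gives the same results as the IF/ELSE version, but it only pays off when both expressions are cheap and safe to evaluate for every element (no division by zero, no out-of-bounds access on the path that is not selected). Check the optimisation report to see what the compiler actually did with it.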
You have provided insufficient detail for anyone here to provide you with the information you seek.