Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

ASSOCIATE optimization improvement

jimdempseyatthecove
Honored Contributor III
1,083 Views

I have a user defined type, that contains a pointer to another user defined type, which contains an array.

With older versions of the compiler (pre-ASSOCIATE) I used a pointer to the array (pointer => firstObject%secondObject%array) in an effort to reduce access time. That older code, while faster, required copying the array descriptor into the pointer (and creating an alias in the process).
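For readers who have not seen the pointer approach, here is a minimal sketch of it. The type names inner_t and outer_t, the alias name pArray, and the array size are hypothetical stand-ins, not taken from the actual code; only the firstObject%secondObject%array path comes from the post.

```fortran
! Hypothetical types standing in for the two user-defined types in the post.
module alias_demo
    implicit none
    type inner_t
        real, allocatable :: array(:)
    end type inner_t
    type outer_t
        type(inner_t), pointer :: secondObject => null()
    end type outer_t
end module alias_demo

program pointer_alias
    use alias_demo
    implicit none
    type(outer_t) :: firstObject
    real, pointer :: pArray(:)   ! local alias; it carries its own descriptor
    integer :: i

    allocate(firstObject%secondObject)
    allocate(firstObject%secondObject%array(4))

    ! Pointer assignment: pArray's descriptor is filled in from the
    ! component's descriptor. The target is reached through a pointer
    ! component, so it may legally be pointed at.
    pArray => firstObject%secondObject%array

    do i = 1, size(pArray)
        pArray(i) = real(i)
    end do

    ! Writes through the alias are visible through the original path.
    if (sum(firstObject%secondObject%array) /= 10.0) error stop "alias mismatch"
    write(*,*) sum(firstObject%secondObject%array)

    deallocate(firstObject%secondObject%array)
    deallocate(firstObject%secondObject)
end program pointer_alias
```

The loop then indexes pArray directly instead of walking the two-level component path on every access, which is the saving the pointer was meant to buy.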

With the newer compiler I experimented with ASSOCIATE, on the assumption that the compiler could optimize out the array descriptor copy by constructing a reference to the original array descriptor, thus saving the copy time. This is not the case:

;;;     ASSOCIATE(rBMS => pFiniteSolution%rBMS, &

        vmovups   xmm0, XMMWORD PTR [3800+rbx]                  ;118.5
        vmovups   XMMWORD PTR [rbp], xmm0                       ;118.5
        vmovups   xmm1, XMMWORD PTR [3816+rbx]                  ;118.5
        vmovups   XMMWORD PTR [16+rbp], xmm1                    ;118.5
        vmovups   xmm2, XMMWORD PTR [3832+rbx]                  ;118.5
        vmovups   XMMWORD PTR [32+rbp], xmm2                    ;118.5
        vmovups   xmm3, XMMWORD PTR [3848+rbx]                  ;118.5
        vmovups   XMMWORD PTR [48+rbp], xmm3                    ;118.5
        mov       rax, QWORD PTR [3864+rbx]                     ;118.5
        mov       QWORD PTR [64+rbp], rax                       ;118.5

The first level of the indirection was performed earlier.

The above code could potentially be replaced by

  lea...
  mov ..

or

  mov...
  lea...
  mov...

to effectively place the address (reference) of the array descriptor into a register or into a stack temporary.

In the above case, the array has a few thousand cells, so the extra overhead is not all that large. When the arrays are smaller, the overhead becomes proportionally larger.

In the process of setting up the experiment of testing ASSOCIATE I discovered an annoying side effect.

My code used to use the "." member variable separator as opposed to the official "%" separator:

Foo.bar.nada

as opposed to

Foo%bar%nada

because I find the reference easier to parse syntactically.

When a source file uses ASSOCIATE, the alternate "." separator is disabled, even for variables not involved with the ASSOCIATE.

In order to perform the test, I had to spend several hours changing the .'s to %'s, being careful not to trash the dotted operators (such as .AND. or .EQ.) in the IF statements.

Jim Dempsey

3 Replies
Steven_L_Intel1
Employee

I don't think the language semantics exactly match what you're trying to do, but a small test case would be useful. It has often been the case that new language constructs don't optimize well soon after they are introduced.

ASSOCIATE is not exactly like a pointer - it effectively creates a new variable that shares storage with the "selector".
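The distinction can be illustrated with a small standalone program. This is only a sketch with made-up names (a, v, w), not the code under discussion. It shows three standard behaviors: the selector needs no TARGET attribute, the associate name takes its lower bound from the selector, and an expression selector is evaluated once on entry.

```fortran
program associate_semantics
    implicit none
    real, allocatable :: a(:)    ! note: no TARGET attribute is needed
    integer :: i

    allocate(a(0:3))
    a = 0.0

    ! Variable selector: v shares storage with a and is definable.
    associate (v => a)
        ! For a whole-array selector, v inherits a's lower bound (0 here).
        if (lbound(v, 1) /= 0) error stop "unexpected lower bound"
        do i = lbound(v, 1), ubound(v, 1)
            v(i) = real(i)
        end do
    end associate

    ! The assignments made through v landed in a.
    if (any(a /= [0.0, 1.0, 2.0, 3.0])) error stop "storage not shared"

    ! Expression selector: w holds the value computed on entry and is
    ! not definable; later changes to a do not affect it.
    associate (w => 2.0 * a)
        a = -1.0
        if (any(w /= [0.0, 2.0, 4.0, 6.0])) error stop "w changed"
    end associate

    write(*,*) a
end program associate_semantics
```

A pointer, by contrast, can only be associated with an object that has the TARGET or POINTER attribute, and never with the value of an expression.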

jimdempseyatthecove
Honored Contributor III
module foo
    type a
        real, allocatable :: array(:,:)
    end type a
    
    type p_a
        type(a), pointer :: p
    end type p_a
    
    type(p_a) :: example_p_a
end module foo
    
subroutine foofoo(x)
    use foo
    implicit none
    type(p_a) :: x
    integer :: i,j
    write(*,*) size(x%p%array)
    ASSOCIATE(local_array => x%p%array)
    write(*,*) size(local_array)
        
    do i=lbound(local_array,dim=1),ubound(local_array,dim=1)
        do j=lbound(local_array,dim=2),ubound(local_array,dim=2)
            local_array(i,j) = i*j
        end do
    end do
    END ASSOCIATE
end subroutine foofoo
    
program ASSOCIATEx
    use foo
    implicit none
    allocate(example_p_a%p)
    allocate(example_p_a%p%array(3,1000))
    call foofoo(example_p_a)
    write(*,*) sum(example_p_a%p%array)
end program ASSOCIATEx

producing

...
.B1.5::                         ; Preds .B1.4
        mov       rdi, QWORD PTR [r15]                          ;51.14
        mov       r10, r14                                      ;51.14
        mov       rbx, QWORD PTR [8+r15]                        ;51.14
        lea       rcx, QWORD PTR [48+rsp]                       ;51.14
        mov       rbp, QWORD PTR [16+r15]                       ;51.14
        mov       edx, -1                                       ;51.14
        mov       rax, QWORD PTR [24+r15]                       ;51.14
        mov       r8, 01208384ff00H                             ;51.14
        mov       r12, QWORD PTR [32+r15]                       ;51.14
        lea       r9, QWORD PTR [__STRLITPACK_1.0.2]            ;51.14
        mov       r13, QWORD PTR [40+r15]                       ;51.14
        lea       r11, QWORD PTR [152+rsp]                      ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2], rdi    ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+8], rbx  ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+16], rbp ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+24], rax ;51.14
        or        rax, 2                                        ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+32], r12 ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+40], r13 ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+48], r14 ;51.14
        mov       r13, QWORD PTR [56+r15]                       ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+56], r13 ;51.14
        imul      r10, rsi                                      ;51.14
        mov       rbx, QWORD PTR [64+r15]                       ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+64], rbx ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+72], rsi ;51.14
        mov       r12, QWORD PTR [80+r15]                       ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+80], r12 ;51.14
        mov       rbp, QWORD PTR [88+r15]                       ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+24], rax ;51.14
        mov       rax, rsp                                      ;51.14
        mov       QWORD PTR [FOOFOO$LOCAL_ARRAY$_3.0.2+88], rbp ;51.14
...

In the above case, where the entire array is being associated (in other words, not a slice of the array), it should be sufficient to use the array descriptor in the pointee in situ. Instead, the code above makes a copy of the array descriptor. I am suggesting that the compiler create a reference to (the address of) the already existing array descriptor when the ASSOCIATE selector is the entire array.

When the example code is small, and the copy fits in registers, there is no issue:

.B1.5::                         ; Preds .B1.4
        mov       r10, r14                                      ;49.10
        lea       rcx, QWORD PTR [48+rsp]                       ;49.10
        imul      r10, rsi                                      ;49.10
        mov       edx, -1                                       ;49.10
        mov       r8, 01208384ff00H                             ;49.10
        mov       rax, rsp                                      ;49.10
        lea       r9, QWORD PTR [__STRLITPACK_1.0.2]            ;49.10
        lea       r11, QWORD PTR [152+rsp]                      ;49.10
        mov       rdi, QWORD PTR [r15]                          ;49.10
        mov       r13, QWORD PTR [56+r15]                       ;49.10
        mov       rbx, QWORD PTR [64+r15]                       ;49.10
        mov       r12, QWORD PTR [80+r15]                       ;49.10
        mov       rbp, QWORD PTR [88+r15]                       ;49.10

The above listing was produced after changing foofoo to be a recursive subroutine.

However, in the actual code where I made the observation, register pressure was high, so the values held in registers above were instead copied into RAM (equivalent to the descriptor copy in the earlier, non-recursive .asm snippet).

Jim Dempsey

Steven_L_Intel1
Employee

Ok, thanks. I'll run this by the developers.
