Intel® Fortran Compiler

Vectorization of derived type assignments

zp3
Beginner

Hi,

I'm using the following derived type for an extended number representation:

    type :: XNT_TYPE_XNUM
        !(S)ignificand
        real(XNT_SK) :: s
        !(E)xponent
        integer(XNT_EK) :: e
    end type XNT_TYPE_XNUM

Somewhere I've declared the two variables:

    type(XNT_TYPE_XNUM), target :: ps(2,nl,nm)
    ...
    type(XNT_TYPE_XNUM) :: pt(2,nl,3)

When I do the assignment:

    pt(:,:,2)=ps(:,:,j)

the vectorization report says:

vector dependence: assumed ANTI dependence between PT line xxxx and  line xxxx

When writing instead:

    pt(:,:,2)%s=ps(:,:,j)%s
    pt(:,:,2)%e=ps(:,:,j)%e

then the report says vectorization is possible but inefficient.

Can anybody tell me why the compiler finds a dependence?

Thanks in advance!!

6 Replies
jimdempseyatthecove
Honored Contributor III

The two array slices should be non-overlapping, contiguous sections (regardless of any padding in XNT_TYPE_XNUM). The assignment statement (without %) should be able to use __intel_fast_memcpy (or whatever is used now). Although that function is fully vectorized, the statement will likely not be reported as vectorized (inline). Check the assembly code (via VTune if possible); -S output might not yield the same code when IPO is enabled.

The second form (with %) should be avoided because it would require gather/scatter (assuming gather/scatter could operate at all when XNT_SK .NE. XNT_EK).

Jim Dempsey

zp3
Beginner

Ok, thanks a lot for this explanation! This would solve the mystery...

It's interesting that the code:

    forall (k=1:2,i=1:nl)
        pt(k,i,2)%s=ps(k,i,j)%s
        pt(k,i,2)%e=ps(k,i,j)%e
    end forall

gives me the same vreport result as:

    pt(:,:,2)%s=ps(:,:,j)%s
    pt(:,:,2)%e=ps(:,:,j)%e

I always thought that the statements in a forall construct are executed sequentially for each index, so that no slicing should occur...
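For reference, the Fortran standard defines a FORALL construct so that each assignment in its body is completed for every index value before the next assignment starts. That would explain why the construct and the two array-section assignments produce the same report. A minimal sketch that makes the ordering visible by comparing a FORALL against a plain DO loop (the arrays `a`, `b`, `c` and `d` are made up for illustration and have nothing to do with the original code):

```fortran
program forall_order
    implicit none
    integer :: a(5), b(4), c(5), d(4), i

    a = [1, 2, 3, 4, 5]
    c = a

    ! FORALL: the first assignment finishes for all i before the second starts.
    forall (i = 1:4)
        a(i) = a(i) + 1
        b(i) = a(i+1)
    end forall

    ! DO loop: both statements run for i = 1, then both for i = 2, and so on.
    do i = 1, 4
        c(i) = c(i) + 1
        d(i) = c(i+1)
    end do

    print '(a,4i3)', 'forall: ', b
    print '(a,4i3)', 'do:     ', d
end program forall_order
```

Here `b` comes out as 3 4 5 5 while `d` comes out as 2 3 4 5, because the FORALL reads `a(i+1)` only after every element has been incremented.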

So in general, would a structure of arrays be more efficient when executing many such statements (including ones more complicated than assignments)?
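To make the question concrete, a structure-of-arrays layout could look roughly like the sketch below. The type name `XNT_TYPE_XARR`, the kind values chosen for `XNT_SK` and `XNT_EK`, and the sizes are assumptions for illustration, not part of the original code. Each component becomes its own contiguous array, so a slice copy of `s` or `e` is unit-stride:

```fortran
program soa_sketch
    use iso_fortran_env, only: real64, int64
    implicit none
    ! Assumed kinds; the original post does not show their definitions.
    integer, parameter :: XNT_SK = real64
    integer, parameter :: XNT_EK = int64
    integer, parameter :: nl = 8, nm = 4   ! illustrative sizes

    ! Hypothetical structure-of-arrays counterpart of XNT_TYPE_XNUM.
    type :: XNT_TYPE_XARR
        real(XNT_SK), allocatable :: s(:,:,:)
        integer(XNT_EK), allocatable :: e(:,:,:)
    end type XNT_TYPE_XARR

    type(XNT_TYPE_XARR) :: ps, pt
    integer :: j

    allocate(ps%s(2,nl,nm), ps%e(2,nl,nm))
    allocate(pt%s(2,nl,3), pt%e(2,nl,3))
    ps%s = 0.5_XNT_SK
    ps%e = 3_XNT_EK
    pt%s = 0.0_XNT_SK
    pt%e = 0_XNT_EK

    j = 2
    ! Each assignment now copies one contiguous block of a single kind.
    pt%s(:,:,2) = ps%s(:,:,j)
    pt%e(:,:,2) = ps%e(:,:,j)

    if (any(pt%s(:,:,2) /= ps%s(:,:,j))) error stop 'significand copy failed'
    if (any(pt%e(:,:,2) /= ps%e(:,:,j))) error stop 'exponent copy failed'
    print *, 'copied', size(pt%s(:,:,2)), 'elements per component'
end program soa_sketch
```

Whether this beats the array-of-structures form probably depends on the access pattern: if `s` and `e` are always used together, keeping them adjacent in memory may be friendlier to the cache.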

Thanks!

jimdempseyatthecove
Honored Contributor III

Before you go SoA, try adding SEQUENCE to your type:

    type :: XNT_TYPE_XNUM
        SEQUENCE
        !(S)ignificand
        real(XNT_SK) :: s
        !(E)xponent
        integer(XNT_EK) :: e
    end type XNT_TYPE_XNUM

Also, just what are you intending to do with this type? It looks like you are trying to increase the size (range) of the exponent as opposed to the significand. IVF supports REAL(16) with software functions performing the calculations.

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

Oh, do not rely on the vectorization report for the array copy of these types. Run a performance test. As stated earlier, should the compiler convert the copy into a call to one of the memmov functions, the compiler will not report that the statement was vectorized (even though SIMD vectors are used in the selected memmov function).
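A timing test along those lines might look like the following sketch. The kind values for `XNT_SK` and `XNT_EK`, the sizes `nl` and `nm`, and the repetition count `nrep` are assumptions; `system_clock` is the standard intrinsic. It times the whole-element slice copy against the component-wise copy:

```fortran
program copy_timing
    use iso_fortran_env, only: real64, int64
    implicit none
    integer, parameter :: XNT_SK = real64, XNT_EK = int64   ! assumed kinds
    integer, parameter :: nl = 1000, nm = 50, nrep = 20000  ! illustrative sizes

    type :: XNT_TYPE_XNUM
        real(XNT_SK) :: s
        integer(XNT_EK) :: e
    end type XNT_TYPE_XNUM

    type(XNT_TYPE_XNUM), allocatable :: ps(:,:,:), pt(:,:,:)
    integer(int64) :: t0, t1, t2, rate
    integer(XNT_EK) :: check
    integer :: j, rep

    allocate(ps(2,nl,nm), pt(2,nl,3))
    ps%s = 1.0_XNT_SK
    ps%e = 1_XNT_EK
    check = 0

    call system_clock(t0, rate)
    do rep = 1, nrep
        j = mod(rep, nm) + 1
        pt(:,:,2) = ps(:,:,j)            ! whole-element copy
        check = check + pt(1,1,2)%e      ! use the result so the copy is not dropped
    end do
    call system_clock(t1)
    do rep = 1, nrep
        j = mod(rep, nm) + 1
        pt(:,:,2)%s = ps(:,:,j)%s        ! component-wise copy
        pt(:,:,2)%e = ps(:,:,j)%e
        check = check + pt(1,1,2)%e
    end do
    call system_clock(t2)

    print *, 'whole-element copy (s): ', real(t1 - t0, real64) / real(rate, real64)
    print *, 'component-wise copy (s):', real(t2 - t1, real64) / real(rate, real64)
    print *, 'checksum:', check
end program copy_timing
```

The checksum is only there to discourage the optimizer from removing the copies; an aggressive compiler may still transform the loops, so the numbers are best read alongside the generated assembly.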

Jim Dempsey

zp3
Beginner

jimdempseyatthecove wrote:

Also, just what are you intending to do with this type? It looks like you are trying to increase the size (range) of the exponent as opposed to the significand. IVF supports REAL(16) with software functions performing the calculations.

Yes, I intend to increase the exponent range dramatically, as I have a very wide dynamic range in the recursion of certain polynomial function families. Quadruple precision does not fulfill my needs, as it only increases the exponent size by 4 bits, meaning just a 16 times wider exponent range than double precision. Using my representation with an 8-byte integer as exponent extension, the 2e19 times wider range is sufficient for quite a long time...
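These figures can be checked with standard inquiry intrinsics. The sketch below assumes the compiler provides the `real128` kind (Intel Fortran and gfortran on x86 do) and that `XNT_EK` corresponds to `int64`, which the post implies but does not show. It prints the largest binary exponent of each real kind and the size of a 64-bit integer range:

```fortran
program exponent_range
    use iso_fortran_env, only: real64, real128, int64
    implicit none

    ! Largest binary exponent representable in each real kind.
    print *, 'maxexponent, double:', maxexponent(1.0_real64)
    print *, 'maxexponent, quad:  ', maxexponent(1.0_real128)
    print *, 'ratio quad/double:  ', maxexponent(1.0_real128) / maxexponent(1.0_real64)

    ! A 64-bit integer exponent spans 2**64 distinct values, about 1.8e19.
    print *, 'huge(int64):        ', huge(1_int64)
    print *, '2**64 (approx.):    ', 2.0_real64**64
end program exponent_range
```

With IEEE binary64 and binary128 the ratio comes out as 16, matching the 4 extra exponent bits, and 2**64 is roughly 1.8e19, which appears to be where the 2e19 figure comes from.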

jimdempseyatthecove
Honored Contributor III

On one of the other IDZ forum pages there was a link to the Fortran Wiki. The web page contains links to a few extended precision libraries. You might find one of them suitable for your purposes.

Jim Dempsey
