Intel® Fortran Compiler

SSE Intrinsics.

ramon_2
Beginner
1,035 Views
I would like to see SSE intrinsics supported by the Intel Fortran compiler. An alternative would be inline assembly.

There are some cases where the compiler is not able to vectorize code. For instance:

real*4 a(4), c

c = SQRT(SUM(a**2)) ! The modulus (Euclidean norm) of a vector

This one is quite surprising. Evaluating the distance between two points is a very frequent operation in molecular simulation codes.

I understand that the resources committed to compiler development are limited, and that the compiler cannot support every possible case of vectorization. However, please allow the programmer to do it by hand. Most C++ compilers support SSE intrinsics. The Fortran compiler should support them as well.
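For readers unfamiliar with what "SSE intrinsics" look like on the C/C++ side, here is a minimal sketch of the same norm written by hand in C. It is only an illustration of the kind of code the post is asking to be able to write, not something from the original poster. The function name `norm4` and the `HAVE_SSE` macro are made up for this sketch, and the preprocessor test for SSE support is an assumption that may need adjusting for a given compiler. A plain scalar fallback is included for targets without SSE.

```c
#include <math.h>

/* Assumption: these macros indicate SSE support on GCC/Clang/ICC and MSVC. */
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1   /* hypothetical macro, used only in this sketch */
#endif

/* Euclidean norm of a 4-element single-precision vector,
   i.e. the C counterpart of  c = SQRT(SUM(a**2)).
   norm4 is a hypothetical name. */
float norm4(const float a[4])
{
#ifdef HAVE_SSE
    __m128 v  = _mm_loadu_ps(a);        /* load a[0..3]; no alignment assumed */
    __m128 sq = _mm_mul_ps(v, v);       /* four squares with one packed multiply */

    /* Horizontal sum using SSE1 instructions only. */
    __m128 hi = _mm_movehl_ps(sq, sq);  /* lanes: (sq2, sq3, sq2, sq3) */
    __m128 s2 = _mm_add_ps(sq, hi);     /* lane 0 = sq0+sq2, lane 1 = sq1+sq3 */
    __m128 l1 = _mm_shuffle_ps(s2, s2, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane 1 */
    __m128 s1 = _mm_add_ss(s2, l1);     /* lane 0 = sum of all four squares */

    return _mm_cvtss_f32(_mm_sqrt_ss(s1)); /* scalar square root of lane 0 */
#else
    /* Scalar fallback for targets without SSE. */
    float s = a[0] * a[0] + a[1] * a[1] + a[2] * a[2] + a[3] * a[3];
    return sqrtf(s);
#endif
}
```

Called with all four elements equal to 3, as in the Fortran test program later in this thread, `norm4` returns 6. Note that only the multiply is a full-width packed operation; the additions and the square root still work on partial data, which is the usual cost of a horizontal reduction over such a short vector.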
5 Replies
g_f_thomas
Beginner
Hear, hear. Keep in mind, though, that Intel's primary line of business is chips, and software is quite secondary at best. It seems to me that Fortran is at the back of the bus when it comes to Intel software. Intel Fortran doesn't even come with a 'Hello World' sample to validate the installation, but it does come with a fancy Tutorial on how to use, wait for it!, the Intel C++ compiler. Go figure.
Anyways, you might find
of interest but I sense that you're already aware of it.
Good luck,
Gerry T.
ramon_2
Beginner
A cheap option would be to allow inline assembly. This option is supported by the Salford Fortran compiler.
g_f_thomas
Beginner
If Salford fulfils your needs (which I very much doubt, FWIW) why are you snivelling about IVF? FYI, it's possible to do mixed IVF ASM programming, so just do it.

Ciao,
Gerry T.
jean-vezina
Beginner
I have compiled the following test program with the compiler options /O3 /QxK (or /QxN for a Pentium IV machine), and the code is indeed vectorized.
Sample code:
real c, a(4)
a = 3.
c = SQRT(SUM(a**2)) ! Euclidean norm of a
print *, c
end
Commands used:
Pentium III: ifort /O3 /QxK /FAs vec.f90
Pentium IV: ifort /O3 /QxN /FAs vec.f90
The /FAs option produces an assembly-language listing of the code generated by the compiler.
In both cases, SSE or SSE2 instructions are generated, showing that the code is vectorized.
Best regards,
Jean Vezina
TimP
Honored Contributor III

I tried Jean's example with ifort 8.0.050. While it uses a parallel instruction to assign values to a(:), it uses serial SSE instructions to perform the calculations. The option /QxP would be required to allow the possibility of using a parallel instruction for the final addition of the 4 operands. Have you been able to make a benchmark showing an advantage for performing a parallel multiplication followed by a serial addition? The compiler's automatic vectorization of sum reductions is done with 8 partial sums, which is evidently not applicable to such a short vector.
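To make the "8 partial sums" remark concrete, here is a rough sketch in C of the general partial-sums reduction technique. It is not the code ifort generates; the function names `sum_squares_partial` and `full_blocks` and the constant `NPARTIAL` are hypothetical, chosen only for this illustration. The point is that the blocked loop needs at least 8 elements before it runs at all, so a 4-element vector falls entirely into the scalar remainder loop.

```c
#include <stddef.h>

#define NPARTIAL 8  /* number of independent partial sums, as described above */

/* How many full blocks of NPARTIAL elements the main loop would process. */
size_t full_blocks(size_t n)
{
    return n / NPARTIAL;
}

/* Sum of squares of x[0..n-1] using NPARTIAL independent accumulators.
   Independent accumulators are what lets a compiler map the main loop
   onto packed (parallel) add and multiply instructions. */
float sum_squares_partial(const float *x, size_t n)
{
    float partial[NPARTIAL] = {0.0f};
    float total = 0.0f;
    size_t i = 0;

    /* Main loop: runs only while a full block of NPARTIAL elements remains. */
    for (; i + NPARTIAL <= n; i += NPARTIAL) {
        for (size_t k = 0; k < NPARTIAL; ++k) {
            partial[k] += x[i + k] * x[i + k];
        }
    }

    /* Combine the partial sums into one value. */
    for (size_t k = 0; k < NPARTIAL; ++k) {
        total += partial[k];
    }

    /* Remainder loop: leftover elements are handled one at a time. */
    for (; i < n; ++i) {
        total += x[i] * x[i];
    }
    return total;
}
```

For n = 4, `full_blocks` returns 0, so all the work happens in the serial remainder loop. For n = 20 the main loop handles two blocks of 8 and the remainder loop handles the last 4 elements.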

Message Edited by tcprince on 06-23-2004 10:23 AM
