Re: Intra vectorization and Intel Linux FORTRAN 90 compiler

scottrwth · ‎09-13-2005

Dear Sirs,

My apologies in advance if this question has already been posed.
If I am not mistaken, Intel processors apparently have intra-vectorization capabilities. I am not sure how this works,
because I am also told that, apart from CRAY machines,
processors with vector registers are no longer made (?)

At any rate, this vectorization can be triggered with the
PORTLAND compiler by using the -fastsse option. In general, this
is the sse option and variations thereof...

Question is: what is the equivalent intra-vectorization option for
the Intel Linux FORTRAN 90 compiler?

The intra-vectorization capability is a hardware feature of
the chip itself, so I would think it should be accessible
regardless of the language or operating system (in principle).

Any feedback would be greatly appreciated.

Tony Scott
RWTH-Aachen

Steven_L_Intel1 · ‎09-13-2005

Welcome to the forum, Tony.

Yes, the Intel compilers can perform vectorization when you specify that you are compiling for one or more of the Intel processor types which include vector instructions. There are three generations of vector instructions in our IA-32 processors. SSE (Streaming SIMD Extensions) was introduced with Pentium III and primarily dealt with integers. SSE2 was introduced with Pentium 4 and adds floating types, while SSE3, the newest, was introduced with more recent Pentium 4 and Xeon processors. You use the -x switch to specify processor generation, for example, -xP enables generation of SSE3 instructions. See the Intel Fortran Compiler Compiler Options reference for details on the switches.

Intel Itanium processors also offer vectorization and our compilers for Itanium use this feature.

For more information, refer to the optimization sections of the Intel Fortran Optimizing Applications manual, as well as extensive general discussion of the SSE instructions on the Intel web site.

scottrwth · ‎09-13-2005

Dear Sir,

Many thanks for the info. I have tried -xP together with
-vec_report1 or -vec_report3 to see what was vectorized and
what wasn't. It seems that all my simple loops, e.g. those setting
a matrix or array to zero, were vectorized.

There is an important loop I am trying to vectorize, but
it involves a call to a subroutine, i.e.:

Do 1000 i=1,N
call UMD2SO(....)
1000 continue

Even though each subroutine call is independent, the loop
is NOT vectorized.

Any advice?

Is it a matter of putting the vectorization block inside
the subroutine UMD?

best wishes

Tony Scott

Steven_L_Intel1 · ‎09-13-2005

A subroutine call will prevent vectorization. You may want to split this into two loops, one which does the computations and one which does the subroutine calls.
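
To illustrate the split suggested above, here is a minimal sketch. It assumes the original loop mixes arithmetic with a call; the routine record_value, the arrays a, b, c and the module name are hypothetical and are not taken from the poster's code.

```fortran
module record_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: running_max = 0.0_dp   ! state updated by the hypothetical routine
contains
  ! Hypothetical stand-in for the subroutine called inside the loop.
  subroutine record_value(x)
    real(dp), intent(in) :: x
    if (x > running_max) running_max = x
  end subroutine record_value
end module record_mod

program split_loop
  use record_mod
  implicit none
  integer, parameter :: n = 1000
  real(dp) :: a(n), b(n), c(n)
  integer :: i

  ! Set up some input data.
  do i = 1, n
     a(i) = real(i, dp)
     b(i) = 2.0_dp
  end do

  ! Loop 1: arithmetic only, no calls, so it is a candidate for vectorization.
  do i = 1, n
     c(i) = a(i) * b(i) + 1.0_dp
  end do

  ! Loop 2: only the subroutine calls; this loop is expected to stay scalar.
  do i = 1, n
     call record_value(c(i))
  end do

  print *, 'largest value seen:', running_max
end program split_loop
```

Whether the first loop is actually vectorized depends on the compiler version and switches; a vectorization report such as the one produced by -vec_report1 is the way to check.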

TimP · ‎09-13-2005

If your subroutine has no loop in it, and you are looking to enable vectorization by in-line expansion in the calling program, you would require the -ipo option (for separate files) or -ip (inline within same file). Pushing the DO loop inside the subroutine may be more satisfactory.
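
As a sketch of what pushing the DO loop inside the subroutine could look like, the example below contrasts a per-element routine with a whole-array routine. The names scale_one and scale_all are hypothetical and only stand in for the real subroutine, whose argument list is not shown in this thread.

```fortran
module work_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Before: one element per call; the loop lives in the caller.
  subroutine scale_one(x, factor)
    real(dp), intent(inout) :: x
    real(dp), intent(in)    :: factor
    x = x * factor
  end subroutine scale_one

  ! After: the loop lives inside the subroutine, so the compiler sees
  ! a plain loop over array elements with no call in its body.
  subroutine scale_all(x, n, factor)
    integer,  intent(in)    :: n
    real(dp), intent(inout) :: x(n)
    real(dp), intent(in)    :: factor
    integer :: i
    do i = 1, n
       x(i) = x(i) * factor
    end do
  end subroutine scale_all
end module work_mod

program push_loop
  use work_mod
  implicit none
  integer, parameter :: n = 1000
  real(dp) :: x(n)
  integer :: i

  x = 1.0_dp

  ! Original form: call inside the loop.
  do i = 1, n
     call scale_one(x(i), 2.0_dp)
  end do

  ! Restructured form: a single call, loop inside the callee.
  call scale_all(x, n, 2.0_dp)

  print *, 'x(1) =', x(1)
end program push_loop
```

With the loop inside the callee, vectorization does not depend on the compiler choosing to inline the call, which is presumably why this restructuring may be more satisfactory than relying on -ip or -ipo.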

Intel_C_Intel · ‎09-13-2005

Hi Tony,

As an additional comment, since you use the term intra vectorization, I guess you meant intra-register vectorization, a term I used a few years ago to distinguish vectorization for multimedia extensions from vectorization for traditional vector processors, like the Cray (the term SIMDization seems to have become more popular, however). An online tutorial for vectorization can be found at IDS at:
http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm
and if, after reading this, you want to know more about the background of vectorization for multimedia extensions, you may want to consider reading The Software Vectorization Handbook at:
http://www.intel.com/intelpress/sum_vmmx.htm
or some other publications at:
http://www.aartbik.com/pub.html

Aart Bik
http://www.aartbik.com/

Message Edited by abik on 09-16-2005 12:16 PM

scottrwth · ‎09-14-2005

Dear Sirs,

Again my apologies. I found out through a test example
that if I use -ip or -ipo, then the loop does vectorize
after all, even if it contains a subroutine or function call.

best wishes

Tony

Steven_L_Intel1 · ‎09-14-2005

Aart contacted me offline to let me know that my descriptions of the various Intel processor vectorization features were somewhat "off".

SSE was single-precision, SSE2 was double precision and SSE3 extended features from SSE2.

Itanium processors don't have vectorization, per se, but the architecture there is very different and the compiler can use various features such as rotating registers to do a lot of computations in fewer cycles.