My apologies in advance if this question has already been posed.
If I am not mistaken, Intel processors have intra-vectorization capabilities. I am not sure how this works,
because I am also told that, apart from Cray machines,
processors with vector registers are no longer made (?)
At any rate, this vectorization can be triggered with the
Portland compiler using the -fastsse option. In general, this
is the SSE option and variations thereof...
My question is: what is the corresponding intra-vectorization
option for the Linux Fortran 90 compiler?
The intra-vectorization capability is a hardware feature of
the chip itself, so I would think it should be accessible
regardless of the language or operating system (in principle).
Any feedback would be greatly appreciated
Tony Scott
RWTH-Aachen
Yes, the Intel compilers can perform vectorization when you specify that you are compiling for one or more of the Intel processor types that include vector instructions. There are three generations of vector instructions in our IA-32 processors. SSE (Streaming SIMD Extensions) was introduced with Pentium III and operates on single-precision floating-point data. SSE2 was introduced with Pentium 4 and adds double-precision floating-point and 128-bit integer operations, while SSE3, the newest, was introduced with more recent Pentium 4 and Xeon processors. You use the -x switch to specify the processor generation; for example, -xP enables generation of SSE3 instructions. See the Compiler Options reference for the Intel Fortran Compiler for details on the switches.
Intel Itanium processors also offer vectorization and our compilers for Itanium use this feature.
For more information, refer to the optimization sections of the Intel Fortran Optimizing Applications manual, as well as extensive general discussion of the SSE instructions on the Intel web site.
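As a rough illustration of the kind of loop the vectorizer targets, here is a minimal sketch. The program, file, and array names are made up for this example; the switches in the comment are the ones mentioned in this thread, and whether a given loop is actually vectorized depends on the compiler version and the target processor.

```fortran
! Minimal sketch: a unit-stride loop with no calls and no
! loop-carried dependence, a typical vectorization candidate.
! Hypothetical example; the names are not from the original posts.
! Possible build line using switches named in this thread:
!   ifort -xP -vec_report1 saxpy_demo.f90
program saxpy_demo
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n), a
  integer :: i

  a = 2.0
  do i = 1, n
     x(i) = real(i)
     y(i) = 1.0
  end do

  ! Each iteration is independent of the others, so several
  ! elements can be processed by one SSE instruction.
  do i = 1, n
     y(i) = y(i) + a * x(i)
  end do

  print *, y(1), y(n)
end program saxpy_demo
```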
Many thanks for the info. I have tried -xP together with
-vec_report1 or -vec_report3 to see what was vectorized and
what wasn't. It seems that all my loops for, e.g., setting
a matrix or array to zero were vectorized.
There is an important loop I am trying to vectorize, but
it involves a call to a subroutine, i.e.:
Do 1000 i=1,N
call UMD2SO(....)
1000 continue
Even though each subroutine call is independent, the loop
is NOT vectorized.
Any advice?
Is it a matter of putting the vectorization block inside
the subroutine UMD?
best wishes
Tony Scott
Hi Tony,
As an additional comment, since you use the term intra-vectorization, I guess you meant intra-register vectorization, a term I used a few years ago to distinguish vectorization for multimedia extensions from vectorization for traditional vector processors, like the Cray (the term SIMDization seems to have become more popular, however). An online tutorial for vectorization can be found at IDS at:
http://www.intel.com/cd/ids/developer/asmo-na/eng/65774.htm
and if, after reading this, you want to know more about the background of vectorization for multimedia extensions, you may want to consider reading The Software Vectorization Handbook at:
http://www.intel.com/intelpress/sum_vmmx.htm
or some other publications at:
http://www.aartbik.com/pub.html
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 09-16-2005 12:16 PM
Again, my apologies. I found out through a test example
that if I use -ip or -ipo, then the loop does vectorize
after all, even if it contains a subroutine or function call.
best wishes
Tony
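A sketch of why -ip or -ipo can help: a call inside the loop is opaque to the vectorizer unless the compiler can inline it, and interprocedural optimization is what enables that inlining. The routine `scale_add` below is a hypothetical stand-in for a small callee, not UMD2SO, and a large routine may not be inlined at all. The second variant moves the loop inside the callee, which does not rely on inlining.

```fortran
! Hypothetical example: all names here are illustrative only.
module kernels
  implicit none
contains
  ! Small routine working on one element; a candidate for inlining.
  subroutine scale_add(a, x, y)
    real, intent(in)    :: a, x
    real, intent(inout) :: y
    y = y + a * x
  end subroutine scale_add

  ! Alternative: the loop lives inside the routine, so the compiler
  ! sees a plain loop with no call in its body.
  subroutine scale_add_all(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: a, x(n)
    real, intent(inout) :: y(n)
    integer :: i
    do i = 1, n
       y(i) = y(i) + a * x(i)
    end do
  end subroutine scale_add_all
end module kernels

program call_in_loop
  use kernels
  implicit none
  integer, parameter :: n = 1000
  real :: x(n), y(n)
  integer :: i

  x = 1.0
  y = 0.0

  ! Variant 1: call inside the loop. Vectorizing this requires the
  ! compiler to inline scale_add first.
  do i = 1, n
     call scale_add(2.0, x(i), y(i))
  end do

  ! Variant 2: a single call, with the loop inside the callee.
  call scale_add_all(n, 2.0, x, y)

  print *, y(1), y(n)
end program call_in_loop
```

Compiling with -vec_report3, as earlier in the thread, with and without -ip should show whether the first loop is picked up in a given compiler version.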
SSE was single precision, SSE2 added double precision, and SSE3 extended the features of SSE2.
Itanium processors don't have vectorization, per se, but the architecture there is very different and the compiler can use various features such as rotating registers to do a lot of computations in fewer cycles.
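To make the single versus double precision distinction concrete: SSE registers are 128 bits wide, so one register holds four single-precision or two double-precision values. The following is a small sketch of that arithmetic. The program name and kind parameters are chosen for illustration, and `storage_size` is a Fortran 2008 intrinsic that older compilers may not provide.

```fortran
! Hypothetical illustration of elements per 128-bit SSE register.
program sse_lanes
  implicit none
  integer, parameter :: sp = selected_real_kind(6)    ! single precision
  integer, parameter :: dp = selected_real_kind(15)   ! double precision
  integer, parameter :: xmm_bits = 128                ! SSE register width
  real(sp) :: s
  real(dp) :: d

  ! storage_size returns the size of its argument in bits.
  print *, 'single-precision elements per register:', xmm_bits / storage_size(s)
  print *, 'double-precision elements per register:', xmm_bits / storage_size(d)
end program sse_lanes
```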
