A newbie to vectorization

Intel_C_Intel · 09-05-2008

Hi,
I am new to vectorization.
I have read a little about it, but I still do not understand it clearly.

1. Does vectorization run faster?
2. Does multi-threading also run faster?

from http://en.wikipedia.org/wiki/Automatic_vectorization
An example would be a program to multiply two vectors of numeric data. A scalar approach would be something like:
for (i = 0; i < 1024; i++) {
    C[i] = A[i]*B[i];
}
This could be transformed to vectorized code something like:
for (i = 0; i < 1024; i+=4) {
    C[i:i+3] = A[i:i+3]*B[i:i+3];
}

Does "C[i:i+3] = A[i:i+3]*B[i:i+3];" mean that four threads run simultaneously on different CPUs? (My PC has an Intel quad-core Q6600.)

I did a little test:

program Console2
!DEC$ VECTOR ALWAYS
....

but the compiler shows "Info: This directive is misplaced or not supported on this platform."
P.S. I use Windows XP (32-bit).

Mikhail

TimP · 09-06-2008

The purpose of vectorization on an SSE platform is to use the parallel (SIMD) instructions to perform 2 (double precision) or 4 (single precision) operations at once, within a single thread on a single core, typically with a proportional increase in performance.
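To make the "4 operations in parallel" point concrete, here is a minimal hand-written sketch in C of roughly what a vectorizing compiler might produce for the quoted multiply loop. The function name mul_arrays is made up for illustration, and the code assumes a compiler that defines __SSE__ and provides <xmmintrin.h>; otherwise it falls back to the plain scalar loop. No threads are involved: one core issues one instruction that multiplies four floats.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* Element-wise product c[i] = a[i] * b[i].
   mul_arrays is a hypothetical name used only in this sketch. */
void mul_arrays(const float *a, const float *b, float *c, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Vector part: each SSE multiply handles 4 floats at once. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* load a[i..i+3], unaligned allowed */
        __m128 vb = _mm_loadu_ps(b + i);  /* load b[i..i+3] */
        _mm_storeu_ps(c + i, _mm_mul_ps(va, vb));
    }
#endif
    /* Scalar remainder, or the whole loop when SSE is unavailable. */
    for (; i < n; i++)
        c[i] = a[i] * b[i];
}
```

In practice you would normally write only the scalar loop and let the compiler generate the vector part; the intrinsics are shown here just to expose what happens underneath.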
Fortran array assignment is suggestive of vectorization, but there is no one-to-one correspondence. The same operation would vectorize equally well when written in Fortran 77 or C. On an SSE platform, stride-1 operations, such as the one you quote, are best suited for vectorization.
Few 32-bit compilers default to vector-compatible options. For example, ifort has until now required an SSE option such as /QxW to enable auto-vectorization. Most currently maintained compilers do support auto-vectorization, Microsoft C++ being a major exception.
Vector directives, such as the one you show, take effect only when placed immediately before a qualifying loop. VECTOR ALWAYS doesn't necessarily enable vectorization; it requests vectorization whenever possible, ignoring certain hazards, regardless of the compiler's assessment of whether it is worthwhile.
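As an illustration of placement, here is a small C sketch. It assumes that the Intel C/C++ spelling of the directive is "#pragma vector always" (please check this against your compiler's documentation); in Fortran, the !DEC$ VECTOR ALWAYS line goes in the same position, directly above the DO loop. The function name scale_add is made up for the example, and the pragma is guarded so that other compilers never see it.

```c
#include <stddef.h>

/* y[i] = y[i] + s * x[i]; scale_add is a hypothetical name. */
void scale_add(float *y, const float *x, float s, size_t n)
{
    size_t i;
    /* The directive sits immediately before the loop it applies to,
       not at the top of the program or routine. */
#ifdef __INTEL_COMPILER
#pragma vector always
#endif
    for (i = 0; i < n; i++)
        y[i] = y[i] + s * x[i];
}
```

The loop gives the same results with or without the directive; the directive only influences whether the compiler chooses to vectorize it.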
Ifort supports threaded parallelism for multiple cores by OpenMP or /Qparallel. For Windows, those are implemented by Windows threads. The co-array notation of Fortran 2008 is suggestive of multi-processing, but is not yet implemented in ifort. Data to be computed effectively by threaded parallelism must lie several cache lines apart, so these methods for threading are best suited to outer loops, in accordance with the old slogan "concurrent outer vector inner."
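A rough C sketch of "concurrent outer vector inner" follows. It assumes an OpenMP-capable compiler with OpenMP enabled; without it the pragma is ignored and the code runs serially with the same result. The function name mul_rows and the row-major layout are assumptions made for this example. Each thread takes whole rows, so threads work on data far apart in memory, while the inner stride-1 loop is left for the compiler to vectorize.

```c
#include <stddef.h>

/* Element-wise product of two row-major matrices.
   mul_rows is a hypothetical name used only in this sketch. */
void mul_rows(const float *a, const float *b, float *c, int rows, int cols)
{
    int r, j;
    /* Outer loop: rows are divided among threads when OpenMP is enabled. */
#pragma omp parallel for private(j)
    for (r = 0; r < rows; r++) {
        const float *ar = a + (size_t)r * cols;  /* start of row r in a */
        const float *br = b + (size_t)r * cols;  /* start of row r in b */
        float *cr = c + (size_t)r * cols;        /* start of row r in c */
        /* Inner loop: stride-1 access, a candidate for auto-vectorization. */
        for (j = 0; j < cols; j++)
            cr[j] = ar[j] * br[j];
    }
}
```

On a quad-core machine this combines both kinds of parallelism: up to four threads across rows, and 2 or 4 SSE operations at a time within each row.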