latinhoa
Beginner

OpenMP+SIMD or OpenMP+SSE help

I want to know how I can use OpenMP+SIMD or OpenMP+SSE.
And is it like OpenMP+MPI?
Thank you!
Michael_K_Intel2
Employee

Quoting - latinhoa

Hi!

OpenMP is a programming model for parallelizing programs by adding hints (so-called pragmas) to the program. These hints tell the compiler how to parallelize your code. SSE (an incarnation of SIMD) is a model for vectorization, that is, applying a single instruction to multiple data items (that's why it's called the SIMD model). So, from a programmer's point of view it is perfectly reasonable to parallelize a program at a high level (e.g. an outer loop) and to vectorize it at a fine-grained level (e.g. the inner-most loops).
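
As a small sketch of that combination (my example, not from latinhoa's code; the function and array names are made up): the outer loop is parallelized with an OpenMP pragma, while the inner loop is a simple independent element-wise operation that a compiler can map to SSE instructions:

#define ROWS 1024
#define COLS 1024

/* Coarse-grained: rows are distributed across the OpenMP threads.
   Fine-grained: the inner loop has independent iterations, so the
   compiler is free to vectorize it with SSE. */
void scale(float a[ROWS][COLS], const float b[ROWS][COLS], float s)
{
    #pragma omp parallel for
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            a[i][j] = s * b[i][j];
        }
    }
}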

MPI is something quite different. While OpenMP targets shared-memory programming, MPI is a programming model for distributed-memory machines (e.g. clusters). MPI requires the programmer to handle the messages that are sent between the different nodes of the cluster. But, again, MPI and OpenMP blend well with each other if you respect certain rules of what is called "hybrid OpenMP/MPI programming".
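
A minimal hybrid sketch (mine, not from the original reply; it assumes an MPI installation and an OpenMP-capable compiler): each MPI rank runs its own OpenMP thread team, and MPI calls are made only outside the parallel region:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* MPI_THREAD_FUNNELED is enough here because only the main
       thread of each rank makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        printf("MPI rank %d, OpenMP thread %d\n",
               rank, omp_get_thread_num());
    }

    MPI_Finalize();
    return 0;
}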

To come back to your question: you would exploit OpenMP and SSE by adding OpenMP pragmas to your code to instruct the compiler how to parallelize the code base. (Don't forget to turn on OpenMP support, though. GCC: -fopenmp, ICC: -openmp.) If you turn on compiler optimization (e.g. -O3 with GCC/ICC), the compiler is likely to figure out how to generate SSE code at the fine-grained level. If that is not sufficient, you can go for intrinsics and write the SSE code yourself.
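
If you do drop down to intrinsics, a hand-vectorized loop looks roughly like this (a sketch, not from the original reply; I use the unaligned load/store variants so it works without 16-byte-aligned arrays, and assume n is a multiple of 4 to keep it short):

#include <xmmintrin.h>  /* SSE intrinsics */

/* Element-wise product of two float arrays, four floats per instruction. */
void mul4(float *dst, const float *x, const float *y, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_loadu_ps(&x[i]);
        __m128 vy = _mm_loadu_ps(&y[i]);
        _mm_storeu_ps(&dst[i], _mm_mul_ps(vx, vy));
    }
}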

Cheers,
-michael
latinhoa
Beginner


Thank you!
I think there must be some rules for using OpenMP+SIMD, because sometimes code can't be parallelized or vectorized.
I know the Intel C++ Compiler has auto-parallelization and auto-vectorization, but I don't know the rules that determine what the compiler can and cannot do.
So I want to know those rules.

Michael_K_Intel2
Employee

Quoting - latinhoa



Hi!

If you turn on auto-vectorization, the compiler tries to figure out whether or not a loop is amenable to vectorization. If the compiler is 100% sure (by static analysis) that vectorization is safe, it will emit vectorized code. If it is not 100% sure, it reacts conservatively and emits regular non-SSE code. The same applies to auto-parallelization.
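
If you want to see what the auto-vectorizer decided for each loop, both compilers can print a report (an aside from me, and the flag spellings vary with the compiler version; these are the ones I recall for ICC 11 and GCC 4.x):

icc -O3 -vec-report2 foo.c
gcc -O3 -ftree-vectorizer-verbose=2 foo.c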

Asking for rules, there is really only a single simple rule (for loops): if the iterations of a loop can be executed independently, then you can parallelize/vectorize the loop. For example:

for (int i = 0; i < 32; i++) {
    for (int j = 0; j < 32; j++) {
        a[i][j] = a[i][j] * b[j][i];
    }
}

This is parallelizable, as the iterations of both loops are independent of each other. One also says that the loops carry no dependence (no loop-carried dependence). The next example cannot be parallelized:

for (int i = 0; i < 32; i++) {
    for (int j = 0; j < 32; j++) {
        a[i][j] = a[i+1][j+1] * b[j][i];
    }
}

In this example, iteration (i, j) reads a[i+1][j+1], the very element that a later iteration (i+1, j+1) overwrites. The iterations are therefore no longer independent (the loops carry a dependence), and hence you cannot execute them in parallel.
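
As an aside from me (not part of the original reply): the dependence above is an anti-dependence, and it can often be removed by renaming, i.e. writing the results into a separate array so that every iteration reads only the original values. A minimal sketch, with a hypothetical scratch array c; the arrays are sized N+1 so that a[i+1][j+1] stays in bounds:

#define N 32

/* 'c' is a scratch array introduced only for this illustration. */
float a[N + 1][N + 1], b[N + 1][N + 1], c[N + 1][N + 1];

void renamed(void)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            c[i][j] = a[i + 1][j + 1] * b[j][i];  /* reads only original a */
        }
    }
}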

If you want to learn more about how to write loops that can be parallelized and how to analyze existing loops, I would recommend a book on parallel programming; most of them deal with this issue. If you want to dive into the details of how compilers analyze your code, you can read David F. Bacon, Susan L. Graham, and Oliver J. Sharp, "Compiler Transformations for High-Performance Computing", ACM Computing Surveys, 26(4), 1994. But be aware that Bacon et al.'s survey is a tough read.

Cheers,
-michael