Thanks.

zer0nes · ‎03-29-2013

Hi, can I utilize MKL (or any other library) for the following operations?

[cpp]

for (int i=0; i<N; i++)

y = x[map]

[/cpp]

or

[cpp]

for (int i=0; i<N; i++) x[map] += y

[/cpp]

The second operation looks impossible to parallelize using SSE or OpenMP, does it?

Thanks.

Zhang_Z_Intel · ‎03-29-2013

There are two BLAS level-1 functions that are exactly for these purposes: GTHR and ROTI. Please look here and here. Both are vectorized and parallelized in MKL.

zer0nes · ‎03-29-2013

Thanks!

Zhang Z (Intel) wrote:

There are two BLAS level-1 functions that are exactly for these purposes: GTHR and ROTI. Please look here and here. Both are vectorized and parallelized in MKL.

Unfortunately, roti doesn't work in my case because it requires that the values in indx must be distinct. In my case, the first operation is an expansion while the second one is a contraction. My indx has many duplicated values.

zer0nes · ‎03-29-2013

Thanks.

Unfortunately, ROTI doesn't apply in my case because it requires that indx has unique values. My indx has many duplicated values. Basically, the first operation is an expansion so that BLAS's gemm can be called. The second operation is the reduction.

It the second operation is not vectorizable, will I be able to utilize MKL if I change the second operation to the following?

[csharp]

for (int i=0; i<M; i++) {

int[] indices = map; // map is int[][]

for (int j=0; j<indices.Length; i++)

x += y[indices];

}

[/csharp]

TimP · ‎03-30-2013

As long as you have repeated values in map[], vectorization or parallelization introduces indeterminacy on which of the repeated values takes final effect. If you don't care which of those takes effect, promoting parallelization by the MKL function or by assertions such as #pragma ivdep in your code could be acceptable. The resulting race conditions could restrict the performance gain if there are enough of them.

zer0nes · ‎03-30-2013

Can I parallelize with MKL for the modified update algorithm?

[csharp]

for (int i=0; i<M; i++) {

int[] indices = map; // map is int[][]

for (int j=0; j<indices.Length; i++)

x += y[indices];

}

[/csharp]

TimP (Intel) wrote:

As long as you have repeated values in map[], vectorization or parallelization introduces indeterminacy on which of the repeated values takes final effect. If you don't care which of those takes effect, promoting parallelization by the MKL function or by assertions such as #pragma ivdep in your code could be acceptable. The resulting race conditions could restrict the performance gain if there are enough of them.

Zhang_Z_Intel · ‎04-01-2013

There isn't an MKL function for this (when there are duplicate values in the index vector). But you can use the Intel compiler to vectorize/parallelize your own implementation. For example, you can parallelize the outer loop with OpenMP parallel for, and vectorize the inner loop with #pragma ivdep and other vectorization pragmas. You can check whether your code is successfully vectorized or not by using the "-vector-report" option of Intel compiler. Vectorization is a big topic by itself. There are many things you can do to make your code vectorize better. This page (http://software.intel.com/en-us/intel-vectorization-tools) is the ultimate guide for all you need to know about vectorization with Intel compilers. If you are in a hurry, you can start with this article: http://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers/

Copy from one vector to another via an index map using MKL?