Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Copy from one vector to another via an index map using MKL?

zer0nes
Beginner

Hi, can I utilize MKL (or any other library) for the following operations?

[cpp]

for (int i=0; i<N; i++)
    y[i] = x[map[i]];

[/cpp]

or 

[cpp]

for (int i=0; i<N; i++) x[map[i]] += y[i];

[/cpp]

The second operation looks impossible to parallelize using SSE or OpenMP. Is that right?

Thanks.

Zhang_Z_Intel
Employee

There are two BLAS level-1 functions that are exactly for these purposes: GTHR and ROTI. Please look here and here. Both are vectorized and parallelized in MKL.

zer0nes
Beginner

Thanks!

Zhang Z (Intel) wrote:

There are two BLAS level-1 functions that are exactly for these purposes: GTHR and ROTI. Please look here and here. Both are vectorized and parallelized in MKL.

Unfortunately, ROTI doesn't work in my case because it requires the values in indx to be distinct. In my case, the first operation is an expansion while the second one is a contraction, and my indx has many duplicate values.

zer0nes
Beginner

Thanks. 

Unfortunately, ROTI doesn't apply in my case because it requires indx to have unique values, and my indx has many duplicates. Basically, the first operation is an expansion so that BLAS's gemm can be called, and the second operation is the reduction.

If the second operation is not vectorizable, will I be able to use MKL if I change it to the following?

[csharp]

for (int i=0; i<M; i++) {
    int[] indices = map[i]; // map is int[][]; map[i] lists the y elements that feed x[i]
    for (int j=0; j<indices.Length; j++)
        x[i] += y[indices[j]];
}

[/csharp]
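For readers following along, here is a minimal C++ sketch of how the jagged `int[][]` map above could be derived from the original flat map, and of the restructured reduction itself. The names `invert_map` and `gather_reduce` are hypothetical and do not come from MKL or from the thread. The sketch assumes `double` data and that every entry of the flat map lies in `[0, M)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inverse[k] collects every i with map[i] == k,
// i.e. the list of y elements that must be summed into x[k].
std::vector<std::vector<int>> invert_map(const std::vector<int>& map,
                                         std::size_t M) {
    std::vector<std::vector<int>> inverse(M);
    for (std::size_t i = 0; i < map.size(); ++i) {
        assert(map[i] >= 0 && static_cast<std::size_t>(map[i]) < M);
        inverse[map[i]].push_back(static_cast<int>(i));
    }
    return inverse;
}

// Restructured reduction: each x[k] is written by exactly one outer
// iteration, so no two iterations ever update the same element.
void gather_reduce(std::vector<double>& x, const std::vector<double>& y,
                   const std::vector<std::vector<int>>& inverse) {
    assert(x.size() == inverse.size());
    for (std::size_t k = 0; k < inverse.size(); ++k) {
        double acc = 0.0;
        for (int i : inverse[k])
            acc += y[i];
        x[k] += acc;
    }
}
```

Building the inverse costs one pass over the flat map, so it tends to pay off only when the same map is reused for many reductions. Because the partial sum is formed before it is added to `x[k]`, floating-point results may differ in the last bits from the original scatter loop.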

TimP
Honored Contributor III

As long as you have repeated values in map[], vectorization or parallelization introduces indeterminacy on which of the repeated values takes final effect.  If you don't care which of those takes effect, promoting parallelization by the MKL function or by assertions such as #pragma ivdep in your code could be acceptable.  The resulting race conditions could restrict the performance gain if there are enough of them.
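One way to picture the hazard for the accumulating form `x[map[i]] += y[i]` is the small C++ sketch below. It compares the serial loop with a hypothetical model of a vector unit that gathers a group of operands, adds, and then scatters the results. The function names and the `width` parameter are illustrative assumptions. Real hardware and compilers may behave differently, but the lost-update effect is the same in kind.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: every update is applied, so duplicates accumulate.
void scatter_add_serial(std::vector<double>& x, const std::vector<int>& map,
                        const std::vector<double>& y) {
    assert(map.size() == y.size());
    for (std::size_t i = 0; i < map.size(); ++i)
        x[map[i]] += y[i];
}

// Hypothetical model of `width` SIMD lanes: gather all operands first,
// add, then scatter. If two lanes share an index, both read the same old
// value and the later store overwrites the earlier one.
void scatter_add_lanes(std::vector<double>& x, const std::vector<int>& map,
                       const std::vector<double>& y, std::size_t width) {
    assert(map.size() == y.size() && width > 0);
    std::vector<double> lane(width);
    for (std::size_t base = 0; base < map.size(); base += width) {
        const std::size_t n = std::min(width, map.size() - base);
        for (std::size_t k = 0; k < n; ++k) lane[k] = x[map[base + k]];   // gather
        for (std::size_t k = 0; k < n; ++k) lane[k] += y[base + k];      // add
        for (std::size_t k = 0; k < n; ++k) x[map[base + k]] = lane[k];  // scatter
    }
}
```

With `map = {0, 0, 1, 1}` and `y = {1, 2, 3, 4}` on a zeroed `x`, the serial loop produces `{3, 7}`, while the four-lane model produces `{2, 4}`: one contribution per duplicated index is silently dropped.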

zer0nes
Beginner

Can I parallelize the modified update algorithm with MKL?

[csharp]

for (int i=0; i<M; i++) {
    int[] indices = map[i]; // map is int[][]; map[i] lists the y elements that feed x[i]
    for (int j=0; j<indices.Length; j++)
        x[i] += y[indices[j]];
}

[/csharp]

TimP (Intel) wrote:

As long as you have repeated values in map[], vectorization or parallelization introduces indeterminacy on which of the repeated values takes final effect.  If you don't care which of those takes effect, promoting parallelization by the MKL function or by assertions such as #pragma ivdep in your code could be acceptable.  The resulting race conditions could restrict the performance gain if there are enough of them.

Zhang_Z_Intel
Employee

There isn't an MKL function for this when there are duplicate values in the index vector. But you can use the Intel compiler to vectorize and parallelize your own implementation. For example, you can parallelize the outer loop with an OpenMP parallel for, and vectorize the inner loop with #pragma ivdep and other vectorization pragmas. You can check whether your code was successfully vectorized by using the "-vec-report" option of the Intel compiler. Vectorization is a big topic by itself, and there are many things you can do to make your code vectorize better. This page (http://software.intel.com/en-us/intel-vectorization-tools) is the ultimate guide to vectorization with Intel compilers. If you are in a hurry, you can start with this article: http://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers/
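A minimal C++ sketch of that suggestion follows, assuming the restructured form in which each output element has its own index list. The function name `gather_reduce_parallel` is hypothetical. The sketch uses the standard OpenMP `simd reduction` clause for the inner loop in place of the Intel-specific `#pragma ivdep`. The pragmas are guarded so the code still compiles and runs serially when OpenMP is not enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// indices[i] lists the y elements summed into x[i]. Each x[i] is owned by
// exactly one outer iteration, so the outer loop has no write conflicts.
void gather_reduce_parallel(std::vector<double>& x,
                            const std::vector<double>& y,
                            const std::vector<std::vector<int>>& indices) {
    assert(x.size() == indices.size());
    const long M = static_cast<long>(x.size());
#ifdef _OPENMP
#pragma omp parallel for schedule(static)
#endif
    for (long i = 0; i < M; ++i) {
        const std::vector<int>& row = indices[i];
        const long len = static_cast<long>(row.size());
        double acc = 0.0;
        // The inner loop is a plain sum of gathered values; the reduction
        // clause tells the compiler that reordering the additions is allowed.
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(+ : acc)
#endif
        for (long j = 0; j < len; ++j)
            acc += y[row[j]];
        x[i] += acc;
    }
}
```

If the index lists vary a lot in length, a `schedule(dynamic)` clause may balance the work better than `schedule(static)`. Whether the inner gather loop actually vectorizes depends on the compiler and target, so checking the vectorization report is still worthwhile.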
