Hi, can I utilize MKL (or any other library) for the following operations?
[cpp]
for (int i = 0; i < N; i++)
    y[i] = x[map[i]];
[/cpp]
or
[cpp]
for (int i = 0; i < N; i++) x[map[i]] += y[i];
[/cpp]
The second operation looks impossible to parallelize using SSE or OpenMP. Is it?
Thanks.
Thanks!
Zhang Z (Intel) wrote:
There are two BLAS level-1 functions that are exactly for these purposes: GTHR and ROTI. Please look here and here. Both are vectorized and parallelized in MKL.
Unfortunately, ROTI doesn't work in my case because it requires the values in indx to be distinct. In my case, the first operation is an expansion while the second one is a contraction, and my indx has many duplicate values.
Thanks.
Unfortunately, ROTI doesn't apply in my case because it requires that indx has unique values. My indx has many duplicated values. Basically, the first operation is an expansion so that BLAS's gemm can be called. The second operation is the reduction.
If the second operation is not vectorizable, will I be able to use MKL if I change it to the following?
[csharp]
for (int i = 0; i < M; i++) {
    int[] indices = map[i]; // map is int[][]
    for (int j = 0; j < indices.Length; j++)
        x[i] += y[indices[j]];
}
[/csharp]
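For reference, the modified loop above needs a per-target index list, which can be built from the original flat map by grouping positions by their target. The following is a minimal C++ sketch of that idea, not code from the thread; the helper names `invert_map` and `gather_add` and the use of `std::vector<double>` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: inverse[k] lists every position i with map[i] == k.
// M is the length of x (the number of distinct targets).
std::vector<std::vector<int>> invert_map(const std::vector<int>& map, int M) {
    std::vector<std::vector<int>> inverse(M);
    for (std::size_t i = 0; i < map.size(); ++i) {
        assert(map[i] >= 0 && map[i] < M);
        inverse[map[i]].push_back(static_cast<int>(i));
    }
    return inverse;
}

// Gather form of x[map[i]] += y[i]: each x[k] is written by exactly one
// outer iteration, so duplicate targets no longer conflict.
void gather_add(std::vector<double>& x, const std::vector<double>& y,
                const std::vector<std::vector<int>>& inverse) {
    assert(inverse.size() == x.size());
    for (std::size_t k = 0; k < x.size(); ++k) {
        double sum = 0.0;
        for (int i : inverse[k]) sum += y[i];
        x[k] += sum;
    }
}
```

Because each `x[k]` is accumulated in a local sum first, the floating-point rounding can differ slightly from the original in-order scatter loop.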
As long as you have repeated values in map[], vectorization or parallelization introduces indeterminacy on which of the repeated values takes final effect. If you don't care which of those takes effect, promoting parallelization by the MKL function or by assertions such as #pragma ivdep in your code could be acceptable. The resulting race conditions could restrict the performance gain if there are enough of them.
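For the accumulating variant (`x[map[i]] += y[i]`), one way to keep a parallel loop correct despite repeated indices is to make each update atomic. This is a sketch under the assumption that OpenMP is enabled at compile time (without it the pragmas are ignored and the loop runs serially); the function name `scatter_add_atomic` is made up for illustration, and heavy contention on the same element will limit any speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scatter-add with duplicate indices allowed. The atomic update prevents
// lost updates when two threads hit the same x element.
void scatter_add_atomic(std::vector<double>& x, const std::vector<int>& map,
                        const std::vector<double>& y) {
    assert(map.size() == y.size());
    double* xp = x.data();
    const long n = static_cast<long>(map.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const int idx = map[i];
        const double yi = y[i];
        #pragma omp atomic
        xp[idx] += yi;
    }
}
```

The order in which contributions to one element are added is still unspecified, so floating-point results may vary in the last bits from run to run.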
Can I parallelize with MKL for the modified update algorithm?
[csharp]
for (int i = 0; i < M; i++) {
    int[] indices = map[i]; // map is int[][]
    for (int j = 0; j < indices.Length; j++)
        x[i] += y[indices[j]];
}
[/csharp]
TimP (Intel) wrote:
As long as you have repeated values in map[], vectorization or parallelization introduces indeterminacy on which of the repeated values takes final effect. If you don't care which of those takes effect, promoting parallelization by the MKL function or by assertions such as #pragma ivdep in your code could be acceptable. The resulting race conditions could restrict the performance gain if there are enough of them.
There isn't an MKL function for this (when there are duplicate values in the index vector). But you can use the Intel compiler to vectorize and parallelize your own implementation. For example, you can parallelize the outer loop with an OpenMP parallel for, and vectorize the inner loop with #pragma ivdep and other vectorization pragmas. You can check whether your code was successfully vectorized by using the "-vec-report" option of the Intel compiler. Vectorization is a big topic by itself, and there are many things you can do to make your code vectorize better. This page (http://software.intel.com/en-us/intel-vectorization-tools) is the ultimate guide for all you need to know about vectorization with Intel compilers. If you are in a hurry, you can start with this article: http://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers/
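The suggestion above (parallel outer loop, vectorized inner loop) could look roughly like the following C++ sketch. It makes several assumptions that are not from the thread: the jagged `map` is flattened into two arrays `start` and `idx` (a CSR-style layout), the function name `gather_add_csr` is hypothetical, and the portable `#pragma omp simd` with a reduction clause is used in place of the Intel-specific `#pragma ivdep`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// x[i] += sum of y[idx[j]] for j in [start[i], start[i + 1]).
// start has x.size() + 1 entries; idx holds all index lists back to back.
void gather_add_csr(std::vector<double>& x, const std::vector<double>& y,
                    const std::vector<int>& start, const std::vector<int>& idx) {
    assert(start.size() == x.size() + 1);
    assert(static_cast<std::size_t>(start.back()) == idx.size());
    const long m = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < m; ++i) {
        const int begin = start[i];
        const int end = start[i + 1];
        double sum = 0.0;
        // Each outer iteration owns x[i], so only the local sum is reduced.
        #pragma omp simd reduction(+:sum)
        for (int j = begin; j < end; ++j)
            sum += y[idx[j]];
        x[i] += sum;
    }
}
```

Whether the inner loop actually vectorizes depends on the compiler, the target instruction set (the `y[idx[j]]` access is a gather), and how long the index lists are, so the compiler's vectorization report is still the place to check.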