Intel® oneAPI Math Kernel Library

Fastest [parallel] way to initialize a column vector?

Azua_Garcia__Giovann

Hello,

I could not find a BLAS-1 function that fits the bill and was wondering whether MKL has an implementation for this. I know the STL and Boost have, e.g., fill, but I would prefer an implementation that takes locality into account and parallelizes.

I create a column vector and want to initialize it to a given double value. memset only works for zeroing out the memory, and AFAIK it boils down to a not-so-efficient loop anyway.

Below is my C code (pending a move to C++). Is there a way to do this in a more locality/vectorized/parallel high performance way using MKL?

Many thanks in advance,
Best regards,
Giovanni

/**
 * Initialize a vector with a default value
 */
tvector* vector_init_with_default(int capacity, double value) {
    tvector* result = vector_init_static(capacity);
    /* write the default into every element, not into the data pointer itself */
    for (int i = 0; i < capacity; ++i) {
        result->data[i] = value;
    }

    return result;
}

TimP
Honored Contributor III
Any auto-vectorizing compiler should generate effective code for this loop. As you don't give any hints about the array size: with icc you may gain an advantage from #pragma vector nontemporal if you expect the array to span several 4 KB pages.
memset should switch to nontemporal stores automatically for suitably large arrays; the inefficiency is the time taken to determine which branch to take, which isn't significant if the array is that large. icc will substitute memset automatically if it knows value==0.
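To make the two suggestions above concrete, here is a minimal sketch in C. The names fill_double and zero_double are hypothetical helpers, not MKL or compiler APIs. The pragma is guarded so that it is only seen by the classic Intel compiler, and the memset path assumes a platform where all-bits-zero represents 0.0 (true for IEEE 754 doubles).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill data[0..n-1] with value.
 * The loop is simple enough for an auto-vectorizing compiler to handle. */
void fill_double(double *data, size_t n, double value) {
    assert(data != NULL || n == 0);
#if defined(__INTEL_COMPILER)
    /* icc-specific hint: request nontemporal (streaming) stores.
     * Only likely to help when the array spans many 4 KB pages. */
#pragma vector nontemporal
#endif
    for (size_t i = 0; i < n; ++i) {
        data[i] = value;
    }
}

/* Hypothetical helper: zero the array with memset.
 * Assumes all-bits-zero is 0.0, as with IEEE 754 doubles. */
void zero_double(double *data, size_t n) {
    assert(data != NULL || n == 0);
    if (n > 0) {
        memset(data, 0, n * sizeof *data);
    }
}
```

Whether the nontemporal hint pays off depends on the array size and on whether the data is read again soon after the fill, so it is worth measuring both variants.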

Gennady_F_Intel
Moderator
>> is there a way to do this in a more locality/vectorized/parallel high performance way using MKL?
There are no specific MKL functions for this operation, for the reasons given in Timothy's answer above.
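Since MKL has no dedicated routine, one option for the parallel part of the question is to thread the loop yourself. The following is only a sketch under the assumption that OpenMP 4.0 or later is available (for example via -fopenmp or -qopenmp); fill_double_parallel is a hypothetical name, and nothing in this thread confirms that threading beats a single-threaded fill for a given array size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MKL): fill data[0..n-1] with value.
 * If OpenMP is not enabled, the pragma is ignored and the loop runs serially. */
void fill_double_parallel(double *data, long n, double value) {
    assert(n >= 0 && (data != NULL || n == 0));
    /* schedule(static) hands each thread one contiguous chunk of the array */
#pragma omp parallel for simd schedule(static)
    for (long i = 0; i < n; ++i) {
        data[i] = value;
    }
}
```

On systems with a first-touch memory policy, filling the array in parallel like this may also place each chunk's pages near the thread that wrote them, which can help locality if later parallel loops use the same static partitioning. For small vectors the thread startup cost will probably outweigh any gain.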