Greetings from LANL!
I have a code dominated by three intrinsics: TRANSPOSE, MAXVAL, and NORM2. The arguments are large arrays/vectors, and I have 36 or more cores at my disposal.
First, am I correct in assuming these are potentially vectorized but not threaded by default?
I am considering writing my own replacements for these with nested loops and applying the appropriate OMP PARALLEL and OMP SIMD directives. However, it would be nice to find threaded versions that already exist. In MKL, maybe?
You are correct: these intrinsics may be vectorized, but they are not threaded by default. Ask in the MKL forum whether MKL has threaded equivalents. There may indeed be one for TRANSPOSE, but I am less certain about the others.
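For the do-it-yourself route mentioned in the question, here is a minimal sketch of what hand-written replacements might look like. Everything named here is hypothetical and chosen only for illustration: the module name threaded_kernels, the routine names par_maxval, par_norm2 and par_transpose, the double-precision kind, and the tile size of 64. It assumes rank-1 arguments for the reductions, a rank-2 argument for the transpose, and a compiler with OpenMP 4.0 or later. Note that the NORM2 stand-in is a plain sum of squares, so it does not guard against overflow or underflow the way a careful NORM2 implementation may, and its result can differ from the intrinsic in the last few bits because the summation order changes with the thread count.

```fortran
module threaded_kernels
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none

contains

  ! Hypothetical threaded stand-in for MAXVAL on a rank-1 array.
  function par_maxval(x) result(m)
    real(dp), intent(in) :: x(:)
    real(dp) :: m, acc
    integer :: i

    acc = -huge(1.0_dp)   ! same value MAXVAL returns for a zero-size array
    !$omp parallel do simd reduction(max:acc)
    do i = 1, size(x)
      acc = max(acc, x(i))
    end do
    !$omp end parallel do simd
    m = acc
  end function par_maxval

  ! Hypothetical threaded stand-in for NORM2 on a rank-1 array.
  ! Plain sum of squares: no scaling against overflow or underflow.
  function par_norm2(x) result(r)
    real(dp), intent(in) :: x(:)
    real(dp) :: r, s
    integer :: i

    s = 0.0_dp
    !$omp parallel do simd reduction(+:s)
    do i = 1, size(x)
      s = s + x(i) * x(i)
    end do
    !$omp end parallel do simd
    r = sqrt(s)
  end function par_norm2

  ! Hypothetical threaded stand-in for b = TRANSPOSE(a).
  ! The matrix is processed in bs-by-bs tiles so that both the reads
  ! from a and the writes to b stay reasonably cache friendly.
  ! b must have shape (size(a,2), size(a,1)).
  subroutine par_transpose(a, b)
    real(dp), intent(in)  :: a(:, :)
    real(dp), intent(out) :: b(:, :)
    integer, parameter :: bs = 64   ! tile size: an assumption, tune per machine
    integer :: i, j, ii, jj, n, m

    n = size(a, 1)
    m = size(a, 2)
    ! Threads share the tiles; the two tile loops are collapsed so there
    ! is enough work to spread over many cores.
    !$omp parallel do collapse(2) private(i, j)
    do jj = 1, m, bs
      do ii = 1, n, bs
        do j = jj, min(jj + bs - 1, m)
          do i = ii, min(ii + bs - 1, n)
            b(j, i) = a(i, j)
          end do
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine par_transpose

end module threaded_kernels
```

Build with your compiler's OpenMP option enabled; without it the directives are treated as comments and the routines simply run serially. Whether this beats the intrinsics depends on array sizes and memory bandwidth, so it is worth timing both on the real problem before committing to it.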