Have you checked that your code was really vectorized (the '-vec-report' option)?
Also, you could check whether there is any difference in performance if you use the '-fast' option instead of '-O2'.
omp_set_dynamic(0); earlier in the program. I am using ippMalloc to generate aligned data arrays. Is there a better way? Perhaps I have misunderstood, but this does not mean that accesses to an array allocated by ippMalloc are automatically parallelised, does it? My code uses a custom sparse matrix storage format optimised for vectorization with AVX intrinsics.
If this is wrong then please advise.
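
To make the distinction concrete, here is a minimal sketch in plain C, not the actual code from this thread. It assumes C11 aligned_alloc as a stand-in for ippMalloc, and the names alloc_aligned_floats, is_aligned_32 and scaled_add are hypothetical. The point it tries to show is that an aligned allocator only controls where the buffer starts; any threading has to be requested separately, here with an explicit OpenMP pragma on the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n floats on a 32-byte boundary
   (the width of an AVX register). C11 aligned_alloc expects the size to be
   a multiple of the alignment, so the byte count is rounded up. */
float *alloc_aligned_floats(size_t n)
{
    size_t bytes = n * sizeof(float);
    size_t rounded = (bytes + 31u) / 32u * 32u;
    if (rounded == 0)
        rounded = 32u;
    return aligned_alloc(32, rounded);
}

/* Hypothetical helper: returns 1 if p sits on a 32-byte boundary. */
int is_aligned_32(const void *p)
{
    return ((uintptr_t)p % 32u) == 0;
}

/* y[i] += a * x[i]. Alignment alone does not spread this loop over
   threads; the pragma does, and only when OpenMP is enabled at compile
   time. Without OpenMP the pragma is ignored and the loop runs serially. */
void scaled_add(float *y, const float *x, float a, long n)
{
    long i;
    assert(n >= 0);
#pragma omp parallel for
    for (i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Buffers from alloc_aligned_floats are released with free(). Whether the loop body is also vectorized is a separate question from whether it is threaded, which is why checking the vectorization report is still worthwhile.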