I want to measure the performance difference between SSE2 and AVX implementations for a neural network application. I am using Intel MKL to perform the BLAS calculations so that I have the most optimized implementation.
Is there any way to instruct MKL to use only its SSE2 code paths, even if the machine supports AVX2? I know AVX2 will generally perform better than SSE2, but I want to quantify that difference.