I want to measure the performance difference between SSE2 and AVX implementations for a neural network application. I am using Intel MKL to perform the BLAS calculations so that I have the most optimized implementation.
Is there any way to instruct MKL to dispatch only SSE2 code paths even if the machine supports AVX2? I know AVX2 will generally perform better than SSE2, but I want to quantify that difference.
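Thank you. I was able to dispatch code variants for different SIMD extensions by setting the MKL_CBWR environment variable (for example, MKL_CBWR=SSE2 or MKL_CBWR=AVX2). I also set MKL_NUM_THREADS so that the thread count stays the same across runs and only the instruction set changes.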
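For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how the runs could be scripted. It assumes your benchmark is a separate MKL-linked executable; the helper names `mkl_env` and `time_command` are made up for illustration. The variables are set in the child process environment so that MKL sees them before it initializes. Note that the measured time includes process startup, and that MKL_CBWR belongs to MKL's conditional numerical reproducibility feature, which may itself cost some performance, so treat the numbers as an approximation.

```python
import os
import subprocess
import sys
import time


def mkl_env(branch, threads=1, base=None):
    """Return a copy of the environment with the MKL settings applied.

    branch  -- value for MKL_CBWR, e.g. "SSE2", "AVX" or "AVX2"
    threads -- value for MKL_NUM_THREADS, held constant across runs
    base    -- environment to start from (defaults to os.environ)
    """
    env = dict(os.environ if base is None else base)
    env["MKL_CBWR"] = branch
    env["MKL_NUM_THREADS"] = str(threads)
    return env


def time_command(cmd, branch, threads=1):
    """Run cmd once under the given MKL settings; return wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=mkl_env(branch, threads), check=True)
    return time.perf_counter() - start
```

Usage would look something like `time_command(["./my_benchmark"], "SSE2")` followed by the same call with `"AVX2"`, where `./my_benchmark` is a placeholder for your own program. Repeating each run several times and comparing the medians should give a more stable figure than a single measurement.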