I have a Fortran code which I compile with the Intel Fortran compiler v2009. The machine I use has 24 Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz processors, which support the AVX instruction set.
The code has a lot of scope for vectorisation, as it performs a number of matrix multiplications and similar operations. I compile with "ifort -xAVX" instead of plain "ifort" and then run the binary on my scenario data. I am surprised to see that the run time is higher with AVX.
I had hoped that vectorisation would give me better performance.
I also tried aligning the data to 8 bytes, but that did not give any useful improvement either.
I would be grateful for any suggestions on how to make full use of the processors and optimise my code.
I'm not sure what you mean by v2009, but AVX is beneficial only if there is a lot of AVX work to be done together because, at least on some processors, there can be a delay in executing the first AVX instruction in a sequence. I think more recent compilers take this into account better. As with all processor-specific optimizations, more is not always better. You have to do your own analysis to see what works best for your application.
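One way to start that analysis is to time a representative kernel in isolation and build it both with and without -xAVX. The following is only a minimal sketch, not your application: the program name, the matrix size n, and the repeat count nrep are hypothetical, and the MATMUL intrinsic stands in for whatever your code actually does. The !DIR$ line is an Intel-specific directive that requests 32-byte alignment (the width of an AVX register, rather than 8 bytes); I am assuming your compiler version accepts it for allocatable arrays, and other compilers will treat it as a comment.

```fortran
program time_matmul
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none

  integer, parameter :: n = 512     ! hypothetical problem size
  integer, parameter :: nrep = 10   ! hypothetical number of timed repeats

  real(real64), allocatable :: a(:,:), b(:,:), c(:,:)
  ! Intel-specific request for 32-byte alignment (assumption: supported
  ! by your compiler version; ignored as a comment elsewhere).
  !DIR$ ATTRIBUTES ALIGN: 32 :: a, b, c

  integer(int64) :: t0, t1, rate
  real(real64)   :: chk
  integer        :: k

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  ! Warm-up pass so one-off start-up costs are not included in the timing.
  c = matmul(a, b)

  chk = 0.0_real64
  call system_clock(t0, rate)
  do k = 1, nrep
     ! Change one input element so each repeat is a distinct computation.
     a(1,1) = real(k, real64)
     c = matmul(a, b)
     chk = chk + sum(c)
  end do
  call system_clock(t1)

  print '(a,f10.4,a)', 'mean time per multiply: ', &
        real(t1 - t0, real64) / real(rate, real64) / nrep, ' s'
  ! Printing the checksum keeps the result live so the work is not removed.
  print *, 'checksum: ', chk
end program time_matmul
```

Build it once with "ifort -O2" and once with "ifort -O2 -xAVX", run each binary several times, and compare the reported times. If the AVX build is not faster even for a kernel like this, the compiler's vectorization report (the option name varies between compiler versions) is the next place to look.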