I have a finite-difference code for wave propagation. Because it needs many temporary mixed-derivative terms, I allocate a single work buffer and split it into chunks, one per derivative term, to save memory. The code looks like this:
Wrk = malloc(2*(4*nxe*(2*ne+1) + 15*nxe)*sizeof(float));
In the computing function:
float *dudz = Wrk + NE;
float *dqdz = dudz + nxe;
for (int i=ix0_1; i<ixh_1; i++)
    dudz[i] = hdzi*(u[i+nxe]-u[i-nxe]);
The problem is that the code runs fine when built with Intel compiler 12, but it blows up when built with Intel compiler 13 or 14. All three versions optimize the code above by vectorizing the loop. If I prevent that optimization with Intel compiler 13 and 14 by declaring the pointer as
volatile float *dudz = Wrk + NE;
then the code also runs fine, although more slowly.
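To make the pattern easier to test in isolation, here is a minimal standalone sketch of it. It is not my actual code: the sizes NXE and NZ, the input values, and the helper names ddz_plain, ddz_volatile and max_diff_between_versions are all made up for illustration. It runs the same central-difference loop once through a plain pointer and once through a volatile pointer, each writing into its own chunk of one work buffer, and reports the largest difference between the two results.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical grid sizes, chosen only for illustration. */
enum { NXE = 64, NZ = 8 };

/* Central difference in z through a plain pointer (the compiler may vectorize this). */
static void ddz_plain(float *out, const float *u,
                      int i0, int i1, int nxe, float hdzi)
{
    for (int i = i0; i < i1; i++)
        out[i] = hdzi * (u[i + nxe] - u[i - nxe]);
}

/* Same loop through a volatile pointer, as in the workaround above. */
static void ddz_volatile(volatile float *out, const float *u,
                         int i0, int i1, int nxe, float hdzi)
{
    for (int i = i0; i < i1; i++)
        out[i] = hdzi * (u[i + nxe] - u[i - nxe]);
}

/* Returns the largest absolute difference between the two versions,
   or -1.0f if an allocation fails. */
float max_diff_between_versions(void)
{
    const float hdzi = 0.5f;
    float *u   = malloc((size_t)NXE * NZ * sizeof(float));
    float *wrk = malloc(2 * (size_t)NXE * sizeof(float));
    if (!u || !wrk) { free(u); free(wrk); return -1.0f; }

    /* Arbitrary test field; the values are exactly representable in float. */
    for (int i = 0; i < NXE * NZ; i++)
        u[i] = 0.5f * (float)(i % 17);

    /* Two chunks of NXE floats carved out of the same work buffer. */
    float *a = wrk;
    float *b = a + NXE;

    /* The loop range must stay inside one chunk, or the chunks would overlap. */
    const int i0 = 1, i1 = NXE - 1;
    assert(i0 >= 0 && i1 <= NXE);

    float maxdiff = 0.0f;
    for (int k = 1; k < NZ - 1; k++) {          /* interior rows only */
        const float *row = u + (size_t)k * NXE; /* row[i +/- NXE] stays in bounds */
        ddz_plain(a, row, i0, i1, NXE, hdzi);
        ddz_volatile(b, row, i0, i1, NXE, hdzi);
        for (int i = i0; i < i1; i++) {
            float d = a[i] - b[i];
            if (d < 0.0f) d = -d;
            if (d > maxdiff) maxdiff = d;
        }
    }

    free(u);
    free(wrk);
    return maxdiff;
}
```

Calling max_diff_between_versions() from a small main and building it with each compiler version should show whether the vectorized loop and the volatile loop disagree for this simplified case. The assert on the loop range is there because, if the loop indices could run past the end of a chunk, writes to one derivative term would land in the next one.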
I would greatly appreciate any advice. Thank you very much.