I recently updated my ifort compiler to version 188.8.131.52 20191121 (in Parallel Studio 2020), and when I run my series of test cases I get some errors.

1. ERRORS: If I compile with the options

    -gen-interface -warn all -check all -traceback -fpe-all -fp-stack-check -ftrapuv -heap-arrays

then I get the following message:

    forrtl: warning (406): fort: (33): Shape mismatch: The extent of dimension 3 of array DENSITYTEST is 1 and the corresponding extent of array is 2

and the backtrace points to this block:

    do x= xlim_sup, xlim_inf, -1
      do i=0, ilim_sup
        densitytest(x,:,:) = densitytest(x,:,:) +sum( f(x-v(i,1),:,:, i,:)*factor(x,i, :,:,:), dim=3)
      enddo
    enddo

2. MEMORY LEAK: If I then do all this summation by hand, the program runs but leaks memory. I noticed the leak by watching RAM consumption: my computer started swapping and running slowly as soon as the program began to run. Note that I get the same leak if I compile without any of the debugging options above, and with or without the summation by hand. (So I think it is a different problem.)

I isolated the memory leak. It was not a hard task, since it comes from the next block, which also uses the intrinsic function sum. The leak comes from the following line:

    do while ( any(densitytest(x,:,:)>densitymax) .AND. any(densitytest(x,:,:)>sum(f(x,:,:,0,:),dim=3)) )

Here again, if I do the summation by hand I get no memory leak.

I also compiled my initial code with gfortran: I get no errors and no memory leaks, and every test gives the expected results. With ifort 18 I get no such problems either.

So, is this a known bug in the new version? Or am I missing something?

Best.
There is another thread on here about errors from shape checking (this check is new in the latest compiler); you can switch that option off.
Not sure about the memory leak.
The problem with the code sample in #1 is that the sum uses array slices that are not contiguous, so I think the code will create temporary arrays on the stack, which can cause a stack overflow if they are large. That aside, I am guessing there could be a leak in the code that makes the temporaries.
What do you mean by 'manually doing' the summation? Do you mean writing nested loops that do the summation element by element? Maybe pasting some code would be best.
And finally, I suggest making a small self-contained code example that demonstrates the leak, so that others can experiment with it and so it can be sent to Intel support.
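For reference, element-by-element summation could be sketched roughly as below. This is only an illustration, not the original poster's code: the array names follow post #1, but all extents and the random test data are assumptions, and the v(i,1) shift on the first index of f is left out to keep the sketch short. The program computes the update both with the sum intrinsic and with explicit loops, then compares the two.

```fortran
program manual_sum
  implicit none
  ! All extents below are made up for illustration.
  integer, parameter :: nx = 4, n1 = 3, n2 = 3, ni = 2, n3 = 5
  real :: f(nx, n1, n2, 0:ni, n3), factor(nx, 0:ni, n1, n2, n3)
  real :: dens_loop(nx, n1, n2), dens_sum(nx, n1, n2)
  real :: acc
  integer :: x, i, j, k, m

  call random_number(f)
  call random_number(factor)
  dens_loop = 0.0
  dens_sum = 0.0

  do x = nx, 1, -1
    do i = 0, ni
      ! Array-expression form, as in post #1 (without the v(i,1) shift).
      dens_sum(x,:,:) = dens_sum(x,:,:) + sum(f(x,:,:,i,:)*factor(x,i,:,:,:), dim=3)

      ! Explicit loops: one scalar accumulator, no array temporaries needed.
      do k = 1, n2
        do j = 1, n1
          acc = 0.0
          do m = 1, n3
            acc = acc + f(x,j,k,i,m)*factor(x,i,j,k,m)
          end do
          dens_loop(x,j,k) = dens_loop(x,j,k) + acc
        end do
      end do
    end do
  end do

  ! The two forms may differ by rounding only.
  if (maxval(abs(dens_loop - dens_sum)) > 1.0e-4) error stop 'mismatch'
  print *, 'max difference:', maxval(abs(dens_loop - dens_sum))
end program manual_sum
```

The explicit-loop form never materialises the product array, which is presumably why it sidesteps whatever goes wrong with the compiler-generated temporaries.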
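A reproducer along these lines might be a starting point. It is a sketch under stated assumptions: the names f, densitytest and densitymax come from post #1, but the extents, the random data and the repeat count are invented, and the original do while is replaced by an if inside a counted loop so that the program is guaranteed to terminate. Whether it actually shows the leak has not been verified.

```fortran
! Hypothetical reproducer: names follow the thread, but every extent
! and value below is an assumption made up for illustration.
program leak_repro
  implicit none
  integer, parameter :: nx = 20, ny = 20, nz = 20, ni = 3, nk = 4
  integer, parameter :: niter = 2000
  real, allocatable :: f(:,:,:,:,:), densitytest(:,:,:)
  real :: densitymax
  integer :: x, iter, hits

  allocate(f(nx, ny, nz, 0:ni, nk), densitytest(nx, ny, nz))
  call random_number(f)
  call random_number(densitytest)
  densitymax = 0.5
  hits = 0

  ! Evaluate the suspect condition many times; watch the memory use of
  ! the process (for example with top) while this loop runs.
  do iter = 1, niter
    do x = 1, nx
      if (any(densitytest(x,:,:) > densitymax) .and. &
          any(densitytest(x,:,:) > sum(f(x,:,:,0,:), dim=3))) hits = hits + 1
    end do
  end do

  print *, 'condition was true', hits, 'times'
end program leak_repro
```

If memory grows steadily with niter under the new compiler but stays flat under ifort 18 or gfortran, that would be a compact case to attach to a support ticket.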
In your original code
    do x= xlim_sup, xlim_inf, -1
      do i=0, ilim_sup
        densitytest(x,:,:) = densitytest(x,:,:) +sum( f(x-v(i,1),:,:, i,:)*factor(x,i, :,:,:), dim=3)
      enddo
    enddo
The argument of sum is an array expression (an array product) that produces unnecessary products, since you then only extract the sum along dim 3 of the resulting array. You should consider explicitly building the temporary product array from only the cells that contribute to the values being summed.
The original code is illustrating a compiler bug (memory leak as reported).
The original code is also inefficient in two ways:
1) The array expression generates temporaries (the compiler does this)
2) Due to your preferred index order, the array slices are not contiguous (and are therefore gathered into a contiguous temporary)
The slices are gathered into a contiguous temporary so that the CPU's SIMD instructions can be exploited.
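To see which kinds of slice are contiguous, the Fortran 2008 intrinsic is_contiguous can be applied to small sections. The sketch below uses a made-up array a with arbitrary extents; it is not taken from the code in post #1.

```fortran
program slice_contiguity
  implicit none
  ! Illustrative array; the name and extents are assumptions.
  real, allocatable :: a(:,:,:)

  allocate(a(4, 3, 2))
  a = 0.0

  ! Fixing the FIRST index picks every 4th element: a strided section.
  print *, 'a(2,:,:) contiguous? ', is_contiguous(a(2,:,:))

  ! Fixing the LAST index picks one solid column-major block.
  print *, 'a(:,:,2) contiguous? ', is_contiguous(a(:,:,2))
end program slice_contiguity
```

Because Fortran stores arrays in column-major order, the first query should report false and the second true. The slices densitytest(x,:,:), f(x,:,:,i,:) and factor(x,i,:,:,:) in post #1 are all of the first kind.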
You might try the following with your post #1 code
    block
      integer :: I1, I2, I3
      real, dimension(:,:,:), allocatable :: Temp3D, Prod
      real, dimension(:,:), allocatable :: Temp2D
      ! Extents of the three free dimensions of factor(x,i,:,:,:)
      I1 = size(factor, dim=3)
      I2 = size(factor, dim=4)
      I3 = size(factor, dim=5)
      allocate(Temp2D(I1,I2), Temp3D(I1, I2, I3), Prod(I1, I2, I3))
      do x= xlim_sup, xlim_inf, -1
        do i=0, ilim_sup
          Temp3D = f(x-v(i,1),:,:, i,:) ! copy slice into contiguous memory
          Prod = factor(x,i, :,:,:) ! copy slice into contiguous memory
          Prod = Prod * Temp3D
          Temp2D = sum( Prod, dim=3)
          densitytest(x,:,:) = densitytest(x,:,:) + Temp2D
        enddo
      enddo
    end block
I haven't tested whether the above eliminates all expression temporaries.
Note, if you swap the x index of densitytest to last index then the last statement of the inner loop is vectorizable.
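The effect of that index swap on the final update statement can be sketched as follows. The arrays dens_xfirst and dens_xlast, their extents, and the stand-in values written to temp2d are all hypothetical; the program only checks that both layouts end up holding the same numbers, and says nothing about the speed-up, which would need to be measured.

```fortran
program index_order
  implicit none
  ! Extents and names are assumptions chosen for illustration.
  integer, parameter :: nx = 5, n1 = 4, n2 = 3
  real :: dens_xfirst(nx, n1, n2)   ! layout used in post #1
  real :: dens_xlast(n1, n2, nx)    ! hypothetical reordered layout
  real :: temp2d(n1, n2)
  integer :: x, j, k

  dens_xfirst = 1.0
  dens_xlast = 1.0

  do x = nx, 1, -1
    ! Stand-in values for the result of sum(Prod, dim=3).
    do k = 1, n2
      do j = 1, n1
        temp2d(j, k) = real(x + 10*j + 100*k)
      end do
    end do
    dens_xfirst(x,:,:) = dens_xfirst(x,:,:) + temp2d   ! strided section
    dens_xlast(:,:,x) = dens_xlast(:,:,x) + temp2d     ! contiguous section
  end do

  ! Both layouts must agree element for element.
  do x = 1, nx
    if (any(dens_xfirst(x,:,:) /= dens_xlast(:,:,x))) error stop 'layouts disagree'
  end do
  print *, 'both layouts hold the same values'
end program index_order
```

With x as the last index, the left-hand side and temp2d have the same unit-stride memory layout, which is what makes the statement straightforward to vectorize.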