Summing two arrays and assigning the sum to a third array seems to trigger the creation of a temporary in ifort 10.1:
var1%arr = var2%arr + var3%arr
All three variables are of the same type and arr is a big allocatable array.
I found this problem because this line produces a stack overflow unless a really big stack size is set. The problem seemingly goes away when using the compiler switch "-heap-arrays", but looking at the memory consumption I can see that an unneeded temporary is still created (it is now allocated on the heap, so no stack overflow is triggered).
The problem exists in both the Linux and Windows versions of the compiler; it seems to get worse when using OpenMP, probably because each thread wants its own temporaries.
Is there a way to tell the compiler not to create the temporary? This should also be much faster because no result has to be copied from the temporary to the destination variable.
- Tags:
- Intel® Fortran Compiler
1 Solution
When you have an assignment of the form A = A + B, the compiler needs to know if there is overlap between the left and right sides, and if so, to what extent. The code in the compiler to do this takes many cases into account, but one it does not look at right now is if the operands are array components of a derived type. When it sees these, it just shrugs its figurative shoulders and says "I don't know", and generates conservative code.
Elimination of unnecessary temps is an ongoing task and we'll take another look at this area to see what we can improve.
The suggestion given in this thread removes the "derived type component" aspect from the overlap detection equation.
19 Replies
I can't find a compiler option that will eliminate the temp in this case. I'll submit it as a possible optimization improvement to the developers.
As a workaround, add conditionally compiled code that calls a subroutine to perform the summation (or use a DO loop):
subroutine arraySum(inA, inB, outC)
  ! The dummy arguments are plain arrays, not derived-type components.
  real(8), intent(in) :: inA(:,:), inB(:,:)
  real(8), intent(out) :: outC(:,:)
  outC = inA + inB
end subroutine arraySum
Provide an explicit interface at the call sites (the assumed-shape dummy arguments require one).
You might get better execution speed if you pass in the extents of the arrays.
The DO loop approach might be easier if you only have a few instances of this happening.
With conditional code you can then quickly test new revisions of the compiler with a flip of a define.
Jim Dempsey
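To make the two workarounds concrete, here is a minimal sketch of how they might look at a call site. The derived type `field_t`, the module name, and the array size are assumptions made up for illustration; only `arraySum` and the `var1%arr = var2%arr + var3%arr` pattern come from the thread. Whether either form actually avoids the temporary depends on the compiler version.

```fortran
module field_sum
  implicit none
  ! Hypothetical derived type with a big allocatable array component.
  type :: field_t
    real(8), allocatable :: arr(:,:)
  end type field_t
contains
  ! Inside the subroutine the operands are ordinary assumed-shape arrays.
  subroutine arraySum(inA, inB, outC)
    real(8), intent(in)  :: inA(:,:), inB(:,:)
    real(8), intent(out) :: outC(:,:)
    outC = inA + inB
  end subroutine arraySum
end module field_sum

program sum_demo
  use field_sum            ! the module supplies the explicit interface
  implicit none
  integer, parameter :: n = 4
  type(field_t) :: var1, var2, var3
  integer :: i, j

  allocate(var1%arr(n,n), var2%arr(n,n), var3%arr(n,n))
  var2%arr = 1.0d0
  var3%arr = 2.0d0

  ! Workaround 1: pass the components to the subroutine.
  call arraySum(var2%arr, var3%arr, var1%arr)
  if (any(var1%arr /= 3.0d0)) error stop "subroutine version gave a wrong sum"

  ! Workaround 2: explicit DO loops, first index innermost (column-major order).
  var1%arr = 0.0d0
  do j = 1, n
    do i = 1, n
      var1%arr(i,j) = var2%arr(i,j) + var3%arr(i,j)
    end do
  end do
  if (any(var1%arr /= 3.0d0)) error stop "DO loop version gave a wrong sum"

  print *, "both workarounds produced the expected sum"
end program sum_demo
```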
Thank you for your answers. As you suggested, I have rewritten the array assignment using DO-loops. This workaround fixes the problem (no stack overflow even without /heap-arrays) and almost exactly doubles the speed of this section of my code. This is probably because memory bandwidth is the limiting factor here.
As the impact is so big, I would very much like to see this being optimized in a newer version of the compiler (ifort 11.0 apparently behaves the same as the older version (10.1)).
However, I did not quite understand why using a subroutine alone would change anything; in fact, the summation is already done in a subroutine which does nothing else but summing the different sub-fields of a derived type. If there indeed is a way of not having to explicitly type the DO-loops, I would be very happy to learn about it, as quite a few of the arrays in question have six dimensions and typing the DO-loops for every one of them is rather tedious. Is there perhaps a way to treat a multi-dimensional array as one-dimensional? (RESHAPE is not a solution, as I need this not only for input but also for output.)
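One possible way to treat a multi-dimensional array as one-dimensional, sketched here under assumptions, is to pass it to a subroutine whose dummy arguments are explicit-shape rank-1 arrays. Fortran's sequence association lets an array actual argument of any rank match such a dummy, so a single loop covers every element for both input and output. The names `addFlat`, `field_t`, and the rank-3 shape are hypothetical; the same call would apply to a rank-6 array.

```fortran
module flat_sum
  implicit none
  ! Hypothetical derived type; rank 3 here, but any rank works the same way.
  type :: field_t
    real(8), allocatable :: arr(:,:,:)
  end type field_t
contains
  ! Explicit-shape rank-1 dummies: the actual arguments may have any rank.
  subroutine addFlat(n, a, b, c)
    integer, intent(in)  :: n
    real(8), intent(in)  :: a(n), b(n)
    real(8), intent(out) :: c(n)
    integer :: i
    do i = 1, n
      c(i) = a(i) + b(i)
    end do
  end subroutine addFlat
end module flat_sum

program flat_demo
  use flat_sum
  implicit none
  type(field_t) :: var1, var2, var3

  allocate(var1%arr(2,3,4), var2%arr(2,3,4), var3%arr(2,3,4))
  var2%arr = 1.0d0
  var3%arr = 2.0d0

  ! size() returns the total element count across all dimensions.
  call addFlat(size(var1%arr), var2%arr, var3%arr, var1%arr)

  if (any(var1%arr /= 3.0d0)) error stop "unexpected sum"
  print *, "all elements summed through the rank-1 view"
end program flat_demo
```

Allocatable arrays are contiguous, so a compiler would normally pass them without a copy, but that is worth confirming for the compiler version in use.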
When you have an assignment of the form A = A + B, the compiler needs to know if there is overlap between the left and right sides, and if so, to what extent. The code in the compiler to do this takes many cases into account, but one it does not look at right now is if the operands are array components of a derived type. When it sees these, it just shrugs its figurative shoulders and says "I don't know", and generates conservative code.
Elimination of unnecessary temps is an ongoing task and we'll take another look at this area to see what we can improve.
The suggestion given in this thread removes the "derived type component" aspect from the overlap detection equation.
Steve,
I would think that if the derived type components were contained arrays, as opposed to pointers to arrays, it would be safe to say "can't possibly overlap".
Jim Dempsey
Well, no. Even if they were non-pointer/allocatable arrays, the semantics of A=A+B require that the right side be completely evaluated before the left is assigned to. To avoid creating a temp, you have to make sure that either the references to A on the right overlap the left exactly or not at all.
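A small sketch of the partial-overlap case may help show why the right side must be evaluated first. The arrays and values below are made up for illustration. The array assignment uses the old values of `a` throughout, while a naive element-by-element loop reads values it has already overwritten, so a compiler cannot turn the first form into the second without a temporary (or a reordered loop).

```fortran
program overlap_demo
  implicit none
  integer :: a(5), b(5), i

  a = [1, 2, 3, 4, 5]
  b = a

  ! Array assignment: the whole right side is evaluated from the old
  ! values of a before anything is stored.
  a(2:5) = a(1:4) + a(2:5)

  ! Naive forward loop over the same index ranges: b(i-1) has already
  ! been updated when it is read, so the result differs.
  do i = 2, 5
    b(i) = b(i-1) + b(i)
  end do

  if (any(a /= [1, 3, 5, 7, 9]))   error stop "array assignment result differs"
  if (any(b /= [1, 3, 6, 10, 15])) error stop "loop result differs"

  print *, "array assignment:", a
  print *, "naive loop:      ", b
end program overlap_demo
```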
Quoting - Steve Lionel (Intel)
Well, no. Even if they were non-pointer/allocatable arrays, the semantics of A=A+B require that the right side be completely evaluated before the left is assigned to. To avoid creating a temp, you have to make sure that either the references to A on the right overlap the left exactly or not at all.
Steve, I think the OP referred to A = B+C, not A = A+B. I use a lot of (large) allocatable arrays inside structures, and I am interested in this optimization of the compiler as well.
Olivier
Oops! You're right. Unfortunately, that doesn't help here - the compiler does not even try to figure this out. I have asked that this be improved.
Olivier,
When I had problems with earlier versions of IVF, while waiting for a fix I would "engineer" a solution using the Fortran preprocessor (FPP). In an FPP header file (brought in with #include "..."):
#ifdef _BROKEN_ArraySum_
#define ArraySum(A,B,C) call doArraySum(A,B,C)
#else
#define ArraySum(A,B,C) A = B + C
#endif
Then use the macro in the hundreds of places where it matters. When a tentative fix comes in, comment out the #define _BROKEN_ArraySum_, compile, and run an integrity/performance test. If the test fails, remove the comment and compile again. FPP makes for easy work.
Jim Dempsey
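As a rough sketch of how the macro approach could fit together in a single source file: the body of `doArraySum`, the module name, and the type `field_t` are assumptions, since the thread only shows the macro definitions. The file has to be run through the preprocessor (for example `ifort -fpp` or `gfortran -cpp`).

```fortran
! Comment out the next line to switch back to the plain array assignment.
#define _BROKEN_ArraySum_

#ifdef _BROKEN_ArraySum_
#define ArraySum(A,B,C) call doArraySum(A,B,C)
#else
#define ArraySum(A,B,C) A = B + C
#endif

module array_sum_workaround
  implicit none
  ! Hypothetical derived type with an allocatable array component.
  type :: field_t
    real(8), allocatable :: arr(:,:)
  end type field_t
contains
  ! Hypothetical helper: stores the sum of the last two arguments in the first.
  subroutine doArraySum(outA, inB, inC)
    real(8), intent(out) :: outA(:,:)
    real(8), intent(in)  :: inB(:,:), inC(:,:)
    outA = inB + inC
  end subroutine doArraySum
end module array_sum_workaround

program macro_demo
  use array_sum_workaround
  implicit none
  type(field_t) :: var1, var2, var3

  allocate(var1%arr(3,3), var2%arr(3,3), var3%arr(3,3))
  var2%arr = 1.0d0
  var3%arr = 2.0d0

  ! Expands to either a subroutine call or a plain array assignment.
  ArraySum(var1%arr, var2%arr, var3%arr)

  if (any(var1%arr /= 3.0d0)) error stop "unexpected sum"
  print *, "macro form produced the expected sum"
end program macro_demo
```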
So if I'm reading this thread correctly, a temp variable is always created when assigning one array to another, as in st1%array=st2%array, and this is slower than using DO loops. Is this also true if the bounds are included, as in st1%array(1:10000)=st2%array(1:10000)? Thanks,
Jeremy
I would not say "always". In the case of allocatable or pointer array components of derived type, a temporary may be made. In normal array assignments, no. Specifying the bounds makes it even harder for the compiler - don't do that if you're copying the whole array. It can also change the semantics of the program, especially for allocatable arrays.
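One semantic difference of this kind, sketched under assumptions: with Fortran 2003 rules, assigning to a whole allocatable array may reallocate the left side to match the shape of the right side, while assigning to a section never reallocates and requires the shapes to conform already. Whether reallocation is enabled by default depends on the compiler and its options (older ifort versions need `-assume realloc_lhs`), so the first printed size is only what Fortran 2003 semantics would give.

```fortran
program realloc_demo
  implicit none
  real(8), allocatable :: a(:), b(:)

  allocate(a(3), b(5))
  a = 0.0d0
  b = 1.0d0

  ! Whole-array form: under Fortran 2003 semantics a is reallocated
  ! to the shape of b before the copy.
  a = b
  print *, "size(a) after a = b:", size(a)

  ! Section form: no reallocation takes place; both sides must already
  ! have the same shape.
  a(1:3) = b(1:3)
  print *, "size(a) after a(1:3) = b(1:3):", size(a)
end program realloc_demo
```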
See "Doctor, It Hurts When I Do This" for details on the semantic differences. If you want to use DO loops to do the assignment, I can understand that.
Very interesting article, I never thought about pointing it to an array function. Is there any downside to just using -heap-arrays and allocating everything on the heap instead of the stack? This seems to get rid of all the stack overflow exceptions.
In my view, there is no real downside to heap-arrays. If the arrays are small and the routine does little work, the overhead of allocation and deallocation may be significant, but for most applications it will be noise.
Well, if the array is large it can have quite an impact because the temporary has to be copied back to the target array. In my experience using DO-loops almost doubles the speed of the assignment operation (memory bandwidth being the limiting factor). This is irrelevant, of course, if the assignment operation takes a relatively small amount of time and memory footprint is not a problem.
I was not discussing overhead of copying - that is always extra - but rather having array temporaries, when they are created, allocated dynamically vs. on the stack.
This was fixed in version 12.
Thanks! This eliminates quite a lot of DO-loops :)