'dot_product' etc. ?
- Tags:
- Intel® Fortran Compiler
I am trying to parallelize some simple Fortran 90 array-syntax code, for example adding two arrays:
A(:,:) = B(:,:) + C(:,:)
by placing OpenMP directives around this code as:
!$OMP PARALLEL
!$OMP WORKSHARE
A(:,:) = B(:,:) + C(:,:)
!$OMP END WORKSHARE
!$OMP END PARALLEL
I compile and test my code 'test.f' on an SGI Altix platform using the most recent ifort compiler (~ version 9, release 13). My compile statement is 'ifort -O3 -openmp test.f'.
When I run my test program I get no speed-up benefit as I increase the number of processors. I change the number of processors via the 'setenv OMP_NUM_THREADS ...' command.
However, if I write my code in do-loop style I do get a speed-up. For example,
!$OMP PARALLEL
!$OMP DO PRIVATE (J)
DO I=1,N
DO J=1,N
A(I,J) = B(I,J) + C(I,J)
ENDDO
ENDDO
!$OMP END DO
!$OMP END PARALLEL
I would appreciate any help or insight you can offer (since all of my code is written in array-syntax form and not in do-loop style).
Cheers, David
Did the compiler successfully interchange loops in order to optimize your DO-loop version with load-pair instructions, as you would like it to do even when you don't apply OpenMP (assuming the I loop is fairly long)? Does it produce load-pair code for the rank-2 array version?
OpenMP parallelization may be pointless if the loop isn't optimized prior to parallelization, unless your aim is to minimize serial performance in order to improve apparent parallel scaling. If you don't nest these loops properly, then in addition to cutting inner-loop performance, your OpenMP parallelization could peak early once you reach the point of false sharing, with multiple threads operating frequently on the same cache line.
I agree that I'd like to see full optimization of array syntax, but OpenMP is somewhat of a low-level programming scheme, which doesn't do well when details are left to the intelligence of the compiler, scheduler, and run-time library.
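On the loop-nesting point: Fortran stores arrays in column-major order, so the cache-friendly nesting puts the first index in the inner loop, with the OpenMP DO distributing the outer column loop across threads. A minimal sketch, assuming the same rank-2 real arrays as in the original post (the program name and size N are illustrative, not from the thread):

```fortran
! Column-major order: the inner loop over the first index I gives
! unit-stride access, while the worksharing DO splits the outer J
! (column) loop across threads, so false sharing is limited to, at
! most, the cache lines at chunk boundaries.
PROGRAM add_arrays
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 512
  REAL :: A(N,N), B(N,N), C(N,N)
  INTEGER :: I, J
  B = 1.0
  C = 2.0
!$OMP PARALLEL
!$OMP DO PRIVATE (I)
  DO J = 1, N
     DO I = 1, N
        A(I,J) = B(I,J) + C(I,J)
     ENDDO
  ENDDO
!$OMP END DO
!$OMP END PARALLEL
END PROGRAM add_arrays
```

Only the inner index I needs an explicit PRIVATE clause here; the worksharing loop index J is predetermined private by the OpenMP rules, and the arrays stay shared so every thread writes into the same result.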
Certain OpenMP* WORKSHARE constructs now parallelize with Intel® Fortran Compiler 15.0. Our implementation is described here.
Patrick
It looks easier to remember to use f77 code than to figure out how to work with these restrictions.
