- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This small code is completely ruining my whole program performance. The code generated by the compiler has a huge overhead when calling to "SP_entry_add". The problem are not the instructions inside, but the function call itself.
Making some tests the problem seems to be associated to the call "CALL ALLOC_spMatrix(spA,nircn) ". If I simply replace this call for the two calls inside ALLOC_spMatrix ( that is "CALL ALLOC_ircn(spA%irn,spA%icn,nalloc)" and "CALL moveALLOCrel_realVector(spA%M_SP,nalloc)"), the code is much faster.
Therefore my guess is that the compiler is doing a very poor job when I try to pass the derived type spA and allocate memory in a different function. I checked with a different compiler and it doesn't suffer from this drawback.
MODULE TIPOS_DERIVADOS
TYPE spMatrix
INTEGER::nz=0,nz_tot=0,dimRow=0,dimCol=0
INTEGER,ALLOCATABLE,DIMENSION(:)::irn,icn
REAL(8),ALLOCATABLE,DIMENSION(:)::M_SP
INTEGER,DIMENSION(:,:),ALLOCATABLE::pattern
LOGICAL::preprocesada=.FALSE.,AvoidDoubles=.FALSE.
END TYPE spMatrix
CONTAINS
SUBROUTINE moveALLOCrel_intVector(irc,nalloc,ival)
INTEGER nircn,nalloc,ival
INTEGER,ALLOCATABLE,DIMENSION(:)::irc_aux,irc
INTENT(IN) nalloc
INTENT(INOUT) irc
OPTIONAL ival
if(allocated(irc)) then
nircn=size(irc)
ALLOCATE(irc_aux(nircn+nalloc))
irc_aux(1:nircn)=irc
IF(PRESENT(ival)) irc_aux(nircn+1:nircn+nalloc)=ival
CALL MOVE_ALLOC(irc_aux,irc)
elseif(nalloc.gt.0) then
ALLOCATE(irc(nalloc))
IF(PRESENT(ival)) irc=ival
else
ALLOCATE(irc(25)) ! Tamaño mínimo de reserva de 25
IF(PRESENT(ival)) irc=ival
endif
END SUBROUTINE moveALLOCrel_intVector
SUBROUTINE moveALLOCrel_realVector(M_SP,nadd,val)
INTEGER nM_SP,nadd
REAL(8),ALLOCATABLE,DIMENSION(:)::M_SP_aux,M_SP
REAL(8),INTENT(IN),OPTIONAL::val
INTENT(IN) nadd
INTENT(INOUT) M_SP
if(allocated(M_SP)) then
nM_SP=size(M_SP)
ALLOCATE(M_SP_aux(nM_SP+nadd))
M_SP_aux(1:nM_SP)=M_SP
IF(PRESENT(val)) M_SP_aux(nM_SP+1:nM_SP+nadd)=val
CALL MOVE_ALLOC(M_SP_aux,M_SP)
elseif(nadd.gt.0) then
ALLOCATE(M_SP(nadd))
IF(PRESENT(val)) M_SP=val
else
ALLOCATE(M_SP(25)) ! Tamaño mínimo de reserva de 25
IF(PRESENT(val)) M_SP=val
endif
END SUBROUTINE moveALLOCrel_realVector
END MODULE TIPOS_DERIVADOS
MODULE sparse
USE TIPOS_DERIVADOS
CONTAINS
SUBROUTINE SP_entry_add(spA,i,j,M_val,nz)
TYPE(spMatrix)::spA
REAL(8),INTENT(IN)::M_val
INTEGER,INTENT(IN)::i,j
INTEGER,INTENT(OUT),OPTIONAL::nz
INTEGER nircn
spA%nz=spA%nz+1
if(PRESENT(nz)) nz=spA%nz
IF(.NOT.spA%preprocesada) THEN
spA%nz_tot=spA%nz
if(ALLOCATED(spA%irn)) then
nircn=size(spA%irn)
IF (spA%nz.gt.nircn) CALL ALLOC_spMatrix(spA,nircn)
else
CALL ALLOC_spMatrix(spA,100)
endif
spA%irn(spA%nz)=i
spA%icn(spA%nz)=j
spA%dimRow = max(spA%dimRow, i)
spA%dimCol = max(spA%dimCol, j)
ELSEIF(spA%irn(spA%nz).ne.i.OR.spA%icn(spA%nz).ne.j) THEN
print *, 'sparse::SP_entry_add: ERROR DE ENSAMBLAJE, LOS INDICES DE ENSAMBLAJE NO COINCIDEN CON EL PREPROCESO. nz=', spA%nz
STOP -1
ENDIF
spA%M_SP(spA%nz)=M_val
END SUBROUTINE SP_entry_add
SUBROUTINE ALLOC_spMatrix(spA,nalloc)
CLASS(spMatrix),INTENT(INOUT)::spA
INTEGER nircn,nalloc
INTEGER,ALLOCATABLE,DIMENSION(:)::irn_aux,icn_aux
INTENT(IN) nalloc
CALL ALLOC_ircn(spA%irn,spA%icn,nalloc)
CALL moveALLOCrel_realVector(spA%M_SP,nalloc)
END SUBROUTINE ALLOC_spMatrix
SUBROUTINE ALLOC_ircn(irow,icol,nalloc)
INTEGER nircn,nalloc
INTEGER,ALLOCATABLE,DIMENSION(:)::irn_aux,icn_aux,irow,icol
INTENT(IN) nalloc
INTENT(INOUT) irow,icol
CALL moveALLOCrel_intVector(irow,nalloc)
CALL moveALLOCrel_intVector(icol,nalloc)
END SUBROUTINE ALLOC_ircn
END MODULE sparse
PROGRAM main
USE sparse
IMPLICIT NONE
TYPE(spMatrix) spA
CALL SP_entry_ADD(spA,1,1,1.d0)
END PROGRAM main
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When you pass a TYPE(something) value to a dummy argument that is CLASS(something), the compiler has to build a large data structure pointing to the CLASS definition. This is a lot of code. There are probably other ways of handling this, but that's what the Intel compiler does.
I'm curious as to why you used CLASS here, since the type is not extended. If you change CLASS to TYPE it should go a lot faster. I'd guess that the other compiler does a better job of recognizing the case.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When you pass a TYPE(something) value to a dummy argument that is CLASS(something), the compiler has to build a large data structure pointing to the CLASS definition. This is a lot of code. There are probably other ways of handling this, but that's what the Intel compiler does.
I'm curious as to why you used CLASS here, since the type is not extended. If you change CLASS to TYPE it should go a lot faster. I'd guess that the other compiler does a better job of recognizing the case.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much Steve for pointing this out. This CLASS dummy argument was completely unnoticed in my tests.
SPmatrix is a type extended by other types. I guess that's why I defined it as a CLASS dummy argument, nevertheless for the use given in this small code is not needed and I probably can change it easily also in my library.
I will give it a try and let you know about the outcome.
Daniel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Looking at the dissasembly, that ugly code present with the CLASS dummy argument dissapeared and this is noticeable looking at the performance too. so thank you vey much, Steve.
My comment now is: I suppose that Intel is aware of this "problem" and it doesn't need for a better implementation.
Daniel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is a compiler option that may help you find those instances where the temporary storage is created.
/check:arg_temp_created -check arg_temp_created Enables run-time checking on whether actual arguments are copied into temporary storage before routine calls. If a copy is made at run-time, an informative message is displayed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This isn't an argument temp, Barbara. It's the data structure the compiler uses for CLASS objects. An AWFUL lot of code is generated for each one, I have to think that there is a better way of handling this, maybe with some compile-time template that gets pointed to so only the parts unknown until runtime need to be filled in. An optimization could also be made for calls where the dummy is not CLASS(*). I know the compiler team has a lot on its plate, but this is going to be a sore point as more and more users embrace polymorphism.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Steve and Barbara for your feedback.
Steve, I don't know what is the solution, but you are right that there is a better way of handling this. Other people in my team use gfortran, we compared the dissasembly and it seems that gfortran has some kind of optimization for this case, because the code is much smaller for the CLASS case.
Daniel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Btw, if you think that this issue deserves a ticket, I can put it, for a better report of it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Daniel,
I may have already reported something similar to Intel Support (#04115658). In my case, the code creating the class descriptor is resulting in a benign data race in an OpenMP loop, which is adversely affecting performance. I'd say the more people that report this, the more likely it will be addressed.
Best regards,
Mark
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Mark.
I think you are right. One of the reasons why we program in Fortran is because we assume that our code is going to be always optimal from the efficiency point of view and we should be freed of worrying about these details as much as possible. I still have to report it.
Daniel.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page