- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This small code is completely ruining my whole program performance. The code generated by the compiler has a huge overhead when calling to "SP_entry_add". The problem are not the instructions inside, but the function call itself.
Making some tests the problem seems to be associated to the call "CALL ALLOC_spMatrix(spA,nircn) ". If I simply replace this call for the two calls inside ALLOC_spMatrix ( that is "CALL ALLOC_ircn(spA%irn,spA%icn,nalloc)" and "CALL moveALLOCrel_realVector(spA%M_SP,nalloc)"), the code is much faster.
Therefore my guess is that the compiler is doing a very poor job when I try to pass the derived type spA and allocate memory in a different function. I checked with a different compiler and it doesn't suffer from this drawback.
MODULE TIPOS_DERIVADOS TYPE spMatrix INTEGER::nz=0,nz_tot=0,dimRow=0,dimCol=0 INTEGER,ALLOCATABLE,DIMENSION(:)::irn,icn REAL(8),ALLOCATABLE,DIMENSION(:)::M_SP INTEGER,DIMENSION(:,:),ALLOCATABLE::pattern LOGICAL::preprocesada=.FALSE.,AvoidDoubles=.FALSE. END TYPE spMatrix CONTAINS SUBROUTINE moveALLOCrel_intVector(irc,nalloc,ival) INTEGER nircn,nalloc,ival INTEGER,ALLOCATABLE,DIMENSION(:)::irc_aux,irc INTENT(IN) nalloc INTENT(INOUT) irc OPTIONAL ival if(allocated(irc)) then nircn=size(irc) ALLOCATE(irc_aux(nircn+nalloc)) irc_aux(1:nircn)=irc IF(PRESENT(ival)) irc_aux(nircn+1:nircn+nalloc)=ival CALL MOVE_ALLOC(irc_aux,irc) elseif(nalloc.gt.0) then ALLOCATE(irc(nalloc)) IF(PRESENT(ival)) irc=ival else ALLOCATE(irc(25)) ! Tamaño mínimo de reserva de 25 IF(PRESENT(ival)) irc=ival endif END SUBROUTINE moveALLOCrel_intVector SUBROUTINE moveALLOCrel_realVector(M_SP,nadd,val) INTEGER nM_SP,nadd REAL(8),ALLOCATABLE,DIMENSION(:)::M_SP_aux,M_SP REAL(8),INTENT(IN),OPTIONAL::val INTENT(IN) nadd INTENT(INOUT) M_SP if(allocated(M_SP)) then nM_SP=size(M_SP) ALLOCATE(M_SP_aux(nM_SP+nadd)) M_SP_aux(1:nM_SP)=M_SP IF(PRESENT(val)) M_SP_aux(nM_SP+1:nM_SP+nadd)=val CALL MOVE_ALLOC(M_SP_aux,M_SP) elseif(nadd.gt.0) then ALLOCATE(M_SP(nadd)) IF(PRESENT(val)) M_SP=val else ALLOCATE(M_SP(25)) ! Tamaño mínimo de reserva de 25 IF(PRESENT(val)) M_SP=val endif END SUBROUTINE moveALLOCrel_realVector END MODULE TIPOS_DERIVADOS MODULE sparse USE TIPOS_DERIVADOS CONTAINS SUBROUTINE SP_entry_add(spA,i,j,M_val,nz) TYPE(spMatrix)::spA REAL(8),INTENT(IN)::M_val INTEGER,INTENT(IN)::i,j INTEGER,INTENT(OUT),OPTIONAL::nz INTEGER nircn spA%nz=spA%nz+1 if(PRESENT(nz)) nz=spA%nz IF(.NOT.spA%preprocesada) THEN spA%nz_tot=spA%nz if(ALLOCATED(spA%irn)) then nircn=size(spA%irn) IF (spA%nz.gt.nircn) CALL ALLOC_spMatrix(spA,nircn) else CALL ALLOC_spMatrix(spA,100) endif spA%irn(spA%nz)=i spA%icn(spA%nz)=j spA%dimRow = max(spA%dimRow, i) spA%dimCol = max(spA%dimCol, j) ELSEIF(spA%irn(spA%nz).ne.i.OR.spA%icn(spA%nz).ne.j) THEN print *, 'sparse::SP_entry_add: ERROR DE ENSAMBLAJE, LOS INDICES DE ENSAMBLAJE NO COINCIDEN CON EL PREPROCESO. nz=', spA%nz STOP -1 ENDIF spA%M_SP(spA%nz)=M_val END SUBROUTINE SP_entry_add SUBROUTINE ALLOC_spMatrix(spA,nalloc) CLASS(spMatrix),INTENT(INOUT)::spA INTEGER nircn,nalloc INTEGER,ALLOCATABLE,DIMENSION(:)::irn_aux,icn_aux INTENT(IN) nalloc CALL ALLOC_ircn(spA%irn,spA%icn,nalloc) CALL moveALLOCrel_realVector(spA%M_SP,nalloc) END SUBROUTINE ALLOC_spMatrix SUBROUTINE ALLOC_ircn(irow,icol,nalloc) INTEGER nircn,nalloc INTEGER,ALLOCATABLE,DIMENSION(:)::irn_aux,icn_aux,irow,icol INTENT(IN) nalloc INTENT(INOUT) irow,icol CALL moveALLOCrel_intVector(irow,nalloc) CALL moveALLOCrel_intVector(icol,nalloc) END SUBROUTINE ALLOC_ircn END MODULE sparse PROGRAM main USE sparse IMPLICIT NONE TYPE(spMatrix) spA CALL SP_entry_ADD(spA,1,1,1.d0) END PROGRAM main
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When you pass a TYPE(something) value to a dummy argument that is CLASS(something), the compiler has to build a large data structure pointing to the CLASS definition. This is a lot of code. There are probably other ways of handling this, but that's what the Intel compiler does.
I'm curious as to why you used CLASS here, since the type is not extended. If you change CLASS to TYPE it should go a lot faster. I'd guess that the other compiler does a better job of recognizing the case.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When you pass a TYPE(something) value to a dummy argument that is CLASS(something), the compiler has to build a large data structure pointing to the CLASS definition. This is a lot of code. There are probably other ways of handling this, but that's what the Intel compiler does.
I'm curious as to why you used CLASS here, since the type is not extended. If you change CLASS to TYPE it should go a lot faster. I'd guess that the other compiler does a better job of recognizing the case.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much Steve for pointing this out. This CLASS dummy argument was completely unnoticed in my tests.
SPmatrix is a type extended by other types. I guess that's why I defined it as a CLASS dummy argument, nevertheless for the use given in this small code is not needed and I probably can change it easily also in my library.
I will give it a try and let you know about the outcome.
Daniel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Looking at the dissasembly, that ugly code present with the CLASS dummy argument dissapeared and this is noticeable looking at the performance too. so thank you vey much, Steve.
My comment now is: I suppose that Intel is aware of this "problem" and it doesn't need for a better implementation.
Daniel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
There is a compiler option that may help you find those instances where the temporary storage is created.
/check:arg_temp_created -check arg_temp_created Enables run-time checking on whether actual arguments are copied into temporary storage before routine calls. If a copy is made at run-time, an informative message is displayed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This isn't an argument temp, Barbara. It's the data structure the compiler uses for CLASS objects. An AWFUL lot of code is generated for each one, I have to think that there is a better way of handling this, maybe with some compile-time template that gets pointed to so only the parts unknown until runtime need to be filled in. An optimization could also be made for calls where the dummy is not CLASS(*). I know the compiler team has a lot on its plate, but this is going to be a sore point as more and more users embrace polymorphism.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Steve and Barbara for your feedback.
Steve, I don't know what is the solution, but you are right that there is a better way of handling this. Other people in my team use gfortran, we compared the dissasembly and it seems that gfortran has some kind of optimization for this case, because the code is much smaller for the CLASS case.
Daniel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Btw, if you think that this issue deserves a ticket, I can put it, for a better report of it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Daniel,
I may have already reported something similar to Intel Support (#04115658). In my case, the code creating the class descriptor is resulting in a benign data race in an OpenMP loop, which is adversely affecting performance. I'd say the more people that report this, the more likely it will be addressed.
Best regards,
Mark
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Mark.
I think you are right. One of the reasons why we program in Fortran is because we assume that our code is going to be always optimal from the efficiency point of view and we should be freed of worrying about these details as much as possible. I still have to report it.
Daniel.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page