Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OpenMP Fortran Windows 10

Emil_J_
Novice

I have Fortran code that works fine when I compile it for a 32-bit Windows 10 computer, but it does not work when I compile it for a 64-bit Windows 10 computer. On the 64-bit computer it just stops at:   !$OMP DO SCHEDULE(STATIC,chunk)

These are the switches I use:

ifort a.f90 libiomp5md.lib /heap-arrays /assume:byterecl /assume:buffered_io /Qip- /Ob0 /Qopenmp /auto-scalar /exe:a.exe 

 

subroutine colsol(a,v,ColTop,ColDONE,maxa,nn,kkk,na,nn1,ierr)
!   **************************************************************
!   *   Cholesky  Factorisation
!   ***************************************************************    
      implicit none
 
      real*8 a(na),v(nn),b,c
      integer*4 maxa(nn1),nn,l,n,kk,ic,nd,ki,j,k,nn1,na,kh,kl,kn,i_cnt,i_cnt_old, klt,ku,kkk,ierr
      real*8 sum1, amaxak
      integer *4  ColTop(nn),ColDONE(nn)     !...Cholesky
      integer *4  i,TOPij, chunk
      integer *4  iperct,iperct1, maxai, maxaj
!-----------------------------------------------------------
      ierr=0
      iperct=0
      iperct1=0
      
      chunk=1
 
       !...prepare 'ColTop'   
       do i = 1, nn
           ColTop(i) = i - (maxa(i + 1) - maxa(i)) + 1
       end do  
 
       !...Columns Done
       do i = 1, nn
           ColDONE(i) = 0   !... mark all columns as not done '0'
       end do  
       
!---------------------------------------------------------------------------------       
       !...factorisation  (Skyline)
        a(1) = dSqrt(a(1))
        ColDONE(1) = 1   !...column 1 is done
        
!$OMP PARALLEL PRIVATE (i,j,k,maxaj,maxai,sum1,amaxak,TOPij) 
!$OMP DO SCHEDULE(STATIC,chunk)        
        do j = 2, nn                   !...loop for COL from 2 to nn
           maxaj=maxa(j) + j 
           do i = ColTop(j), j - 1     !...loop for ROW from top going down to diagonal

                !...wait until column 'i' is done
                do while(ColDONE(i) .ne. 1)
                end do
                
                sum1 = 0.0d0
                TOPij = Max(ColTop(i), ColTop(j))     !...find min column height for dot product
 
                maxai=maxa(i) + i 
                do k = TOPij, i - 1
                    sum1 = sum1 + a(maxai- k) * a(maxaj - k)
                end do
                a(maxaj - i) = (a(maxaj - i) - sum1) / a(maxai - i)
                
            end do
 
            !...do diagonal term J separately
            sum1 = 0.0d0
            do k = ColTop(j), j-1
                amaxak=a(maxaj - k)
                sum1 = sum1 + amaxak * amaxak
            end do
            
            a(maxa(j)) = dSqrt(a(maxa(j)) - sum1)
            ColDONE(j) = 1    !...column 'j' is done
                       
        end do
!$OMP END DO
!$OMP END PARALLEL       
      
      return
      end    
    
    
  

 

 

2 Replies
John_Campbell
New Contributor II

The problem could be either with the compiler options you are now using or changes to the compiler's approach to optimisation for these options.

ifort could be modifying the DO WHILE loop, as nothing inside the loop body changes the value being tested.
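One way to make the wait visible to the compiler is to read and write the flag through OpenMP atomic constructs, with a flush on either side of the data it protects. The following is only a minimal sketch of that idea, not the original solver: the program name and the arrays `done` and `val` are hypothetical stand-ins for ColDONE and the matrix data, and it assumes a compiler supporting OpenMP 3.1 or later (for `atomic read` / `atomic write`).

```fortran
program spin_wait_sketch
  implicit none
  integer, parameter :: n = 8
  integer :: done(n), val(n)   ! hypothetical stand-ins for ColDONE and a
  integer :: j, flag

  done = 0
  val  = 0
  val(1)  = 1
  done(1) = 1                  ! "column" 1 is finished before the loop

  !$omp parallel do schedule(static,1) private(flag)
  do j = 2, n
     ! Wait until column j-1 has been published. The atomic read forces
     ! a fresh load of the flag on every pass through the loop.
     do
        !$omp atomic read
        flag = done(j-1)
        if (flag == 1) exit
     end do
     !$omp flush               ! make the producer's data visible here

     val(j) = val(j-1) + 1     ! work that depends on column j-1

     !$omp flush               ! publish val(j) before raising the flag
     !$omp atomic write
     done(j) = 1
  end do
  !$omp end parallel do

  if (val(n) /= n) stop 1      ! each column adds 1, so val(n) must equal n
  print *, 'val(n) =', val(n)
end program spin_wait_sketch
```

As in the original routine, this relies on SCHEDULE(STATIC,1) so that each thread visits its columns in increasing order and the lowest unfinished column is always being worked on by some thread.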

You may be better off selecting a lower optimisation level and replacing the inner loops with dot_product or an optimised vector routine.
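As a rough illustration of the dot_product suggestion, the k loop that accumulates sum1 can be written as a dot product of two array sections of a. The sketch below checks that equivalence on made-up data; the program name and the values chosen for i, TOPij, maxai and maxaj are assumptions for the demonstration only.

```fortran
program dot_product_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: na = 20
  real(dp) :: a(na), sum_loop, sum_dp
  integer  :: k, i, topij, maxai, maxaj

  ! Made-up data and indices, chosen only to keep every subscript in range
  do k = 1, na
     a(k) = real(k, dp)
  end do
  i = 5
  topij = 2
  maxai = 8
  maxaj = 16

  ! Explicit loop, as written in the original routine
  sum_loop = 0.0_dp
  do k = topij, i - 1
     sum_loop = sum_loop + a(maxai - k) * a(maxaj - k)
  end do

  ! Same sum via array sections: k = topij..i-1 maps to subscripts
  ! maxai-i+1 .. maxai-topij (and likewise for maxaj). The section is
  ! zero-sized when topij > i-1, and dot_product then returns 0.
  sum_dp = dot_product(a(maxai-i+1 : maxai-topij), &
                       a(maxaj-i+1 : maxaj-topij))

  if (abs(sum_loop - sum_dp) > 1.0e-12_dp) stop 1
  print *, sum_loop, sum_dp
end program dot_product_sketch
```

Whether this is faster than the explicit loop depends on the compiler and options, so it is worth timing both forms.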

This is an interesting approach to COLSOL / omp. Why use Cholesky, as "a(maxa(j)) = dSqrt(a(maxa(j)) - sum1)" requires a positive definite matrix, while other COLSOL approaches do not? I would be interested to know the history of this routine, as it has a backwards storage order for A.

John
 

John_Campbell
New Contributor II

I adapted your approach to a COLSOL - Crout solver and found a problem with your DO WHILE loop being optimised away. I did get it to work with a more complex wait loop and included a timer call for the first wait cycle:

      DO Jeq = JB,JT
!
!       Wait until this column is complete
         iw = 0        
         DO
           if ( NA_done(JEQ) ) exit
           call small_delay (iw)
           iw = iw + 1
         END DO
...
  subroutine small_delay (iw)
      integer*4 :: iw
      integer*8 :: tick
      integer*8 QueryPerformance_tick
      external  QueryPerformance_tick
!
      if ( iw==0 ) tick = QueryPerformance_tick ()
  end subroutine small_delay 

Your OMP solver approach works well for small problems but becomes constrained by a cache/memory bottleneck for larger problems.
