Optimiser problems

bendel_boy1 · ‎02-24-2009

I have a DLL which appears to work well when compiled as a debug version.

When I switch to a release version the DLL fails immediately with an access violation - where, I don't know, as the debugger breaks out to assembler. If I tick the option to include full debug information then the DLL runs (with line numbers only I still get the access violation). If I then switch from 'Maximize speed' to 'Maximize speed plus higher optimisations' I get the access violation again. If I select 'Use Intel Processor extensions' the compiler aborts - I don't know if this is triggered by the development machine using an AMD Sempron.

When I timed the debug DLL against the release DLL (with full debug information, but also set to 'Maximize speed'), the debug DLL varied between running as fast as the release version and running faster. Timing might have been affected by other Windows processes, but the debug version was never slower than the release.

Any suggestions as to what might be possible causes? I can continue to provide the DLL compiled using the debug mode, when all seems to be fine. The DLL integrates functions in user-supplied DLLs, and the results have been checked against Matlab equivalents for three user DLLs so far. The user interface is written in Visual Basic 6, and array out of bound accesses within the DLL result in the VB code reporting either an overflow or divide by zero upon returning from the DLL - a problem earlier - which is not currently happening. I do use allocatable arrays, which I deallocate on first use, but with the STAT keyword to provide a 'safe' error return, and the SAVE attribute to ensure persistence between DLL calls.

The only nonstandard thing of which I am aware is the use of Cray pointers to access subroutines written in user-provided DLLs. I'm using IVF 9.1, if that matters.
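For readers following along, the allocatable-array pattern described above can be sketched roughly as follows. This is not the poster's code: the module name `workspace_demo`, the routine `ensure_workspace`, and the array `work` are hypothetical, and the sketch assumes the array is (re)allocated the first time it is needed and then reused on later calls.

```fortran
module workspace_demo
  implicit none
contains
  ! Hypothetical routine: keeps a work array alive between calls and
  ! reports allocation problems through an integer status argument
  ! instead of letting the run-time abort.
  subroutine ensure_workspace(n, ierr)
    integer, intent(in)  :: n
    integer, intent(out) :: ierr
    ! SAVE makes the allocation persist between calls into the DLL.
    double precision, allocatable, save :: work(:)

    ierr = 0
    if (allocated(work)) then
      if (size(work) == n) return      ! already the right size, reuse it
      deallocate(work, stat=ierr)      ! STAT= returns a code rather than aborting
      if (ierr /= 0) return
    end if
    allocate(work(n), stat=ierr)
    if (ierr /= 0) return
    work = 0d0
  end subroutine ensure_workspace
end module workspace_demo

program demo
  use workspace_demo
  implicit none
  integer :: ierr

  call ensure_workspace(100, ierr)
  print *, 'first call status  =', ierr
  call ensure_workspace(100, ierr)     ! second call reuses the saved array
  print *, 'second call status =', ierr
end program demo
```

A non-zero `ierr` can then be passed back to the Visual Basic caller as an ordinary error code.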

TimP · ‎02-24-2009

We've had to drop the /Qsafe-cray-ptr option, in case you used that. It makes unnecessarily aggressive assumptions. It doesn't mean "accept Cray pointers." In fact, if your debug build gives good performance, you may never want to go to more aggressive options than /O1 /fp:source. If you are using /Qsave, instead of correcting source code, it may be dangerous to optimize. The 9.1 compiler would not use SSE code for AMD, unless you set /QxW to make a single code path. Multiple code paths may not be good in your case. Beyond this, there are too many possibilities to speculate on. You may want trial runs with /check, in case the compiler can diagnose any problems.

bendel_boy1 · ‎02-25-2009

Thank you.

I did use /Qsave, but not /Qsafe-cray-ptr (although, in practice, the Cray pointers are safe). I removed /Qsave, and put the SAVE attribute on the additional variables that expect persistence - Fortran 77-era integrators.

It didn't change the need to include full debugging information, and restrict to 'Maximize speed'.

I'll live with using a debug build, then.

bendel_boy1 · ‎02-25-2009

Compiling with /O1 the following code results in the compiler just hanging. I can't see what is here that would cause such a problem. (Insert code in this forum editor doesn't support Fortran!)

[plain]subroutine Equations(t, ny, y, dy, tStart, tFinish,                     &
                        NumberOfModels, NumberOfProcesses, ModelList,   &
                        ProcessList, NumberofStreams, Streams,          &
                        nx, x, nResults, Results, nOps, Operation,      &
                        nModelData, ModelData, ProcessOrder)
  USE CoreTypes
  USE ProcessID
  implicit none
  
  double precision     :: tStart, tFinish
  integer              :: NumberOfModels
  integer              :: NumberofProcesses
  integer              :: nx, nResults, nOps, nModelData
  integer              :: NumberofStreams
  integer              :: ProcessOrder(NumberofProcesses)
  double precision     :: x(nx), Results(nResults)
  double precision     :: Operation(nOps)
  double precision     :: ModelData(nModelData)
  type (ProcessModel)  :: ModelList(NumberOfModels)
  type (Process)       :: ProcessList(NumberOfProcesses)
  type (Stream)        :: Streams(NumberofStreams)

  integer:: ny
  double precision:: t, y(ny), dy(ny)
  
  dy = 0d0
  call Loop(t, y, ny)
  call Eqn(t, y, dy, ny)
  return
  
  entry LoopsOnly(t, ny, y, tStart, tFinish,                     &
                  NumberOfModels, NumberOfProcesses, ModelList,   &
                  ProcessList, NumberofStreams, Streams,          &
                  nx, x, nResults, Results, nOps, Operation,      &
                  nModelData, ModelData, ProcessOrder)
  call Loop(t, y, ny)
  return                        
CONTAINS

  subroutine Eqn(t, y, dy, ny)
  implicit none
  include 'interface.f90'
  ! -- Non-standard CRAY pointer
  pointer (pe, locDLLSubDY)
  integer:: ny
  double precision:: t, y(ny), dy(ny)

  integer:: i, j, k, L
  integer:: ns, iy, iys, ix, ixs
  integer:: ir, irs, io, ios, im
    do i = 1, NumberOfProcesses
      k   = ProcessOrder(i)
      j   = ProcessList(k)%ModelIndex
      if (j .gt. 0) then
        do L = 1, NumberOfModels
          if (ModelList(L)%ModelID .eq. j) then
            j = L
            exit
          end if
        end do
      end if
      NS  = ProcessList(k)%Stages
      iy  = ProcessList(k)%y
      iys = ProcessList(k)%yStage
      ix  = ProcessList(k)%x
      ixs = ProcessList(k)%xStage
      ir  = ProcessList(k)%Results
      irs = ProcessList(k)%StageResults
      io  = ProcessList(k)%Operation
      ios = ProcessList(k)%StageOperation
      im  = ProcessList(k)%ModelData
      select case (j)
      case (INFLUENT_ID)
      case (CV_ID)
      case (MIX2_ID, MIX3_ID)
      case (SPLIT2_ID, SPLIT3_ID)
      case default
        pe  = ModelList(j)%ModelPointerDiff
        call locDLLSubDY(t, NS, ProcessList(k), Streams,             &
                         y(iy), y(iys), x(ix), x(ixs),               &
                         dy(iy), dy(iys), Results(ir), Results(irs), &
                         Operation(io), Operation(ios), ModelData(im))
      end select
    end do
  end subroutine Eqn

  subroutine Loop(t, y, ny)
  use LoopData
  include 'interface.f90'
  ! -- Non-standard CRAY pointer
  pointer (pl, locDLLSubAssign)

  integer:: ny
  double precision:: t, y(ny)
  logical         :: Converged
  double precision:: OldFlow(NumberOfStreams)
  integer         :: count

  integer:: i, j, k, L, iOut, out, in
  integer:: ns, iy, iys, ix, ixs
  integer:: ir, irs, io, ios, im

!DEC$ ATTRIBUTES ALIAS: '_NumberOfDeterminands':: NumberOfDeterminands
  integer, external:: NumberOfDeterminands
  integer          :: nDet

    nDet = NumberOfDeterminands()
    count = 0
    converged = .false.
    do
      OldFlow = Streams(:)%Flow
      do i = 1, NumberOfProcesses
        k   = ProcessOrder(i)
        j   = ProcessList(k)%ModelIndex
        if (j .gt. 0) then
          do L = 1, NumberOfModels
            if (ModelList(L)%ModelID .eq. j) then
              j = L
              exit
            end if
          end do
        end if
        NS  = ProcessList(k)%Stages
        iy  = ProcessList(k)%y
        iys = ProcessList(k)%yStage
        ix  = ProcessList(k)%x
        ixs = ProcessList(k)%xStage
        ir  = ProcessList(k)%Results
        irs = ProcessList(k)%StageResults
        io  = ProcessList(k)%Operation
        ios = ProcessList(k)%StageOperation
        im  = ProcessList(k)%ModelData
!
! Set all outlet COMPOSITIONS to INLET values ...
!
        if (ProcessList(k)%InStream(1) .gt. 0) then
          do iOut = 1, MAX_OUTLETS
            out = ProcessList(k)%OutStream(iOut)
            if (out .gt. 0) then
              in = ProcessList(k)%InStream(1)
              Streams(out)%t             = Streams(in)%t
              Streams(out)%pH            = Streams(in)%pH
              Streams(out)%Value(1:nDet) = Streams(in)%Value(1:nDet)
            end if
          end do
        end if
        select case (j)
        case (INFLUENT_ID)
          call Influent(t, ProcessList(k), Streams, ModelData(im), Operation(io))
        case (CV_ID)
          call CV(t, ProcessList(k), Streams, ModelData(im), Operation(io))
        case (MIX2_ID)
          call Mix(ProcessList(k), Streams, 2)
        case (MIX3_ID)
          call Mix(ProcessList(k), Streams, 3)
        case (SPLIT2_ID)
          call Split(ProcessList(k), Streams, Operation(io), 2)
        case (SPLIT3_ID)
          call Split(ProcessList(k), Streams, Operation(io), 3)
        case default
          pl  = ModelList(j)%ModelPointerAlloc
          call locDLLSubAssign(t, NS, ProcessList(k), Streams,      &
                               y(iy), y(iys), x(ix), x(ixs),        &
                               Results(ir), Results(irs),           &
                               Operation(io), Operation(ios), ModelData(im))
        end select
        do iOut = 1, MAX_OUTLETS
          out = ProcessList(k)%OutStream(iOut)
          if (out .gt. 0 .and. j .ne. INFLUENT_ID) then
            if (Streams(out)%Flow .le. 0d0) then
              Streams(out)%t     = 0d0
              Streams(out)%pH    = 0d0
              Streams(out)%Value(1:nDet) = 0d0
            end if
          end if
        end do
      end do

      converged =  all(abs(OldFlow - Streams(:)%Flow) .le. fLoopTol * abs(Streams(:)%Flow))
      count = count + 1
      if (converged) exit
      if (.not. bLoop)         exit
      if (count .gt. iMaxLoop) exit
    end do
  end subroutine Loop
end subroutine Equations
[/plain]

TimP · ‎02-25-2009

I can see how this might be a compiler buster. The idea of requiring SAVE in combination with ENTRY has always been doubtful, and now you combine it with CONTAINS in a way which is likely not to have been tested during compiler QA. As you didn't supply your include file, we can't try it from your post. You could submit a problem report on premier.intel.com.

bendel_boy1 · ‎02-26-2009

Thanks.

Making the CONTAINed routines separate (allowing me to get rid of the ENTRY) results in that routine compiling with optimisations and NOT hanging.
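The restructuring described above might look roughly like the sketch below. It is only an illustration under stated assumptions: the argument lists are cut down to `t`, `ny`, `y` and `dy`, the bodies of `Loop` and `Eqn` are trivial stand-ins, and placing everything in a module (here the hypothetical `equations_demo`) is one possible layout, not necessarily the one the poster chose. The point is that `LoopsOnly` becomes an ordinary subroutine rather than an ENTRY, and `Loop` and `Eqn` are no longer internal procedures of `Equations`.

```fortran
module equations_demo
  implicit none
contains

  ! Formerly the main body of Equations.
  subroutine Equations(t, ny, y, dy)
    integer          :: ny
    double precision :: t, y(ny), dy(ny)
    dy = 0d0
    call Loop(t, y, ny)
    call Eqn(t, y, dy, ny)
  end subroutine Equations

  ! Formerly "entry LoopsOnly(...)"; now a subroutine in its own right.
  subroutine LoopsOnly(t, ny, y)
    integer          :: ny
    double precision :: t, y(ny)
    call Loop(t, y, ny)
  end subroutine LoopsOnly

  ! Stand-in body (assumption): the real routine iterates the stream flows.
  subroutine Loop(t, y, ny)
    integer          :: ny
    double precision :: t, y(ny)
    y = max(y, 0d0)
  end subroutine Loop

  ! Stand-in body (assumption): the real routine calls the user DLLs.
  subroutine Eqn(t, y, dy, ny)
    integer          :: ny
    double precision :: t, y(ny), dy(ny)
    dy = -y
  end subroutine Eqn

end module equations_demo

program demo
  use equations_demo
  implicit none
  double precision :: y(3), dy(3)

  y = (/ 1d0, -2d0, 3d0 /)
  call Equations(0d0, 3, y, dy)
  print *, 'dy =', dy
  call LoopsOnly(0d0, 3, y)
  print *, 'y  =', y
end program demo
```

Because the helper routines no longer see the host's variables, the data they need has to be passed explicitly or shared through a module.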

The DLL no longer crashes with access violation if I use the /O1 setting, which is a big improvement!

And I get a 33-50% improvement in the runtime for my test case (33% using a Runge-Kutta integrator; 50% using an implicit Runge-Kutta integrator). It still falls over with /O2, despite getting rid of /Qsave - but this has been a huge improvement.

I'm curious as to what properties of a code may typically prevent /O2 working.

Steven_L_Intel1 · ‎02-26-2009

Quoting - dudley@wrcplc.co.uk

I'm curious as to what properties of a code may typically prevent /O2 working.

Typically, the code is not legal Fortran and violates assumptions the optimizer makes. For example, you pass a COMMON variable as an actual argument and then access it both through the dummy argument and through the COMMON block, you access an array outside its declared bounds, or you have type mismatches in arguments.

It could also be a compiler bug.
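As an illustration of the first case in the reply above, here is a minimal sketch of the COMMON aliasing pattern next to a conforming alternative. The names `alias_demo`, `bump_aliased`, `bump_clean` and the COMMON block `/acc/` are hypothetical and do not come from the poster's code. In `bump_aliased` the same storage is modified both through the dummy argument and through COMMON, so the optimizer is entitled to keep `x` in a register across the COMMON update and the result can differ between optimisation levels.

```fortran
program alias_demo
  implicit none
  double precision :: total
  common /acc/ total

  total = 1d0
  ! Non-conforming: total is passed as an actual argument while the
  ! callee also modifies it through COMMON /acc/.
  call bump_aliased(total)
  print *, 'aliased call    :', total   ! may vary with optimisation level

  total = 1d0
  ! Conforming: the callee touches the variable through COMMON only.
  call bump_clean()
  print *, 'conforming call :', total   ! well defined: (1 + 1) * 2 + 1 = 5
end program alias_demo

subroutine bump_aliased(x)
  implicit none
  double precision :: x
  double precision :: total
  common /acc/ total
  x     = x + 1d0        ! update through the dummy argument
  total = total * 2d0    ! update of the same storage through COMMON
  x     = x + 1d0        ! the compiler may assume x was not changed above
end subroutine bump_aliased

subroutine bump_clean()
  implicit none
  double precision :: total
  common /acc/ total
  total = total + 1d0
  total = total * 2d0
  total = total + 1d0
end subroutine bump_clean
```

Building with run-time checks such as /check, as suggested earlier in the thread, can help with the out-of-bounds case, but this kind of aliasing usually has to be found by reading the code.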