- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have an issue understanding automatic arrays, especially where in between stack or heap it is stored.
I have the following main program.
program myprog use mymod implicit none integer, parameter :: N = 477, M = 4 complex*16, parameter :: imnum = dcmplx(1.0d0,1.0d0) complex*16 :: A(N,N,M), B(N,N,M-1), C(N,N,2:M), X(M*N), Y(M*N) integer :: i, j, k do k = 1,M do i = 1,N do j = 1,N A(i,j,k) = i*j/dble(N) if (k < M) then B(i,j,k) = 1.0d0 + imnum/(i+j) C(i,j,k+1) = 1.0d0 - imnum/(i+j) end if end do end do end do do i = 1,M*N X(i) = dble(i) end do call mysub('G', N, M, A, B, C, X, Y) end program
Subroutine mysub is located inside module mymod.f90.
! ============================================================= subroutine mysub(ulh, N, M, diag, udiag, ldiag, vinp, vout) implicit none integer :: i, il, ic, ir character, intent(in) :: ulh*1 integer, intent(in) :: N, M complex*16, intent(in) :: diag(:,:,:), udiag(:,:,:), vinp(:) complex*16, intent(in), optional :: ldiag(:,:,2:) complex*16, intent(out) :: vout(M*N) complex*16, allocatable :: ldiag1(:,:,:) complex*16 :: v_l(size(diag,1)), v_c(size(diag,1)), v_r(size(diag,1)) allocate(ldiag1(N,N,2:M)) if (ulh == "G") then if (present(ldiag)) then ldiag1 = ldiag else stop "In subroutine tribl_matmul, argument ldiag is required when argument ulh = 'G'." end if end if !allocate(v_l(N), v_c(N), v_r(N)) !call omp_set_num_threads(2) !!$OMP PARALLEL DO PRIVATE(ic, il, ir, v_l, v_r, v_c) do i = 1,M ic = (i-1)*N+1 il = (i-1-1)*N+1 ir = (i+1-1)*N+1 if (i == 1) then v_l = dcmplx(0.0d0,0.0d0) call zgemv('n', N, N, 1.0d0, udiag(:,:,i), N, vinp(ir:ir+N-1), 1, 0.0d0, v_r, 1) else if (i == M) then call zgemv('n', N, N, 1.0d0, ldiag1(:,:,i), N, vinp(il:il+N-1), 1, 0.0d0, v_l, 1) v_r = dcmplx(0.0d0,0.0d0) else call zgemv('n', N, N, 1.0d0, ldiag1(:,:,i), N, vinp(il:il+N-1), 1, 0.0d0, v_l, 1) call zgemv('n', N, N, 1.0d0, udiag(:,:,i), N, vinp(ir:ir+N-1), 1, 0.0d0, v_r, 1) end if call zgemv('n', N, N, 1.0d0, diag(:,:,i), N, vinp(ic:ic+N-1), 1, 0.0d0, v_c, 1) write(*,"(a25,i3,3(f20.3,2x))") "problem, v_l, v_c, v_r", i, abs(sum(v_l)), abs(sum(v_c)), abs(sum(v_r)) vout(ic:ic+N-1) = v_l + v_c + v_r end do !!$OMP END PARALLEL DO !deallocate(v_l, v_c, v_r) deallocate(ldiag1) end subroutine
I compiled and ran it using these commands
#!/bin/sh ulimit -s unlimited longflag="-L/opt/intel/composerxe-2011/mkl/10.2.3.029/lib/em64t -lmkl_solver_ilp64_sequential -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm" ifort -openmp -check arg_temp_created -i8 -c ./mymod.f90 ${longflag} ifort -openmp -check arg_temp_created -i8 ./myprog.f90 mymod.o ${longflag} ./a.out
The calculation was erroneous as it prints out
problem, v_l, v_c, v_r 1 0.000 8673538245.000 NaN problem, v_l, v_c, v_r 2 54265785.750 23341568032.702 NaN problem, v_l, v_c, v_r 3 171280368.928 33841597383.353 NaN problem, v_l, v_c, v_r 4 270608107.622 36760173582.811 0.000
Then if I add -heap-arrays flag in the compilation of mymod.f90, the problem is remedied and those NaN values become just valid numbers. This means without heap-arrays all automatic arrays (no temporary arrays were created) are most likely located in stack. But I have removed the stack limit so I expected the execution should be fine without heap-arrays flag. So, where are automatic arrays located actually?
As for which arrays are automatic inside mysub, they must be v_l, v_c, and v_r. This is because when I declared them as allocatable and remove the comment marks in the allocatable and deallocatable statements of these arrays, without heap-arrays flag, the calculation didn't yield NaN. The reason I prefer these three arrays to be automatic rather than allocatable is that when I went with the latter and run the program with those openmp directives activated, the program did not print anything, not even error or warning flags, which I interpreted as an error anyway.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It turns out that there is a rather elusive bug in your calls to ZGEMV, and it took me a while to realize its presence. Among the arguments are the scalar multipliers alpha and beta, which must be complex*16. You are only passing 1.0d0 and 0.0d0, which are of the incorrect type. The BLAS routine then picks up a junk value from the next eight bytes for the absent imaginary part.
Fix this by creating scalar multipliers, say, Z_ONE = cmplx(1d0,0d0,kind=kind(0d0)) and Z_Zero = cmplx(0d0,0d0,kind=kind(0d0)), and use these in the argument list.
The compiler would have caught this error if you had placed include 'mkl_blas.fi' after the implicit none line in MYSUB.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I think that the bug has more to do with code generation/optimization than stack, heap, etc. Here is a greatly simplified reproducer:
module mymod contains subroutine mysub(N, A, vinp, vout) implicit none integer, parameter :: dp = kind(0.0) integer, intent(in) :: N complex, intent(in) :: A(N,N), vinp(N) complex, intent(out) :: vout(N) complex :: v_c(N) call cgemv('n', N, N, 1.0, A, N, vinp, 1, 0.0, v_c, 1) write(*,10) v_c 10 format(2(3x,'(',ES10.3,',',ES10.3,')',:)) vout(1:N) = v_c end subroutine end module program myprog use mymod implicit none integer, parameter :: N = 2 complex :: A(N,N), X(N), Y(N) integer :: i, j do i = 1,N do j = 1,N A(i,j) = i*j/real(N) end do end do do i = 1,N X(i) = i*4.0 end do call mysub(N, A, X, Y) end program
When I compile and run the program with the 18.0.2 Windows compiler with /O2 /Qmkl, I find that each run produces a different value for the imaginary part of v_c(2).
S:\LANG>efns ( 1.000E+01, 2.609E-38) ( 2.000E+01, 7.977E+17) S:\LANG>efns ( 1.000E+01, 2.609E-38) ( 2.000E+01, 1.030E+15) S:\LANG>efns ( 1.000E+01, 2.609E-38) ( 2.000E+01,-1.626E-18) S:\LANG>efns ( 1.000E+01, 2.609E-38) ( 2.000E+01, 7.715E-07)
Please try this code with your version of the compiler. I find that with the 17.0.4 version and the same compiler options the bug does not occur.
P.S. The real bug, discovered a day later, is the use of real constants 1.0 and 0.0 instead of complex constants with those values, on Line 11.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, thanks for replying.
My compiler produces
( 1.000E+01, 0.000E+00) ( 2.000E+01, 0.000E+00) ( 1.000E+01, 0.000E+00) ( 2.000E+01, 0.000E+00) ( 1.000E+01, 0.000E+00) ( 2.000E+01, 0.000E+00) ( 1.000E+01, 0.000E+00) ( 2.000E+01, 0.000E+00)
It was compiled with ifort -O2 -mkl test_autoarr.f90. The result of ifort -V is
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0.2.137 Build 20110112 Copyright (C) 1985-2011 Intel Corporation. All rights reserved.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
>>Then if I add -heap-arrays flag in the compilation of mymod.f90, the problem is remedied and those NaN values become just valid numbers.
Heap arrays have an implicit alignment of that of the allocation granularity, stack/automatic arrays have implicit natural boundary alignment of the cell type (real*8 in this case). Depending on the O/S the heap granularity may be 16 bytes. This may have prevented a complex number from spanning a cache line.
Try using aligned automatic arrays:
!dir$ attributes align: 16:: your, complex, arrays, here, ...
or better yet, use cache line aligned arrays:
!dir$ attributes align: 64:: your, complex, arrays, here, ...
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It turns out that there is a rather elusive bug in your calls to ZGEMV, and it took me a while to realize its presence. Among the arguments are the scalar multipliers alpha and beta, which must be complex*16. You are only passing 1.0d0 and 0.0d0, which are of the incorrect type. The BLAS routine then picks up a junk value from the next eight bytes for the absent imaginary part.
Fix this by creating scalar multipliers, say, Z_ONE = cmplx(1d0,0d0,kind=kind(0d0)) and Z_Zero = cmplx(0d0,0d0,kind=kind(0d0)), and use these in the argument list.
The compiler would have caught this error if you had placed include 'mkl_blas.fi' after the implicit none line in MYSUB.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi mecej4,
yes I know that alpha and beta are complex*16 in the dummy arguments of zgemv but I passed real*8 anyway because I presumed the variable passing process behaves the same way as the usual variable assignments. For example if I declare variable X as complex*16 and then later it is assigned with value using e.g. X = 1.0d0 then the compiler by defaults will understand it as being the same as X = cmplx(1.0d0, 0.0d0). But I will try following your suggestion.
Jim,
Array alignment is a new concept to me, I have heard about this several times but never encounter a situation where I have to use it. I will try to get some information about aligning an array but I will also appreciate if you wouldn't mind telling me briefly what array alignment means.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Mecej4,
I tried your suggestion and it worked! Still I am wondering though, if passing real*8 for scalars alpha and beta would prompt the compiler to pick up undefined value for the imaginary parts, then compiling with heap-arrays flag should not have solved the problem as it did in my first post because this flag has nothing to do with scalars right?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
efnacy wrote:
Yes, I know that alpha and beta are complex*16 in the dummy arguments of zgemv but I passed real*8 anyway because I presumed the variable passing process behaves the same way as the usual variable assignments.
Sorry, this assumption is quite wrong. In an assignment statement V = E, both the variable V and the expression E have types that the compiler can infer from that line and any other type declarations in the subprogram containing that statement. When you invoke a subroutine or function using an implicit interface (in this case, the implicit interface was wrong), you are essentially telling the compiler, "I'm passing the correct arguments, just obey me". In effect, the compiler sees only E in your code, since V is in the source code of MKL or another BLAS library, which was compiled separately. Therefore, it has no way of detecting the mismatch in order to do the type conversion.
Fiddling with compiler options, etc., will only add to the confusion (I confess to such confusion myself!). What you see the code do will depend on the junk contents of the undefined imaginary parts. Those values may change from run to run and may cause the program to (i) crash, (ii) run and give incorrect results, or even (iii) give correct results and keep you unaware of the bug!
Please try adding the include 'mkl_blas.fi' statement as I suggested earlier.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes you are right, upon including that include line in mysub code and passing real*8 values, the compilation produced errors saying that the type do not match. So, if in a code I have lapack routines then to be safe it is better to include include 'mkl_lapack.fi' in the calling unit?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Yes, you can include mkl_lapack.fi, but there is a better way. You may have noticed that inclusion of that file makes the compilation take significantly longer, because the compiler has to read and digest the thousands of interface declarations in the file. To avoid this delay, you can do the following:
Create a new source file, say, mkli.f, containing
! module mkl_lapack include 'mkl_lapack.fi' end module
Compile this file once. This will produce the file mkl_lapack.mod for the chosen compiler configuration (IA32, LP64 or ILP64). Place the module file along the include path of the compiler or in the current working directory.
In any of your Fortran source files that call Lapack routines, add use mkl_lapack after the subprogram heading and before any declarations.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page