Germán
New Contributor I
614 Views

stack overflow and can't even tell where the problem may be


I am porting a Fortran program from Linux (CentOS 7) to Windows 10.

On Linux with Intel 2019, it compiles and runs just fine.

On Windows with Intel oneAPI 2021.3, it compiles, but immediately reports stack overflow as it attempts to run.

Can't even tell where the problem is.

 

> program.exe input.txt
forrtl: severe (170): Program Exception - stack overflow
Image              PC                Routine   Line      Source
program.exe        00007FF62757ACD8  Unknown   Unknown   Unknown
program.exe        00007FF6272953F3  MAIN__    17        program.F
program.exe        00007FF62757A4BE  Unknown   Unknown   Unknown
program.exe        00007FF62757AEE4  Unknown   Unknown   Unknown
KERNEL32.DLL       00007FFC87CB7034  Unknown   Unknown   Unknown
ntdll.dll          00007FFC88102651  Unknown   Unknown   Unknown

 

Compilation options:
-fpp /convert:big_endian /check:bounds /warn:noalignments -traceback /fpe:0 /Qinit:zero

Added /heap-arrays:0 /check:stack, nothing changed.

Added /debug:all, nothing changed, same limited hint above.

Added /F50000000, nothing changed.

Any other pointers?

 

14 Replies
mecej4
Black Belt
573 Views

The traceback shows line numbers only for one routine.

Is your program spread over several source files, and were they all compiled with the debug options that you listed? Were any libraries without traceback information linked to produce the EXE?

It would help to view at least the first 17 lines of program.F.

 

l4t3nc1
Beginner
386 Views

Are you using the same Fortran version on both?

The openAPI is up and running.

Have you tried running Fiddler to check whether you can test the openAPI manually?

JohnNichols
Valued Contributor II
542 Views

Search for "stack overflow" in this forum; you are unlikely to be the first to have this problem, and I seem to remember there are a lot of answers.

 

Germán
New Contributor I
525 Views

mecej4:

The program consists of over 170 individual source files. A couple of dozen are first built into a handful of libraries; the rest are compiled directly into what becomes the "main" executable. They are all built with the same compiler options.

Line 17 is the very very very first line, the one that says "program <program>" in file program.F; the previous 16 lines are just comments and descriptions, etc.

That's just it...I can't get traceback to tell me more.

 

l4t3nc1:

As mentioned above, fortran versions are not the same; on Linux:2019, on Windows oneAPI 2021.3

I don't know what fiddler is...some kind of debugger, I presume.

 

JohnNichols:

I did look, and that is where I learned about additional compiler options like heap-arrays.

 


...I guess I should probably learn how to use a debugger

 

mecej4
Black Belt
511 Views

OneAPI provides two different Fortran compilers: the "classic" Ifort and the new Ifx. I hope that you are using only Ifort for all your compilations, since Ifx is not quite suited for general production use yet.

I cannot think of any easy ways of diagnosing the problem. Here are a couple of suggestions to try.

Use /Od /traceback as the only compiler options and rebuild the libraries and the EXE. Does running the EXE still produce stack overflow?

If yes, place a STOP statement as the first executable statement in the main program. Try again. If the stack overflow is still present, examine the program listing and the linker map to find out which local variables cause the problem. You can also remove the rest of the executable statements in the main program and all the subroutines that are no longer needed (any CALL that comes after the STOP, in logical order, is not needed). After a few iterations of this, you may have a "bug reproducer" that you are able to provide for Intel to examine and act upon.
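
The STOP-first experiment described above can be sketched as follows. This is only a minimal illustration, not the original program: the names reproducer, suspect_routine and scratch are hypothetical, and the 240000-character length is just a stand-in for a suspiciously large local variable.

```fortran
! Hypothetical reduced main program; all names and sizes are illustrative.
program reproducer
  implicit none

  ! First executable statement: if the stack overflow still appears with
  ! this in place, it happens before any user statement has run.
  stop 'reached the first executable statement'

  ! Everything below is unreachable while the STOP is in place and can be
  ! trimmed away step by step to shrink the reproducer.
  call suspect_routine()

contains

  subroutine suspect_routine()
    character(len=240000) :: scratch   ! stand-in for a very large local
    scratch = 'placeholder'
    print *, len_trim(scratch)
  end subroutine suspect_routine

end program reproducer
```

Building it with only /Od /traceback, as suggested, keeps the experiment as free of other influences as possible.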

I wonder if l4t3nc1 wrote "openAPI" in place of "oneAPI".

Steve_Lionel
Black Belt Retired Employee
487 Views

If line 17 is PROGRAM, then this does tell you something useful. Did you start out with that /F or did you add it later? Very large values of stack reserve can trigger other problems. On stack overflow, the first traceback line will be in an error reporting routine - it isn't important.

One thing I would try is to boot Windows into Safe Mode and try running the program - does it still fail?

How many of those compile options can you remove before the stack error goes away? (You can keep -fpp I suppose.)

Germán
New Contributor I
462 Views

I don't know what Fiddler/web traffic has to do with my Fortran program; or, if it is a web-based debugger, sorry, but I am not about to upload company sources. Oh, I just googled: I guess you mean Fiddle? It looks like an online debugger, maybe.

 

Anyway, thanks for the other pointers, I will try and report later.

JohnNichols
Valued Contributor II
431 Views

As these posts seem to show, you have a hard row to hoe. These are never easy. I think the main idea from the real experts, and I am not one, is to strip out everything that you do not need and slowly rebuild the program.

I did that today with an old F90 code from 1991; it had a lot of quirks and would not compile. Pull it down to a few lines and work slowly outwards. It has taken all day, and it was spread across about 8 files, but it is now working. Sometimes you can only add one line or one function at a time.

When they ask for a reproducer, they do not want your million lines of code, they want say 10 lines that show the problem.  

 

 

CRquantum
New Contributor I
374 Views

Use /traceback to locate the problem.

You could also try /heap-arrays to put arrays on the heap; that usually solves stack overflow problems.

CRquantum_0-1632196228292.png
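
As a rough sketch of what the heap-arrays option is meant to affect: as far as I understand the Intel documentation, it applies to automatic arrays and to arrays created for temporary computations. Fixed-size locals and fixed-length character variables are not automatic arrays, so the option may leave them alone. The names below (heap_arrays_demo, smooth, work, fixed) are made up for illustration.

```fortran
! Illustrative only: contrasts an automatic array with a fixed-size local.
program heap_arrays_demo
  implicit none
  call smooth(1000)

contains

  subroutine smooth(n)
    integer, intent(in) :: n
    real :: work(n)       ! automatic array: its size is only known at run time
    real :: fixed(1000)   ! fixed-size local: not an automatic array
    integer :: i

    ! The array constructor on the right-hand side may also need a
    ! compiler-generated temporary.
    work = [(real(i), i = 1, n)]
    fixed = 1.0
    print *, sum(work) + sum(fixed)
  end subroutine smooth

end program heap_arrays_demo
```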

 

jimdempseyatthecove
Black Belt
341 Views

While I note that your program is built as x64 on Windows.....

You should be aware that the static data area (variables with and/or without initialization) CANNOT exceed 2GB.

This is a limitation of the linker object file format. To confuse you even more, while most of the time the linker will report this situation as an error, sometimes you get no warning at all. And this then results in befuddling errors prior to program startup. This appears to be symptomatic of your situation.

The correction for this (in Fortran) is to make the (very) large uninitialized arrays ALLOCATABLE, then allocate them at the start of the program (e.g. in a subroutine possibly named InitArrays). For (very) large initialized arrays (DATA and/or =[...]) you may need to read the values in from a file.

"Very large" here means hundreds of MB.

Jim Dempsey
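
A minimal sketch of the ALLOCATABLE approach described above, assuming a single large array. The module name big_data, the array name field and the sizes are hypothetical; only the subroutine name InitArrays comes from the post.

```fortran
! Sketch: a large static array turned into an ALLOCATABLE module array.
module big_data
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! Previously this might have been a fixed-size declaration such as
  !   real(real64) :: field(20000, 20000)
  real(real64), allocatable :: field(:,:)

contains

  subroutine InitArrays(nx, ny)
    integer, intent(in) :: nx, ny
    integer :: istat

    allocate(field(nx, ny), stat=istat)
    if (istat /= 0) stop 'InitArrays: allocation of field failed'
    field = 0.0_real64
  end subroutine InitArrays

end module big_data

program main
  use big_data
  implicit none

  call InitArrays(1000, 1000)   ! call once, at the start of the program
  print *, 'allocated elements:', size(field)
end program main
```

Code that referenced the array through a COMMON block would have to be changed to use the module instead, since allocatable arrays cannot live in COMMON.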

CRquantum
New Contributor I
321 Views

Thank you very much. 

Please correct me if I am wrong below.

I usually enable LARGEADDRESSAWARE in the linker, and I have run programs with arrays much bigger than 2GB; it seems to work fine, see below.

 

CRquantum_0-1632237268923.png

 

The makefile using Intel Fortran is below. Note that I enabled LARGEADDRESSAWARE in the linker options.

 

EXEC = test.exe
FC = ifort.exe
LINKER = /link

IDIR =
FFLAGS=/nologo /MP /O3 /QxHost /assume:buffered_io /heap-arrays:0 /Qipo /libs:static /threads /Qmkl:cluster
F77FLAGS=$(FFLAGS) -fdefault-real-8 -fdefault-double-8 # gfortran only.
LDFLAGS=/INCREMENTAL:NO /LARGEADDRESSAWARE
LIBS =

.SUFFIXES:
.SUFFIXES: .obj .f .f90

.f90.obj:
	$(FC) $(FFLAGS) /c $<

%.obj: %.mod

OBJECTS=\
EM_mix.obj\
ran.obj\
samplers.obj

EM_mix: $(OBJECTS)
	$(FC) /exe:$(EXEC) $(OBJECTS) $(LIBS) $(LINKER) /out:$(EXEC) $(LDFLAGS)

clean:
	@del /q /f $(EXEC) *.mod *.obj *~ > nul 2> nul
# Note that on Windows rm -f does not work, so use del instead.
# > nul 2> nul just suppresses some redundant messages.


EM_mix.obj: ran.obj samplers.obj EM_mix.f90
	$(FC) $(FFLAGS) /c EM_mix.f90


ran.obj: ran.f90
	$(FC) $(FFLAGS) /c ran.f90

samplers.obj: ran.obj samplers.f90
	$(FC) $(FFLAGS) /c samplers.f90

 

 

Steve_Lionel
Black Belt Retired Employee
292 Views

LargeAddressAware is something that was 32-bit only and, even then, not very useful.

Germán
New Contributor I
277 Views


Well, my program seems to have originated back in 1996, and it is a combination of fixed- and free-form files, common blocks and modules...a bit of a mutt.

Because the stack overflow message was not very telling even when using debug options, I ended up doing as mentioned above: remove everything, then bring back a few lines at a time.

Aside from the more than 150 individual files, thankfully, the main program was only 1200 lines; so, after bringing back a few hundred lines at a time, the stack overflow message went from line 1 in MAIN to line 1 in SOME_SUBROUTINE...where I quickly noticed some strings were set to 240,000 characters long, even though they were meant to store data like user, date, time, hostname, command line and options. Who knows what prompted somebody to use such lengths. Anyway, reducing those numbers to sensible ones solved my problem, and I don't even need to use the "heap-arrays" option.
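
For illustration only, here is one way such strings could be declared without guessing at a huge fixed length. This is not the actual code from the program; the names run_info, cmdline, date and time are hypothetical. It assumes a Fortran 2003 compiler, using a deferred-length character variable sized from what get_command reports.

```fortran
! Sketch: size run-information strings from the data, not from a guess.
program run_info
  implicit none
  character(len=:), allocatable :: cmdline   ! deferred length, allocated on demand
  character(len=8)  :: date                  ! date_and_time returns CCYYMMDD
  character(len=10) :: time                  ! and hhmmss.sss
  integer :: n

  call get_command(length=n)                 ! ask how long the command line is
  allocate(character(len=n) :: cmdline)      ! allocate exactly that much
  call get_command(command=cmdline)
  call date_and_time(date=date, time=time)

  print *, 'command line: ', cmdline
  print *, 'date: ', date, ' time: ', time
end program run_info
```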

 

Thanks, everybody, for all the hints and comments.

jimdempseyatthecove
Black Belt
217 Views

Large Address Aware provides for a maximum of 3GB of (user program) address space on a 32-bit system (the remainder is left for the O/S within the user VM). There was still a linker limit of 2GB for any "segment" (.text, .data, .bss, ...).

Make your large arrays allocatable.

Jim Dempsey
