Hi,
My question was marked as spam before (hopefully incorrectly, so I am trying again).
I am rather new to Fortran and have been stumped by a segfault. The code compiles and runs with gfortran, even with optimisations. It also works with ifort without optimisations, or without OpenMP. However, the multithreaded ifort build with optimisations fails.
I have already tried increasing OMP_STACKSIZE, compiled with the -warn all and -check all flags,
paid special attention to interfaces and to -check arg_temp_created, and fiddled around with the -heap-arrays flag, to no avail.
Finally, I have stripped many modules and submodules down to a few lines in a single source file, which serves as a minimum working example (MWE). I have attached the file. I am losing my mind over this and would really appreciate any help.
I apologise if this is not the correct forum.
compiler version
*************************
ifort --version returns: ifort (IFORT) 2021.10.0 20230609
MWE description (main.F90)
********************
Basically, the MWE has an array from the main program passed to an intermediate subroutine, subrt_interm, which passes the array along to a function that returns a complex number.
Fiddling around with some harmless loops and the solitary write(*,*) statement sometimes makes the segfault disappear, even though the valgrind warnings of 'invalid read' and 'invalid access' remain. Changing the assumed-shape arrays to ones with explicit dimensions seems to solve the problem.
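For readers without the attachment, the call chain described above might look roughly like the sketch below. This is a hypothetical reconstruction, not the attached file: only the names subrt_interm, calc_qty and grc_pass1 appear in this thread, while the array sizes, the complex element type, the kind parameter dp and the function body are assumptions made for illustration.

```fortran
! Hypothetical reconstruction of the call chain described above.
! Only subrt_interm, calc_qty and grc_pass1 are names from the thread;
! sizes, element type, kind and the function body are assumptions.
program main
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(dp) :: grc(2,2,2)
  complex(dp) :: res

  grc = (0.0_dp, 0.0_dp)
  call subrt_interm(grc, res)
  write(*,*) res

contains

  ! Intermediate routine: only forwards the array.
  subroutine subrt_interm(grc_pass1, res)
    complex(dp), intent(in)  :: grc_pass1(:,:,:)   ! assumed shape
    complex(dp), intent(out) :: res
    res = calc_qty(grc_pass1)
  end subroutine subrt_interm

  ! Function returning a complex number computed from the array.
  function calc_qty(grc_pass1) result(q)
    complex(dp), intent(in) :: grc_pass1(:,:,:)    ! assumed shape
    complex(dp) :: q
    q = sum(grc_pass1)
  end function calc_qty

end program main
```

Both dummy arguments are assumed shape, so each call passes an array descriptor rather than a bare address.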
compile flags
*******************************
ifort -o ALF.out -std08 -cpp -O3 -fp-model fast=2 -xHost -unroll -finline-functions -ipo -ip -heap-arrays 1024 -no-wrap-margin -g -parallel -qopenmp main.F90
output
*********************
On running the code, this is what I get:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libc.so.6 00007FF32561A520 Unknown Unknown Unknown
ALF.out 0000000000404CF7 Unknown Unknown Unknown
ALF.out 00000000004046BD Unknown Unknown Unknown
ALF.out 000000000040428D Unknown Unknown Unknown
libc.so.6 00007FF325601D90 Unknown Unknown Unknown
libc.so.6 00007FF325601E40 __libc_start_main Unknown Unknown
ALF.out 00000000004041A5 Unknown Unknown Unknown
valgrind output
**********************
On running it through valgrind, I get this:
==527527== Use of uninitialised value of size 8
==527527== at 0x404CB9: main_IP_subrt_interm_ (main.F90:76)
==527527== by 0x4046BC: MAIN__ (main.F90:27)
==527527== by 0x40428C: main (in /home/sounak/mwe/Kondo_Impurities/Prog/ALF.out)
==527527==
==527527== Use of uninitialised value of size 8
==527527== at 0x404CF7: main_IP_subrt_interm_ (main.F90:76)
==527527== by 0x4046BC: MAIN__ (main.F90:27)
==527527== by 0x40428C: main (in /home/sounak/mwe/Kondo_Impurities/Prog/ALF.out)
==527527==
==527527== Invalid read of size 16
==527527== at 0x404CF7: main_IP_subrt_interm_ (main.F90:76)
==527527== by 0x4046BC: MAIN__ (main.F90:27)
==527527== by 0x40428C: main (in /home/sounak/mwe/Kondo_Impurities/Prog/ALF.out)
==527527== Address 0x800f44b360 is not stack'd, malloc'd or (recently) free'd
What output do you expect from your reproducer?
1) What happens if, in calc_qty, you change all references to grc_pass1 to grc_passX?
IOW, explicitly avoid the dummy name collision between the function and the caller.
I am thinking there may be a namespace bug in the compiler.
2) What happens if you move the contained procedures from PROGRAM to contained procedures of a module?
IOW, perhaps there is an issue with one contained procedure (in PROGRAM) calling a different contained procedure (in PROGRAM) whose interface was not known, i.e. the difference between "supposed to work that way" and "what the compiler produced".
Moving the code to a module will certainly make the calc_qty interface visible to subrt_interm.
These tests might expose what is (not) happening.
Jim Dempsey
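A minimal sketch of the two tests combined, assuming the same shapes and types as the reproducer. The module name qty_mod, the kind parameter dp, the array sizes and the function body are hypothetical; only subrt_interm, calc_qty, grc_pass1 and grc_passX come from this thread.

```fortran
! Sketch: procedures moved from PROGRAM into a module, with the dummy
! argument of calc_qty renamed to avoid the name collision.
! qty_mod, dp, the sizes and the function body are hypothetical.
module qty_mod
  implicit none
  private
  public :: dp, subrt_interm
  integer, parameter :: dp = kind(1.0d0)
contains

  subroutine subrt_interm(grc_pass1, res)
    complex(dp), intent(in)  :: grc_pass1(:,:,:)
    complex(dp), intent(out) :: res
    ! calc_qty is a module procedure, so its interface is explicit here.
    res = calc_qty(grc_pass1)
  end subroutine subrt_interm

  function calc_qty(grc_passX) result(q)
    complex(dp), intent(in) :: grc_passX(:,:,:)
    complex(dp) :: q
    q = sum(grc_passX)
  end function calc_qty

end module qty_mod

program main
  use qty_mod, only: dp, subrt_interm
  implicit none
  complex(dp) :: grc(2,2,2), res

  grc = (0.0_dp, 0.0_dp)
  call subrt_interm(grc, res)
  write(*,*) res
end program main
```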
@jimdempseyatthecove has excellent ideas!
I simplified the compiler options. Why are you compiling with -qopenmp? I don't see any OpenMP directives.
+ ifort sounak_mwe_main.F90
+ a.out
(0.000000000000000E+000,0.000000000000000E+000)
If you must compile with -qopenmp, use ifx.
+ ifx -qopenmp sounak_mwe_main.F90
+ a.out
(0.000000000000000E+000,0.000000000000000E+000)
ifx uses a different code generator than ifort.
@Barbara_P_Intel the posted code is a simple reproducer that fails with the -qopenmp option. It does not represent the complete program (which presumably uses OpenMP), but the symptoms appear when it is compiled with this option.
Jim Dempsey
Hello @Barbara_P_Intel and @jimdempseyatthecove .
Thank you for taking a look at this.
@Barbara_P_Intel : To answer your first question, yes, I do expect the reproducer to output zero. The reason I had all these compiler options, including -qopenmp, is that I started from a large-ish project which has many source files and uses libraries like MKL. I slowly reduced bits of the whole thing into this reproducer, and the larger project does indeed need OpenMP. With -qopenmp, it always makes invalid accesses (as visible in valgrind/gdb). A small change like removing the write(*,*) statement makes the executable run, but there are still memory errors which show up in valgrind.
@jimdempseyatthecove Hi Jim. I tried your suggestions (file attached, with the calc_qty function in a module and the namespace collision avoided, in case you wish to take a look). It doesn't seem to help. Indeed, my original code (which I reduced to this reproducer) does have its functions in modules.
Using 2023.1 compilers on Windows
(and inserting PAUSE at end of PROGRAM to hold screen output from MS VS)
Program works.
@SounakBiswas >>With the -qopenmp, it is always making invalid accesses (as visible in valgrind/gdb)
Does valgrind show invalid accesses when compiling a "Hello World" sample program using the same compiler options?
IOW is the valgrind report related to the initialization of the different runtime system environments (as opposed to the compiler generated code of your program)?
Jim Dempsey
Hi
@jimdempseyatthecove curious. Just checking: does only this new version (with the namespace collision avoided and a module used) work for you, or does the old version work as well?
valgrind doesn't throw up anything for hello worlds.
One piece of information which might help is the fact that using explicit dimensions instead of assumed shapes for the input arrays in calc_qty makes all warnings (valgrind) and runtime execution issues disappear across four different systems I have access to.
Thanks
Sounak
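A minimal sketch of what the explicit-dimensions variant could look like, under the assumption that the extents are passed as extra integer arguments. The module name qty_explicit_mod, the argument names n1, n2, n3, the array sizes and the function body are hypothetical; only calc_qty is a name from this thread.

```fortran
! Sketch of the explicit-shape workaround: the extents are passed as
! arguments instead of being read from an assumed-shape descriptor.
! qty_explicit_mod, n1, n2, n3 and the function body are hypothetical.
module qty_explicit_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  function calc_qty(grc, n1, n2, n3) result(q)
    integer, intent(in)     :: n1, n2, n3
    complex(dp), intent(in) :: grc(n1, n2, n3)   ! explicit shape
    complex(dp) :: q
    q = sum(grc)
  end function calc_qty

end module qty_explicit_mod

program demo
  use qty_explicit_mod
  implicit none
  complex(dp) :: grc(2,3,4)

  grc = (0.0_dp, 0.0_dp)
  write(*,*) calc_qty(grc, size(grc,1), size(grc,2), size(grc,3))
end program demo
```

One trade-off to keep in mind: an explicit-shape dummy expects contiguous storage, so passing a non-contiguous array section may make the compiler create a temporary copy.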
Hi @Barbara_P_Intel and @jimdempseyatthecove
I think my issue has been posted multiple times in this forum, perhaps because it was marked as spam a few times and then later allowed in.
In another iteration (of the same post), @Igor_V_Intel had looked at it. He could reproduce it, and pin it down to a 'hoist dope vector field' optimisation which is activated for O2 and upwards.
Here is the link to the other post
https://community.intel.com/t5/Intel-Fortran-Compiler/ifort-error-for-assumed-shape-arrays-possible-compiler-error/m-p/1513062
I am sorry for this confusion
Sounak
>>explicit dimensions...
Hmm, in using explicit dimensions the compiler synthesizes the array descriptors on the stack using the sizes provided. In your case it would be the integer sizes of the ranks.
In using assumed shape, (:,:,:), the array descriptor is probably generated in a different manner, to provide for differing bounds, and optionally stride other than one.
IOW in the explicit case, the sizes are explicitly passed as arguments .OR. are literals .OR. parameters. In the assumed shape case, the sizes are implicitly passed as hidden arguments (as opposed to passing a reference to the caller's array descriptor be it temporary or not).
Furthermore, as subrt_interm is a contained procedure, and optimization is enabled, the compiler likely inlined the procedure, and in the process goofed something up.
While this is something Intel can investigate, you might have sufficient assumptions to construct a work around.
For example: place subrt_interm using (:,:,:) into a separate source file (provide an interface) .AND. compile without IPO (both caller and callee). Compile with /O2
Jim Dempsey
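A rough sketch of that workaround, assuming subrt_interm becomes an external procedure and the caller declares its interface by hand. The file name subrt_interm.f90, the kind parameter dp, the array sizes and the routine body are hypothetical. The two parts are shown in one listing for brevity; the idea is to compile them as separate files at -O2 (the Linux spelling of /O2) without -ipo.

```fortran
! ---- file: subrt_interm.f90 (hypothetical file name) ----
! External procedure with an assumed-shape dummy argument.
subroutine subrt_interm(grc_pass1, res)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  complex(dp), intent(in)  :: grc_pass1(:,:,:)
  complex(dp), intent(out) :: res
  res = sum(grc_pass1)   ! placeholder body, an assumption
end subroutine subrt_interm

! ---- file: main.F90 ----
program main
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! An explicit interface is required because the dummy is assumed shape.
  interface
    subroutine subrt_interm(grc_pass1, res)
      import :: dp
      complex(dp), intent(in)  :: grc_pass1(:,:,:)
      complex(dp), intent(out) :: res
    end subroutine subrt_interm
  end interface

  complex(dp) :: grc(2,2,2), res

  grc = (0.0_dp, 0.0_dp)
  call subrt_interm(grc, res)
  write(*,*) res
end program main
```

Keeping the two files as separate compilation units without IPO should stop the compiler from inlining subrt_interm into the caller, which is the point of the test.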
Thanks for pointing out Igor's post. He recommends using ifx and so do I.
Please try ifx to compile your application.