Intel® Fortran Compiler

oneAPI ifort OpenMP crash on print *,"text"

jimdempseyatthecove
Honored Contributor III

Environment: ifort from oneAPI 2023, Visual Studio 2019.

 

A simple reproducer does not exhibit the problem.

The Solution has 18 Projects and several external libraries. IOW it is not suitable to send in as a reproducer.

 

I am not asking that someone diagnose the problem.

 

I would like to know if someone else has seen this problem.

 

My code works fine in a Debug build (all runtime checks, no optimizations).

It also runs fine with Debug symbols at max speed (no runtime checks). This is in a separate configuration named DebugFast.

 

I derived a new configuration called DebugOpenMP from Debug, then set the Language property for OpenMP parallel code in all the projects with configuration DebugOpenMP. IOW every setting, with the exception of the OpenMP selection, is the same between the Debug and DebugOpenMP configurations.

 

When I build DebugOpenMP (all runtime checks, no optimizations) and run the program in the debugger, the Fortran runtime system crashes at first entry.

To diagnose this problem, I inserted a

    print *, aCharacterVariableHere

This was at the 2nd line of the program after initializing the variable

Note, this is NOT in an OpenMP parallel region. Just at 2nd statement in the program.
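
For reference, a minimal sketch of the kind of check described above: a character variable is initialized and then printed as the second statement of the main program, outside any parallel region. The program name and the string are hypothetical and not taken from the actual Solution. As noted, a standalone program of this shape does not crash on its own.

```fortran
! Hypothetical stand-in for the diagnostic print in the main PROGRAM.
! No parallel region and no USE OMP_LIB appear here.
program print_check
    implicit none
    character(len=32) :: aCharacterVariableHere

    ! 1st statement: initialize the variable
    aCharacterVariableHere = "entered main program"

    ! 2nd statement: list-directed write to the default output unit
    print *, aCharacterVariableHere
end program print_check
```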

 

Tracing into the Fortran runtime system by stepping into the print statement (indentation reflects call level):

call for_write_seq_lis
...
00007FF6212345FE  call        for__open_default (07FF621233790h)

  00007FF621233874  call        qword ptr [__imp_GetEnvironmentVariableA (07FF62159B1A8h)]  
  00007FF621233AE6  call        for__open_proc (07FF6212A4F00h)  
    00007FF6212A4F11  call        __chkstk (07FF621556C70h) 
    00007FF6212A4FFD  call        for__compute_filename (07FF6212AA440h)  crash
      00007FF6212AA451  call        __chkstk (07FF621556C70h) 
        00007FF6212AA4A9  call        for__get_vm (07FF62123CCB0h)  
          00007FF62123CCC0  call        malloc (07FF621596CB3h)
        00007FF6212AA4F9  call        for__get_vm (07FF62123CCB0h)
        00007FF6212AA519  call        for__get_vm (07FF62123CCB0h)
        00007FF6212AA59F  call        qword ptr [__imp_GetEnvironmentVariableA (07FF62159B1A8h)] 
        00007FF6212AA60D  call        qword ptr [__imp_GetStdHandle (07FF62159B190h)] 
        00007FF6212AA797  call        __intel_sse2_strncmp (07FF6213049F0h)
        00007FF6212AA87A  call        __intel_sse2_strcpy (07FF621306900h)
        00007FF6212AA8BF  call        for__free_vm (07FF62123CD90h)  ******* crash **********

It crashes inside for__compute_filename at the first call to for__free_vm.

 

As stated earlier, the main PROGRAM does not have parallel regions, nor does it have a USE OMP_LIB. However, it does USE some of my module files.

 

I tried compiling the main PROGRAM file using IFX with the hope that a different Fortran runtime system would load. This has the same issue.

 

I really need to get OpenMP running with this program, as it is a simulation program that needs the additional threads. This program has been under continuous development since 2005 using OpenMP, and I haven't had an issue with OpenMP in it up until oneAPI version 2023.

Jim Dempsey

 

7 Replies
JohnNichols
Valued Contributor III

You need a reproducer; the error cannot occur just because your program is a monster.

jimdempseyatthecove
Honored Contributor III

John, as I said in my post, I don't expect someone to investigate this. I only ask if they have seen this behavior before.

BTW;

I also checked the build logs as well as the dll runtime load list to verify that the same Intel dlls are loaded (between functioning simple reproducer and the symptomatic program).

The functioning simple reproducer was built in a separate Solution.

When I add the simple reproducer code as a new project to the Solution containing the failing program, it also fails. Ergo, I suspect this has to do with the Solution configuration. I am hoping that someone has seen this behavior before and has an "ah ha" moment about what they found out.

 

What is particularly odd is that the error occurs on a for__free_vm (aka deallocate or delete or free) of an allocation made by the Fortran runtime system function for__compute_filename. IOW that routine made 3 allocations, ran its code, then just prior to return, its cleanup section failed on the first freeing of the allocated memory.

My suspicion is that something corrupted the pointer to the object that was allocated.

 

A potential place to look is __intel_sse2_strcpy, which may be overwriting something it shouldn't.

 

The problem does not lie with the Fortran compiler's generated code. It lies within the Fortran runtime system code.

FWIW, 50 years ago when I worked at Digital Equipment Corporation as a software support engineer, I used to get bug reports as nebulous as this, where all you had was the approximate location of the problem. This was sufficient for me to visually walk through the code to find the problem. The newer generation of software support engineers apparently are either unwilling or unable to do this. They could be too busy, or management issued the edict "Thou shalt not work on an issue without a reproducer". Whatever the case is, I am not asking Intel to look into this, as I suspect this is an issue between the VS solution configuration and the Fortran runtime system. True, the Fortran runtime system shouldn't crap out.

Jim Dempsey

andrew_4619
Honored Contributor III

Quite a puzzle; I have no magic insight, I'm afraid. As a matter of interest, do you get the same behaviour if you run the program outside the VS environment? I guess there is also the very tedious chore of copying all the source files to a new place and building a set of new projects in a new solution. That has worked for me in the past when all else failed, but it never felt like a satisfactory solution even when it worked!

JohnNichols
Valued Contributor III

I understand, but it would be fun to see a simple version, just as a learning exercise. I know I do not know as much as you know, if you know what I mean. We learn from failure, not success.

 

On a t-stat response the other day, I got a probability of 10 to the -40. I was trying to explain to a friend how rare that is: if you take the surface of the universe and coat it in a single layer of proton-sized objects, you are just close to 10 to the -40 if someone has picked out one proton and you ask someone else to guess the number of the selected proton.

jimdempseyatthecove
Honored Contributor III

I've resolved the issue by recreating the Solution.

 

With the exception of the Startup Project (which was created anew), I was able to add all the old projects as existing projects. This saved a lot of time. I did have to add the build dependencies, Include directories and OpenMP selection on the ...OpenMP projects.

 

Not sure what the deal was.

 

The link phase (prior to getting it successfully linked) complained about a conflict between:

     libifcoremt.lib and libifcoremdd.lib

Because I am building the DebugOpenMP version, I chose to ignore libifcoremt.lib.

This got it to link and run through a parallel region with a print *,"..."
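
A minimal sketch of that kind of check, assuming OpenMP is enabled through the Language property mentioned earlier. The program name and message are hypothetical. The critical section is only there to keep output from different threads from interleaving.

```fortran
! Hypothetical smoke test: list-directed output from inside a parallel region.
program omp_print_check
    use omp_lib, only: omp_get_thread_num
    implicit none

    !$omp parallel
    ! Serialize the writes so each thread's line is printed whole.
    !$omp critical
    print *, "hello from thread", omp_get_thread_num()
    !$omp end critical
    !$omp end parallel
end program omp_print_check
```

If each thread prints its line and the program exits normally, the Fortran runtime's default-unit open path (where the crash occurred) is working under the OpenMP build.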

 

The specific solution that had the issue had a long life through multiple updates of MS Visual Studio and Intel Software.

Either I linked the wrong ifcore....lib or the Solution database was the root cause of the problem.

 

FWIW code compiled with Debug symbols enabled should link and run with either

    libifcoremt.lib or libifcoremdd.lib

 

Jim Dempsey

 

andrew_4619
Honored Contributor III

At least you managed to find a cause when making the new solution!

 
