Intel® Fortran Compiler

application crashing

Kunal_Rao
Novice
1,662 Views
Hi
I have compiled my application using the Intel compilers. The application is linked with the MSMPI and HDF5 libraries. I used the following command to create the executable (I am working in the Cygwin environment):
$ ifort -libs:dll -Qlocation,link,"C:\\PROGRA~2\\MICROS~1.0\\VC\\bin" -o flash3 *.o -link -nodefaultlib:LIBCMT msmpi.lib msmpifec.lib msmpifmc.lib libhdf5.lib
The paths to the MSMPI and HDF5 libraries are set in the LIB environment variable.
The executable is created fine and it runs with 1 process. But when I run with 2 processes, I get the following message:
--------------
Problem signature:
Problem Event Name: APPCRASH
Application Name: flash3.exe
Application Version: 0.0.0.0
Application Timestamp: 4c94f5ff
Fault Module Name: StackHash_8f98
Fault Module Version: 6.0.6002.18005
Fault Module Timestamp: 49e0421d
Exception Code: c0000374
Exception Offset: 00000000000aef37
OS Version: 6.0.6002.2.2.0.272.18
Locale ID: 2057
Additional Information 1: 8f98
Additional Information 2: 3926b45e3f7f9075e413d7f43231ac3c
Additional Information 3: e6c5
Additional Information 4: 0b6cf2f93119e73fa1670cd3360652c4
-----------------------

sometimes it also reports:

-----------------------
Problem signature:
Problem Event Name: APPCRASH
Application Name: flash3.exe
Application Version: 0.0.0.0
Application Timestamp: 4c91b3c9
Fault Module Name: ntdll.dll
Fault Module Version: 6.0.6002.18005
Fault Module Timestamp: 49e0421d
Exception Code: c000007b
Exception Offset: 00000000000b8fb8
OS Version: 6.0.6002.2.2.0.272.18
Locale ID: 2057
Additional Information 1: fa3e
Additional Information 2: ac0507478d1c5bd693cfc4fe3987e900
Additional Information 3: fa3e
Additional Information 4: ac0507478d1c5bd693cfc4fe3987e900
--------------------

and then the job aborts with this message:

---------------------
job aborted:
[ranks] message
[0] terminated
[1] process exited without calling finalize
---- error analysis -----
[1] on Head
./flash3 ended prematurely and may have crashed. exit code 0xc0000374
---- error analysis -----
---------------------

I am working on Windows Server 2008, HPC Edition. I guess there is some issue with the Windows DLLs that it is loading.

Any idea what could be going wrong?

Thanks & Regards,

Kunal

11 Replies

Kunal_Rao
Novice

I used Application Verifier to check what is going on. I selected my executable and ran it with 2 processes. This is what I got in the log for process 1:

------------------------
First chance access violation for current stack trace.
849e0f0 - Invalid address causing the exception.
74cae3f0 - Code address executing the invalid access.
12f2e0 - Exception record.
12ee10 - Context record.

vrfcore!VerifierDisableVerifier+934 ( @ 0)
ntdll!RtlApplicationVerifierStop+d3 ( @ 0)
vfbasics!+7fef0f26377 ( @ 0)
vfbasics!+7fef0f27c9b ( @ 0)
vfbasics!+7fef0f27392 ( @ 0)
ntdll!RtlIpv4AddressToStringA+1cb ( @ 0)
ntdll!_C_specific_handler+27d ( @ 0)
ntdll!KiUserExceptionDispatcher+2e ( @ 0)
MSVCR80!memcpy+250 ( @ 0)
------------------------------------

The job this time aborted with this message:

------------------------------------
forrtl: severe (159): Program Exception - breakpoint
Image         PC                Routine  Line     Source
ntdll.dll     0000000076E76060  Unknown  Unknown  Unknown
vrfcore.dll   000007FEF0FC37EE  Unknown  Unknown  Unknown
vrfcore.dll   000007FEF0FC9970  Unknown  Unknown  Unknown
ntdll.dll     0000000076EEC193  Unknown  Unknown  Unknown
vfbasics.dll  000007FEF0F26377  Unknown  Unknown  Unknown
vfbasics.dll  000007FEF0F27C9B  Unknown  Unknown  Unknown
vfbasics.dll  000007FEF0F27392  Unknown  Unknown  Unknown
ntdll.dll     0000000076E5396B  Unknown  Unknown  Unknown
ntdll.dll     0000000076E69795  Unknown  Unknown  Unknown
ntdll.dll     0000000076E76C78  Unknown  Unknown  Unknown
MSVCR80.dll   0000000074CAE3F0  Unknown  Unknown  Unknown
msmpi.dll     0000000068D758DD  Unknown  Unknown  Unknown
msmpi.dll     0000000068D724A5  Unknown  Unknown  Unknown
msmpi.dll     0000000068D6F21B  Unknown  Unknown  Unknown
msmpi.dll     0000000068D66DD8  Unknown  Unknown  Unknown
msmpi.dll     0000000068D1757D  Unknown  Unknown  Unknown
msmpi.dll     0000000068D0A9F4  Unknown  Unknown  Unknown
msmpi.dll     0000000068D0B0F5  Unknown  Unknown  Unknown
flash3.exe    0000000140162666  Unknown  Unknown  Unknown
flash3.exe    000000014013E814  Unknown  Unknown  Unknown
flash3.exe    0000000140032553  Unknown  Unknown  Unknown
flash3.exe    0000000140032CF2  Unknown  Unknown  Unknown
flash3.exe    0000000140004E6F  Unknown  Unknown  Unknown
flash3.exe    000000014000CFA1  Unknown  Unknown  Unknown
flash3.exe    00000001401FD08C  Unknown  Unknown  Unknown
flash3.exe    00000001401F874A  Unknown  Unknown  Unknown
kernel32.dll  0000000076C4BE3D  Unknown  Unknown  Unknown
ntdll.dll     0000000076E56A51  Unknown  Unknown  Unknown

job aborted:
[ranks] message
[0] process exited without calling finalize
[1] terminated
---- error analysis -----
[0] on WIN-MN7DR40J561
./flash3 ended prematurely and may have crashed. exit code 159
---- error analysis -----
---------------------------------------

Can we get any hints from this on how to resolve the problem?

Thanks & Regards,

Kunal


IanH
Honored Contributor II
Chances are that you have a bug in your code. It is well nigh impossible for those who don't have access to your code to debug it ... so ... you need to do some debugging! If you find an area of code that you think might be suspect you can post it here and then you might get some assistance. Other things you might try include:

  • running your code under the debugger
  • adding options such as /traceback, /check:all and /warn:all to enable run time checks
  • progressively cutting your code down to the smallest piece that exhibits the problem.
Good luck. Please remember that the telepathic readers of this forum are currently on vacation.

Wendy_Doerner__Intel
Valued Contributor I
If you compile with the /check switch it will add runtime checks that might catch the coding error.

------

Wendy

Kunal_Rao
Novice

Thanks for your suggestions. I added the /check:all /warn:all /traceback options while compiling.

And now it crashed with some details about the error. The error message before crashing is as follows:

------------
forrtl: severe (408): fort: (12): Variable STRLOWER has substring ending point 2 which is greater than the variable length of 1
Image            PC                Routine            Line     Source
libifcorert.dll  00000000100D0758  Unknown            Unknown  Unknown
libifcorert.dll  00000000100C9CE9  Unknown            Unknown  Unknown
libifcorert.dll  00000000100B5DD3  Unknown            Unknown  Unknown
libifcorert.dll  000000001002725F  Unknown            Unknown  Unknown
libifcorert.dll  0000000010027606  Unknown            Unknown  Unknown
flash3.exe       00000001403C437F  pc_checkcgsmks_    75       pc_utilities.F90
flash3.exe       00000001400734CB  physicalconstants  73       PhysicalConstants_init.F90
flash3.exe       0000000140005D93  driver_initflash_  111      Driver_initFlash.F90
flash3.exe       0000000140018C84  MAIN__             38       Flash.F90
flash3.exe       00000001404B979C  Unknown            Unknown  Unknown
flash3.exe       00000001404B4E6A  Unknown            Unknown  Unknown
kernel32.dll     000000007739BE3D  Unknown            Unknown  Unknown
ntdll.dll        00000000774D6A51  Unknown            Unknown  Unknown

job aborted:
[ranks] message
[0] process exited without calling finalize
[1] terminated
---- error analysis -----
[0] on WIN-MN7DR40J561
./flash3 ended prematurely and may have crashed. exit code 408
---- error analysis -----
------------


Is there really a problem with the variable strLower? This code has worked with the PGI compilers, so I am not sure whether there is a bug in the code.
The relevant portion of the code is as follows:
-------------------------------------------------
In the PhysicalConstants_init.F90 file:

character(len=MAX_STRING_LENGTH) :: cgsORmks, errorstring

-----------------------
In the pc_utilities.F90 file:

SUBROUTINE pc_checkCGSMKS(cgsORmks,isError)
character(len=3) :: cgsORmksLower
call pc_makeLowercase(cgsORmks,cgsORmksLower)

------------------------
The pc_makeLowercase subroutine in the pc_utilities.F90 file:

SUBROUTINE pc_makeLowercase (str, strLower)
  implicit none
  character(len=*), intent(in) :: str
  character(len=len(str)), intent(out) :: strLower
  integer :: i
  strLower = str
  do i = 1, len_trim(str)
    if (lge(str(i:i), 'A') .and. lle(str(i:i), 'Z'))
      strLower(i:i) = achar( iachar(str(i:i)) + 32 )
  enddo
  return
END SUBROUTINE pc_makeLowercase
---------------------------------------------------------------------------------------------

Any help in this regard would be really very helpful.
Thanks & Regards,
Kunal

TimP
Honored Contributor III
If I compensate for a probable mis-quotation in your posted source code, this may indicate a compiler/library bug, assuming this actually is the place where checking caused the stop. PGI may not have any checking available there. You would need to run under a debugger, for example, or add diagnostics to the source and see whether it is true that it is attempting to modify strLower(2:2) when len(strLower) == 1.
It's also possible this is not the place which caused the original crash.
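
A minimal sketch of the kind of diagnostic meant here, assuming nothing about the real FLASH sources: the module name, the helper names report_lengths and would_overrun, and the value 80 standing in for MAX_STRING_LENGTH are all hypothetical. It prints the lengths the callee actually sees and flags the case where the loop bound would run past the output buffer.

```fortran
! Hypothetical diagnostic helpers; not part of FLASH.
module lowercase_diag
  implicit none
contains
  ! True when a loop over 1..len_trim(str) would index past strLower.
  pure logical function would_overrun(str, strLower)
    character(len=*), intent(in) :: str, strLower
    would_overrun = len_trim(str) > len(strLower)
  end function would_overrun

  ! Print the lengths as seen inside the callee.
  subroutine report_lengths(str, strLower)
    character(len=*), intent(in) :: str, strLower
    print '(a,i0)', 'len(str)      = ', len(str)
    print '(a,i0)', 'len_trim(str) = ', len_trim(str)
    print '(a,i0)', 'len(strLower) = ', len(strLower)
    if (would_overrun(str, strLower)) then
      print '(a)', 'WARNING: loop would index past the end of strLower'
    end if
  end subroutine report_lengths
end module lowercase_diag

program diag_demo
  use lowercase_diag
  implicit none
  character(len=80) :: cgsORmks       ! 80 is an assumed MAX_STRING_LENGTH
  character(len=3)  :: cgsORmksLower
  cgsORmks = 'CGS'
  cgsORmksLower = ' '
  call report_lengths(cgsORmks, cgsORmksLower)
end program diag_demo
```

Calling something like report_lengths at the top of pc_makeLowercase would show whether the reported length of 1 is really what the routine receives.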

IanH
Honored Contributor II
What value is MAX_STRING_LENGTH?

What are the declarations for the dummy arguments for pc_checkCGSMKS?

Are these external sub-programs or module procedures?

jimdempseyatthecove
Honored Contributor III
The length of the output character variable (strLower) should be passed in, via (len=*) or other means.
What do you expect to happen when len(str) exceeds the size of the buffer for strLower? The assignment
strLower = str
will clobber memory.
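
One way this suggestion could look, as a sketch rather than the actual FLASH fix: the output argument is declared with assumed length so that it takes its length from the caller's buffer, and the loop is clamped to the shorter of the two strings. The module and routine names and the length 80 (standing in for MAX_STRING_LENGTH) are assumptions for illustration.

```fortran
! Hypothetical rewrite of the lowercase helper; names are illustrative.
module pc_lowercase_sketch
  implicit none
contains
  subroutine make_lowercase(str, strLower)
    character(len=*), intent(in)  :: str
    character(len=*), intent(out) :: strLower  ! length comes from the caller
    integer :: i, n
    ! Assignment truncates or blank-pads to len(strLower), so it cannot overrun.
    strLower = str
    ! Never index past the shorter of the two strings.
    n = min(len_trim(str), len(strLower))
    do i = 1, n
      if (lge(str(i:i), 'A') .and. lle(str(i:i), 'Z')) then
        strLower(i:i) = achar(iachar(str(i:i)) + 32)
      end if
    end do
  end subroutine make_lowercase
end module pc_lowercase_sketch

program lowercase_demo
  use pc_lowercase_sketch
  implicit none
  character(len=80) :: cgsORmks       ! 80 is an assumed MAX_STRING_LENGTH
  character(len=3)  :: cgsORmksLower
  cgsORmks = 'CGS'
  call make_lowercase(cgsORmks, cgsORmksLower)
  print '(a)', cgsORmksLower          ! prints cgs
end program lowercase_demo
```

Placing the routine in a module also gives callers an explicit interface, so the compiler can check the argument list at compile time.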

Jim Dempsey

mecej4
Honored Contributor III
Others have pointed out your usage of actual string arguments that are shorter than their corresponding dummy arguments, and have noted instances of incorrect implicit typing of variables. Note, in addition, that using both -warn all and -check all triggers the introduction of a run-time bug by the current Intel compilers, as I noted in a separate thread.

Kunal_Rao
Novice
Yes, I was using both -warn:all and -check:all and was getting that error. After removing one of them, I no longer get it.

But the application still crashes with the error from the first post in this thread.

Thanks & Regards,
Kunal

mecej4
Honored Contributor III
You are developing/modifying a large application with mixed authorship. As exemplified in this thread, it is likely that there are several bugs in the code, which may have remained hidden in the past.

In your situation, I would develop a plan based on "defensive programming".

I would use more than one compiler, work up test cases with known solutions, and set up verification tests to make sure that consistent (though not necessarily identical) results were obtained with different compilers and different compiler options. I would not rely on any single compiler/library to do these things for me automatically. I would not care much about optimization at this stage, because it does not make sense to optimize buggy code.
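
As a rough sketch of such a verification test (not taken from FLASH): a computed value is compared with a known solution to within a relative tolerance, so that results from different compilers or option sets can pass without being bit-identical. The routine names, the test problem (trapezoid-rule integration of x**2 over [0,1], exact value 1/3) and the tolerance are assumptions chosen for illustration.

```fortran
! Hypothetical verification harness; names and tolerance are illustrative.
module verify_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Relative comparison: consistent, though not necessarily identical, results.
  pure logical function close_enough(computed, expected, rel_tol)
    real(dp), intent(in) :: computed, expected, rel_tol
    close_enough = abs(computed - expected) <= &
                   rel_tol * max(abs(expected), tiny(expected))
  end function close_enough

  ! Composite trapezoid rule for x**2 on [0,1] with n intervals.
  pure function trapezoid_x2(n) result(area)
    integer, intent(in) :: n
    real(dp) :: area, h, x
    integer :: i
    h = 1.0_dp / real(n, dp)
    area = 0.5_dp * (0.0_dp + 1.0_dp)   ! endpoint terms f(0)/2 + f(1)/2
    do i = 1, n - 1
      x = real(i, dp) * h
      area = area + x * x
    end do
    area = area * h
  end function trapezoid_x2
end module verify_sketch

program verify_demo
  use verify_sketch
  implicit none
  real(dp) :: computed
  computed = trapezoid_x2(1000)
  if (close_enough(computed, 1.0_dp / 3.0_dp, 1.0e-5_dp)) then
    print '(a)', 'PASS'
  else
    print '(a)', 'FAIL'
    error stop 1
  end if
end program verify_demo
```

Building and running the same test with each compiler and each set of options gives a quick signal when one combination drifts from the known answer.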

I have found it helpful to think of each combination of compiler options as a separate version of the compiler. The default "version" of the compiler, being the most commonly used, is probably the most bug free. As I then use other options with specific purposes in mind, I would keep in mind that these "modified versions", while having more powerful capabilities than the default version, are more likely to contain bugs.

Coexisting with bugs by developing defences and work-arounds is a necessary part of the art.

jimdempseyatthecove
Honored Contributor III
Kunal,

Selecting compiler options to get it to keep quiet about programming errors does not fix the programming errors.

It is not unusual for old code with programming errors to run without crashing or, for that matter, without producing errors in its output. That means the program worked not by design but by fortuitous accident. Generally it is a situation where memory that got trashed by errant code wasn't used after the trashing. Changing anything in the code, or changing compiler options, compiler vendors or versions, would then mysteriously "break the code", when in reality the code was always broken but the effects were not observed until the change.

You need to fix this code. It is the responsible thing to do.

If you elect not to fix the code, but instead choose your options (or compilers) so that the program does not crash, then the results of your program may very well be invalid. Bad results can have large financial consequences.

Jim Dempsey