Hi, I am working with a relatively complex software system. The traceback in the error file points to this line as the offending code:
wrt_int_state%BND_VARS_V%VAR_3D(NV)%SOUTH(NA,NB,NC,NT)=BUFF_NTASK(NX)
I will list what I know, and I would really appreciate any help clarifying this. Is this a user code problem or a compiler bug?
Compile command line:
mpiifort -g -openmp -mkl=sequential -align array32byte -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -openmp -convert big_endian -assume byterecl -mkl=sequential -O3 -g -traceback -ftz
Compiler version:
intel/14.0.2
impi/5.1.1.109
What I have tried:
I found that the SIGSEGV crash goes away if I print the values involved, and they are all correct. I find it strange that touching the memory fixes the problem.
print *, NVARS_BC_3D_V, NV, NT, NC, NB, &
wrt_int_state%LOCAL_ISTART(NTASK),wrt_int_state%LOCAL_IEND(NTASK), NA, &
'wrt_int_state%BND_VARS_V%VAR_3D(NV)%SOUTH', size(wrt_int_state%BND_VARS_V%VAR_3D(NV)%SOUTH)
wrt_int_state%BND_VARS_V%VAR_3D(NV)%SOUTH(NA,NB,NC,NT)=BUFF_NTASK(NX)
The exact same code, built on a different system that also uses Intel compilers, does not crash at all.
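One way to separate a user-code indexing error from a compiler problem is to test the subscripts against the array's actual bounds just before the assignment. The following is only a minimal sketch under stated assumptions: the array `south`, its extents, and the helper `in_bounds` are hypothetical stand-ins for wrt_int_state%BND_VARS_V%VAR_3D(NV)%SOUTH and its indices, not code from the application. A pointer or allocatable component would also need an ASSOCIATED or ALLOCATED check first.

```fortran
program bounds_check_demo
  implicit none
  ! Hypothetical stand-in for the SOUTH boundary array; extents are made up.
  real, allocatable :: south(:,:,:,:)
  integer :: idx(4)

  ! A non-default lower bound in the first dimension, as can happen
  ! when the first index is a global grid index.
  allocate (south(5:8, 1:2, 1:3, 1:2))

  ! Deliberately out of range in the first dimension (4 < 5).
  idx = [4, 1, 1, 1]

  if (in_bounds(idx, lbound(south), ubound(south))) then
    south(idx(1), idx(2), idx(3), idx(4)) = 1.0
  else
    ! Report the bad subscripts instead of writing outside the array.
    print *, 'index out of range: ', idx
    print *, 'lower bounds:       ', lbound(south)
    print *, 'upper bounds:       ', ubound(south)
  end if

contains

  ! True when every subscript lies within [lb, ub] for its dimension.
  pure logical function in_bounds(idx, lb, ub)
    integer, intent(in) :: idx(:), lb(:), ub(:)
    in_bounds = all(idx >= lb) .and. all(idx <= ub)
  end function in_bounds

end program bounds_check_demo
```

The bounds are passed in from the caller with LBOUND and UBOUND because an assumed-shape dummy argument would reset the lower bounds to 1 and hide any offset. Rebuilding at a lower optimization level with the compiler's run-time bounds checking enabled (`-check bounds` in Intel Fortran) performs a similar test without changing the source.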
This article may help: Determining Root Cause of Segmentation Faults SIGSEGV or SIGBUS errors
Instead of rebuilding the app on the other system where it runs successfully, are you able to take the executable that fails and run it on that other system? If that same executable runs successfully there, then start looking at system differences such as shell stack limits, as discussed in Cause #2 of the above article.
IVF 14.0.2 is relatively old.
When an optimized program (-O3 in this case) fails but then works once you insert a trace PRINT, as you have done, this often indicates an optimization problem. A newer version of the compiler may fix the issue.
As a workaround, you might try this:
Leave the diagnostic print in place, but enclose it in an IF (expression) THEN ... ENDIF,
where the expression is lightweight, always .false. at run time, and cannot be evaluated at compile time.
Perhaps IF (NTASK == -9999) THEN ...
This requires NTASK to be a variable whose value is determined only at run time. If that is not the case, use a different expression that is known to be always false at run time.
Jim Dempsey
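The guarded diagnostic described above can be sketched as follows. This is a minimal, self-contained illustration, not the application's code: the arrays `south` and `buff_ntask`, their sizes, and reading NTASK from the command line are assumptions made only so that the value is unknown at compile time.

```fortran
program guarded_print_demo
  implicit none
  integer :: ntask, nx, ios
  real :: buff_ntask(4), south(4)
  character(len=32) :: arg

  ! Hypothetical setup: take NTASK from the command line so the
  ! compiler cannot know its value at compile time. Default is 1.
  ntask = 1
  if (command_argument_count() >= 1) then
    call get_command_argument(1, arg)
    read (arg, *, iostat=ios) ntask
    if (ios /= 0) ntask = 1
  end if

  buff_ntask = [1.0, 2.0, 3.0, 4.0]

  do nx = 1, size(buff_ntask)
    ! Expected to be false on every normal run, but only decidable at
    ! run time, so the optimizer cannot remove the branch or the PRINT.
    if (ntask == -9999) then
      print *, 'diag: ', nx, size(south), buff_ntask(nx)
    end if
    south(nx) = buff_ntask(nx)
  end do

  print *, 'sum of south = ', sum(south)
end program guarded_print_demo
```

The integer comparison is cheap, so the guard adds little run-time cost while still forcing the compiler to keep the variables referenced in the PRINT available at that point.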
In reply to Davis, Kevin D:
I have to rebuild the executable; the two systems are quite different.
In reply to jimdempseyatthecove:
Jim, thanks. I tried your suggestion, and it moved the crash to another line later in the code. At this point, I believe it's just a compiler issue with -O3 optimization. Is there an Intel version I should use to avoid this? On the problem platform, I have the following Intel versions installed:
intel/15.1.133
intel/15.3.187
intel/15.6.233
intel/16.1.150
intel/12-12.0.4.191
intel/13.1.3
intel/14.0.2
intel/15.0.0
Without knowing what the underlying issue is, there is no way to say with certainty that a particular version will avoid it. You have newer versions than your 14.0.2, so try the most recent of those listed, 16.1.150.