
kirkwatrous · ‎01-27-2016

As shown below, compiling this trivial program with ifort 16.0.1 results in a binary that dumps core if its stdout is redirected to a file. Compiling the same code with ifort 15.0 results in no such problem.

watrok@amrndhl765:$ cat ok.f90
PRINT *,'ok'
END
watrok@amrndhl765:$ module purge
watrok@amrndhl765:$ module load intel/16.0.1
watrok@amrndhl765:$ ifort -v
ifort version 16.0.1
watrok@amrndhl765:$ ifort -o ok-16.0.1 ok.f90
watrok@amrndhl765:$ ls
ok-16.0.1 ok.f90
watrok@amrndhl765:$ ./ok-16.0.1
ok
watrok@amrndhl765:$ ./ok-16.0.1 > ok.out
*** buffer overflow detected ***: ./ok-16.0.1 terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x3808902507]
/lib64/libc.so.6[0x38089003f0]
/lib64/libc.so.6[0x38088ff849]
/lib64/libc.so.6(_IO_default_xsputn+0xc9)[0x38088748b9]
/lib64/libc.so.6(_IO_vfprintf+0x11d8)[0x3808845428]
/lib64/libc.so.6(__vsprintf_chk+0x9d)[0x38088ff8ed]
/lib64/libc.so.6(__sprintf_chk+0x7f)[0x38088ff82f]
./ok-16.0.1[0x445add]
./ok-16.0.1[0x4473e9]
./ok-16.0.1[0x4340ad]
./ok-16.0.1[0x4097fe]
./ok-16.0.1[0x402e9f]
./ok-16.0.1[0x402e1e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x380881ecdd]
./ok-16.0.1[0x402d29]
======= Memory map: ========
00400000-004b1000 r-xp 00000000 00:38 5515555723 /hpc/grid/scratchtest/watrok/f90/ok-16.0.1
006b0000-006b4000 rw-p 000b0000 00:38 5515555723 /hpc/grid/scratchtest/watrok/f90/ok-16.0.1
006b4000-006d2000 rw-p 00000000 00:00 0
01675000-01696000 rw-p 00000000 00:00 0 [heap]
3567c00000-3567c16000 r-xp 00000000 fd:00 529492 /lib64/libgcc_s-4.4.7-20120601.so.1
3567c16000-3567e15000 ---p 00016000 fd:00 529492 /lib64/libgcc_s-4.4.7-20120601.so.1
3567e15000-3567e16000 rw-p 00015000 fd:00 529492 /lib64/libgcc_s-4.4.7-20120601.so.1
3808000000-3808020000 r-xp 00000000 fd:00 555367 /lib64/ld-2.12.so
380821f000-3808220000 r--p 0001f000 fd:00 555367 /lib64/ld-2.12.so
3808220000-3808221000 rw-p 00020000 fd:00 555367 /lib64/ld-2.12.so
3808221000-3808222000 rw-p 00000000 00:00 0
3808400000-3808402000 r-xp 00000000 fd:00 555377 /lib64/libdl-2.12.so
3808402000-3808602000 ---p 00002000 fd:00 555377 /lib64/libdl-2.12.so
3808602000-3808603000 r--p 00002000 fd:00 555377 /lib64/libdl-2.12.so
3808603000-3808604000 rw-p 00003000 fd:00 555377 /lib64/libdl-2.12.so
3808800000-380898a000 r-xp 00000000 fd:00 555368 /lib64/libc-2.12.so
380898a000-3808b89000 ---p 0018a000 fd:00 555368 /lib64/libc-2.12.so
3808b89000-3808b8d000 r--p 00189000 fd:00 555368 /lib64/libc-2.12.so
3808b8d000-3808b8e000 rw-p 0018d000 fd:00 555368 /lib64/libc-2.12.so
3808b8e000-3808b93000 rw-p 00000000 00:00 0
3808c00000-3808c17000 r-xp 00000000 fd:00 534292 /lib64/libpthread-2.12.so
3808c17000-3808e17000 ---p 00017000 fd:00 534292 /lib64/libpthread-2.12.so
3808e17000-3808e18000 r--p 00017000 fd:00 534292 /lib64/libpthread-2.12.so
3808e18000-3808e19000 rw-p 00018000 fd:00 534292 /lib64/libpthread-2.12.so
3808e19000-3808e1d000 rw-p 00000000 00:00 0
3809000000-3809083000 r-xp 00000000 fd:00 555370 /lib64/libm-2.12.so
3809083000-3809282000 ---p 00083000 fd:00 555370 /lib64/libm-2.12.so
3809282000-3809283000 r--p 00082000 fd:00 555370 /lib64/libm-2.12.so
3809283000-3809284000 rw-p 00083000 fd:00 555370 /lib64/libm-2.12.so
7f7f990fc000-7f7f99101000 rw-p 00000000 00:00 0
7f7f99124000-7f7f99126000 rw-p 00000000 00:00 0
7fff5fc7f000-7fff5fc95000 rw-p 00000000 00:00 0 [stack]
7fff5fcd4000-7fff5fcd5000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
forrtl: error (76): Abort trap signal
Image              PC                Routine            Line        Source
ok-16.0.1          0000000000477435  Unknown            Unknown     Unknown
ok-16.0.1          00000000004751F7  Unknown            Unknown     Unknown
ok-16.0.1          0000000000444AE4  Unknown            Unknown     Unknown
ok-16.0.1          00000000004448F6  Unknown            Unknown     Unknown
ok-16.0.1          00000000004259F6  Unknown            Unknown     Unknown
ok-16.0.1          00000000004037D8  Unknown            Unknown     Unknown
libpthread.so.0    0000003808C0F500  Unknown            Unknown     Unknown
libc.so.6          00000038088328A5  Unknown            Unknown     Unknown
libc.so.6          0000003808834085  Unknown            Unknown     Unknown
libc.so.6          00000038088707B7  Unknown            Unknown     Unknown
libc.so.6          0000003808902507  Unknown            Unknown     Unknown
libc.so.6          00000038089003F0  Unknown            Unknown     Unknown
libc.so.6          00000038088FF849  Unknown            Unknown     Unknown
libc.so.6          00000038088748B9  Unknown            Unknown     Unknown
libc.so.6          0000003808845428  Unknown            Unknown     Unknown
libc.so.6          00000038088FF8ED  Unknown            Unknown     Unknown
libc.so.6          00000038088FF82F  Unknown            Unknown     Unknown
ok-16.0.1          0000000000445ADD  Unknown            Unknown     Unknown
ok-16.0.1          00000000004473E9  Unknown            Unknown     Unknown
ok-16.0.1          00000000004340AD  Unknown            Unknown     Unknown
ok-16.0.1          00000000004097FE  Unknown            Unknown     Unknown
ok-16.0.1          0000000000402E9F  Unknown            Unknown     Unknown
ok-16.0.1          0000000000402E1E  Unknown            Unknown     Unknown
libc.so.6          000000380881ECDD  Unknown            Unknown     Unknown
ok-16.0.1          0000000000402D29  Unknown            Unknown     Unknown
Aborted (core dumped)
watrok@amrndhl765:$ ls
ok-16.0.1 ok.f90 ok.out
watrok@amrndhl765:$ module purge
watrok@amrndhl765:$ module load intel/15.0
watrok@amrndhl765:$ ifort -v
ifort version 15.0.0
watrok@amrndhl765:$ ifort -o ok-15.0 ok.f90
watrok@amrndhl765:$ ./ok-15.0
ok
watrok@amrndhl765:$ ./ok-15.0 > ok.out
watrok@amrndhl765:$ cat ok.out
ok
watrok@amrndhl765:$ ls -l
total 2598
-rwxr-xr-x 1 watrok root 724005 Jan 27 15:53 ok-15.0
-rwxr-xr-x 1 watrok root 773645 Jan 27 15:52 ok-16.0.1
-rw-r--r-- 1 watrok amer 17 Jan 27 15:51 ok.f90
-rw-r--r-- 1 watrok root 4 Jan 27 15:53 ok.out

Kevin_D_Intel · ‎01-28-2016

I cannot reproduce this so there’s something about your environment that I’m not matching.

What Linux OS are you using?

mecej4 · ‎01-28-2016

I suspect that the problem lies with the system C runtime rather than with Intel Fortran, because the error output seems to start from the bowels of /lib64/libc.so.6. You could try a couple of simple tests to obtain a bit more information. 1) Compile your test program with gfortran and try redirecting the output of the a.out. 2) Compile using ifort but use the -traceback and -g options, then try redirection.

kirkwatrous · ‎01-28-2016

Kevin Davis (Intel) wrote:

I cannot reproduce this so there’s something about your environment that I’m not matching.

What Linux OS are you using?

We have systems running RHEL 6.4 and RHEL 6.6, both of which can reproduce this issue. The RHEL 6.4 systems have kernel 2.6.32-358.11.1.el6.x86_64 and glibc-2.12-1.107.el6_4.2.x86_64. The RHEL 6.6 systems have kernel 2.6.32-504.16.2.el6.x86_64 and glibc-2.12-1.149.el6_6.7.x86_64.

kirkwatrous · ‎01-28-2016

mecej4 wrote:

I suspect that the problem lies with the system C runtime rather than with Intel Fortran, because the error output seems to start from the bowels of /lib64/libc.so.6. You could try a couple of simple tests to obtain a bit more information. 1) Compile your test program with gfortran and try redirecting the output of the a.out. 2) Compile using ifort but use the -traceback and -g options, then try redirection.

gfortran works fine, as does the Intel 15.0 ifort compiler. With the Intel 16.0.1 ifort compiler and the -traceback and -g options, here is the stderr:

watrok@amrndhl460:$ ifort -v
ifort version 16.0.1
watrok@amrndhl460:$ ifort -traceback -g ok.f90
watrok@amrndhl460:$ ./a.out
ok
watrok@amrndhl460:$ ./a.out > ok.out
*** buffer overflow detected ***: ./a.out terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x3091302527]
/lib64/libc.so.6[0x3091300410]
/lib64/libc.so.6[0x30912ff869]
/lib64/libc.so.6(_IO_default_xsputn+0xc9)[0x3091274639]
/lib64/libc.so.6(_IO_vfprintf+0x11d8)[0x30912451a8]
/lib64/libc.so.6(__vsprintf_chk+0x9d)[0x30912ff90d]
/lib64/libc.so.6(__sprintf_chk+0x7f)[0x30912ff84f]
./a.out[0x445add]
./a.out[0x4473e9]
./a.out[0x4340ad]
./a.out[0x4097fe]
./a.out[0x402e9e]
./a.out[0x402e1e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x309121ed5d]
./a.out[0x402d29]
======= Memory map: ========
00400000-004b1000 r-xp 00000000 fd:02 2883941                            /tmp/watrok/f90/a.out
006b0000-006b4000 rw-p 000b0000 fd:02 2883941                            /tmp/watrok/f90/a.out
006b4000-006d2000 rw-p 00000000 00:00 0
02673000-02694000 rw-p 00000000 00:00 0                                  [heap]
3090a00000-3090a20000 r-xp 00000000 fd:00 526774                         /lib64/ld-2.12.so
3090c1f000-3090c20000 r--p 0001f000 fd:00 526774                         /lib64/ld-2.12.so
3090c20000-3090c21000 rw-p 00020000 fd:00 526774                         /lib64/ld-2.12.so
3090c21000-3090c22000 rw-p 00000000 00:00 0
3090e00000-3090e83000 r-xp 00000000 fd:00 535929                         /lib64/libm-2.12.so
3090e83000-3091082000 ---p 00083000 fd:00 535929                         /lib64/libm-2.12.so
3091082000-3091083000 r--p 00082000 fd:00 535929                         /lib64/libm-2.12.so
3091083000-3091084000 rw-p 00083000 fd:00 535929                         /lib64/libm-2.12.so
3091200000-309138a000 r-xp 00000000 fd:00 528578                         /lib64/libc-2.12.so
309138a000-309158a000 ---p 0018a000 fd:00 528578                         /lib64/libc-2.12.so
309158a000-309158e000 r--p 0018a000 fd:00 528578                         /lib64/libc-2.12.so
309158e000-309158f000 rw-p 0018e000 fd:00 528578                         /lib64/libc-2.12.so
309158f000-3091594000 rw-p 00000000 00:00 0
3091600000-3091617000 r-xp 00000000 fd:00 529004                         /lib64/libpthread-2.12.so
3091617000-3091817000 ---p 00017000 fd:00 529004                         /lib64/libpthread-2.12.so
3091817000-3091818000 r--p 00017000 fd:00 529004                         /lib64/libpthread-2.12.so
3091818000-3091819000 rw-p 00018000 fd:00 529004                         /lib64/libpthread-2.12.so
3091819000-309181d000 rw-p 00000000 00:00 0
3091a00000-3091a02000 r-xp 00000000 fd:00 528120                         /lib64/libdl-2.12.so
3091a02000-3091c02000 ---p 00002000 fd:00 528120                         /lib64/libdl-2.12.so
3091c02000-3091c03000 r--p 00002000 fd:00 528120                         /lib64/libdl-2.12.so
3091c03000-3091c04000 rw-p 00003000 fd:00 528120                         /lib64/libdl-2.12.so
3097600000-3097616000 r-xp 00000000 fd:00 534069                         /lib64/libgcc_s-4.4.7-20120601.so.1
3097616000-3097815000 ---p 00016000 fd:00 534069                         /lib64/libgcc_s-4.4.7-20120601.so.1
3097815000-3097816000 rw-p 00015000 fd:00 534069                         /lib64/libgcc_s-4.4.7-20120601.so.1
7f9d8a058000-7f9d8a05d000 rw-p 00000000 00:00 0
7f9d8a082000-7f9d8a084000 rw-p 00000000 00:00 0
7fff37f87000-7fff37f9d000 rw-p 00000000 00:00 0                          [stack]
7fff37fb9000-7fff37fba000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
forrtl: error (76): Abort trap signal
Image              PC                Routine            Line        Source
a.out              0000000000477435 Unknown               Unknown Unknown
a.out              00000000004751F7 Unknown               Unknown Unknown
a.out              0000000000444AE4 Unknown               Unknown Unknown
a.out              00000000004448F6 Unknown               Unknown Unknown
a.out              00000000004259F6 Unknown               Unknown Unknown
a.out              00000000004037D8 Unknown               Unknown Unknown
libpthread.so.0    000000309160F710 Unknown               Unknown Unknown
libc.so.6          0000003091232625 Unknown               Unknown Unknown
libc.so.6          0000003091233E05 Unknown               Unknown Unknown
libc.so.6          0000003091270537 Unknown               Unknown Unknown
libc.so.6          0000003091302527 Unknown               Unknown Unknown
libc.so.6          0000003091300410 Unknown               Unknown Unknown
libc.so.6          00000030912FF869 Unknown               Unknown Unknown
libc.so.6          0000003091274639 Unknown               Unknown Unknown
libc.so.6          00000030912451A8 Unknown               Unknown Unknown
libc.so.6          00000030912FF90D Unknown               Unknown Unknown
libc.so.6          00000030912FF84F Unknown               Unknown Unknown
a.out              0000000000445ADD Unknown               Unknown Unknown
a.out              00000000004473E9 Unknown               Unknown Unknown
a.out              00000000004340AD Unknown               Unknown Unknown
a.out              00000000004097FE Unknown               Unknown Unknown
a.out              0000000000402E9E MAIN__                      1 ok.f90
a.out              0000000000402E1E Unknown               Unknown Unknown
libc.so.6          000000309121ED5D Unknown               Unknown Unknown
a.out              0000000000402D29 Unknown               Unknown Unknown
Aborted (core dumped)

kirkwatrous · ‎01-28-2016

I've tried to answer mecej4's question twice, but I keep getting "Your comment has been queued for review by site administrators and will be published after approval." Why?

Steven_L_Intel1 · ‎01-28-2016

This forum uses an automated system for spam detection, and it is sometimes thrown off by code or diagnostics pasted as part of the text. Queued messages are reviewed promptly and dealt with.

Kevin_D_Intel · ‎01-28-2016

Thank you. I can mostly match your RHEL 6.4/glibc versions, but I cannot reproduce the issue. Yours is a slightly different kernel variant, but I don't suspect that is the cause. I have 2.6.32-358.el6.x86_64 / glibc-2.12-1.107.el6.x86_64.

This sort of error usually proves hard to identify. I have seen similar failures with a variety of causes. One instance was mixing older shared libraries on newer or different distros. Another was related to using LD_PRELOAD. Another, older case was mixing the g++ C++ library libstdc++ with the ifort C++ library libcxa.

You might look at your environment and settings such as LD_LIBRARY_PATH, since you are loading a different module configuration. Check whether LD_PRELOAD is in play. Looking at ldd and/or ldconfig output may also provide clues.
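
A minimal sketch of such a check, assuming a POSIX shell and a binary named a.out in the current directory. The helper name report_var is made up for illustration.

```shell
#!/bin/sh
# Report a loader-related variable by name.
# report_var is a hypothetical helper, not part of any Intel or system tool.
report_var() {
    # POSIX sh has no ${!name}, so read the named variable through eval.
    eval "value=\${$1-}"
    if [ -n "$value" ]; then
        echo "$1=$value"
    else
        echo "$1=(unset or empty)"
    fi
}

report_var LD_LIBRARY_PATH
report_var LD_PRELOAD

# List shared-library dependencies only if the binary and ldd are available.
binary=./a.out   # assumed name; adjust to your executable
if [ -x "$binary" ] && command -v ldd >/dev/null 2>&1; then
    ldd "$binary"
fi
```

Running this once after each "module load" and comparing the two outputs should show whether the module changes anything the dynamic loader sees.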

Running in the debugger might provide more clues. You can set this up as shown below. I do not know whether stepping, or setting other breakpoints within the libc call stack you showed, would reveal more.

$ gdb a.out
Reading symbols from /tmp/u607517/a.out...done.
(gdb) set args "> out.txt"
(gdb) br __sprintf_chk
Breakpoint 2 at 0x3f752ff7b0
(gdb) r
Starting program: /tmp/u607517/a.out "> out.txt"
[Thread debugging using libthread_db enabled]

Breakpoint 2, 0x0000003f752ff7b0 in __sprintf_chk () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003f752ff7b0 in __sprintf_chk () from /lib64/libc.so.6
#1  0x0000000000406de6 in for__preconnected_units_create ()
#2  0x0000000000405983 in for_rtl_init_ ()
#3  0x0000000000402e19 in main ()
#4  0x0000003f7521ecdd in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000402d29 in _start ()
(gdb)

kirkwatrous · ‎01-28-2016

Kevin Davis (Intel) wrote:

You might look at your environment and settings such as LD_LIBRARY_PATH, since you are loading a different module configuration. Check whether LD_PRELOAD is in play. Looking at ldd and/or ldconfig output may also provide clues.

Running in the debugger might provide more clues. You can set this up as shown below. I do not know whether stepping, or setting other breakpoints within the libc call stack you showed, would reveal more.

Neither LD_LIBRARY_PATH nor LD_PRELOAD is set in my shell environment. Here is what it looks like on a RHEL 6.4 system:

$ ldd a.out
        linux-vdso.so.1 =>  (0x00007fffa6dab000)
        libm.so.6 => /lib64/libm.so.6 (0x0000003809000000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003808c00000)
        libdl.so.2 => /lib64/libdl.so.2 (0x0000003808400000)
        libc.so.6 => /lib64/libc.so.6 (0x0000003808800000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003567c00000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003808000000)

$ gdb a.out
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /tmp/watrok/f90/a.out...done.
(gdb) set args "> out.txt"
(gdb) br __sprintf_chk
Breakpoint 1 at 0x402c08
(gdb) r
Starting program: /tmp/watrok/f90/a.out "> out.txt"
[Thread debugging using libthread_db enabled]

Breakpoint 1, ___sprintf_chk (s=0x7fffffffd8d0 "", flags=1, slen=32, format=0x494404 "FORT%d") at sprintf_chk.c:28
28      {
(gdb) bt
#0  ___sprintf_chk (s=0x7fffffffd8d0 "", flags=1, slen=32, format=0x494404 "FORT%d") at sprintf_chk.c:28
#1  0x0000000000406de6 in for__preconnected_units_create ()
#2  0x0000000000405983 in for_rtl_init_ ()
#3  0x0000000000402e19 in main ()
#4  0x000000380881ecdd in __libc_start_main (main=0x402df0 <main>, argc=2, ubp_av=0x7fffffffdc38, init=<value optimized out>,
    fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fffffffdc28) at libc-start.c:226
#5  0x0000000000402d29 in _start ()
(gdb) f 0
#0  ___sprintf_chk (s=0x7fffffffd8d0 "", flags=1, slen=32, format=0x494404 "FORT%d") at sprintf_chk.c:28
28      {
(gdb) p s
$1 = 0x7fffffffd8d0 ""
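
One caveat about this session: argc=2 in frame #4 suggests that the quoted "> out.txt" reached the program as a literal argument, not as a redirection, so this run probably did not exercise the redirected-stdout path. gdb starts the program through a shell, so the usual quoting rules apply; an unquoted "run > out.txt" should perform the redirection. A small sketch of the same quoting behavior in a plain POSIX shell, where count_args is a hypothetical helper and the temporary file stands in for out.txt:

```shell
#!/bin/sh
# count_args (hypothetical helper) prints how many arguments it received.
count_args() { echo "$#"; }

tmp=$(mktemp)

# Quoted: the shell passes "> file" through as one ordinary argument.
quoted=$(count_args "> $tmp")

# Unquoted: the shell performs the redirection and the command sees no arguments.
count_args > "$tmp"
unquoted=$(cat "$tmp")

echo "quoted: $quoted argument(s); unquoted: $unquoted argument(s)"
rm -f "$tmp"
```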

kgore4 · ‎03-29-2016

I can duplicate the problem on 16.0.2 with CentOS 7.2.

I may be seeing the same problem, but with stdin, on CentOS 7, as the strace output looks almost exactly the same.

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/622937

Wiesław_L_ · ‎05-19-2016

This problem exists only if the process has a high PID.

Look at the post by kgore4:

https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/622937#
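
A quick way to see how many digits the PIDs on a given machine currently have, as a rough sketch assuming a POSIX shell. pid_digits is a hypothetical helper name, and the exact digit count that triggers the failure is not established at this point in the thread.

```shell
#!/bin/sh
# pid_digits (hypothetical helper) prints the number of decimal digits in a PID.
pid_digits() {
    echo "${#1}"
}

# $$ is the PID of the running shell; programs started from it
# are assigned PIDs from the same system-wide range.
echo "shell PID: $$ ($(pid_digits "$$") digits)"
```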

Kevin_D_Intel · ‎07-07-2016

As noted in the earlier cited thread, this has been reproduced and directed to our run-time library team for further analysis/repair.

(Internal tracking id: DPD200585850)

jimdempseyatthecove · ‎07-07-2016

>>*** buffer overflow detected ***: ./a.out terminated

This was from ___sprintf_chk

If your Fortran program is calling sprintf (directly or indirectly) and passing a format string, it may be that you forgot to append a null character to the format string (relying on uninitialized data to supply the null).

Jim Dempsey

Kevin_D_Intel · ‎07-12-2016

Development identified a workaround for this defect; hopefully it is usable.

Instead of using the redirection symbol ">", you can set an environment variable to direct the output to a file (or /dev/null).

So, instead of doing:

./ok-16.0.1 > ok.out

do this:

setenv FOR_PRINT ok.out (assuming the C shell; in a Bourne-style shell such as bash, it would be "export FOR_PRINT=ok.out")
./ok-16.0.1

Using "setenv FOR_PRINT /dev/null" also works.

If you are redirecting stderr (unit=0) to a file, then use the environment variable name FORT0.
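
For Bourne-style shells, a minimal sketch that sets FOR_PRINT for a single command only, so the variable does not leak into the rest of the session. The wrapper name run_with_for_print is made up for illustration; only the FOR_PRINT variable itself comes from the workaround above.

```shell
#!/bin/sh
# run_with_for_print (hypothetical wrapper): run a command with FOR_PRINT
# set for that command only.
# $1 is the output file; the remaining arguments are the command to run.
run_with_for_print() {
    out=$1
    shift
    env FOR_PRINT="$out" "$@"
}

# Demonstration with a stand-in command that just echoes the variable.
run_with_for_print /dev/null sh -c 'echo "FOR_PRINT is $FOR_PRINT"'

# With the binary from this thread, the call would look like:
#   run_with_for_print ok.out ./ok-16.0.1
```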

info-hpc · ‎07-13-2016

Dear Kevin, dear all,

In our recently deployed cluster we are experiencing a similar issue with ifort 16.0.3 20160415 and CentOS 7.2.1511.

We tried the workaround you suggested:

setenv FOR_PRINT ok.out (assuming the C shell; in a Bourne-style shell such as bash, it would be "export FOR_PRINT=ok.out")
./ok-16.0.1

and it worked for some use cases but not all of them. Setting pid_max to 999999 (sysctl -w kernel.pid_max=999999) seems to be a more general workaround, or at least it fixed some use cases that still failed. Any thoughts on that? What is Intel's opinion?

Thanks for the help and all the suggestions that this forum provides.

Regards,

CINECA User Support group

Kevin_D_Intel · ‎07-14-2016

I lack the in-depth Linux kernel knowledge to comment much. The suggested setting should avoid the potential for a 7-digit PID; however, on a large cluster under heavy usage that PID range might easily be exhausted. It may be a more reasonable workaround on CentOS, though, given that others' comments in this thread (and the other cited thread) seem to suggest CentOS may start with high PIDs by default (which helped expose this defect).
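
To check whether a machine can hand out 7-digit PIDs at all, something like the following sketch may help. It assumes Linux, where the kernel.pid_max sysctl is readable at /proc/sys/kernel/pid_max and PIDs are allocated below that value; allows_7_digit_pids is a hypothetical helper name.

```shell
#!/bin/sh
# allows_7_digit_pids (hypothetical helper): succeed if the given pid_max
# value permits PIDs of 1000000 or more.
# PIDs are allocated below pid_max, so the largest possible PID is pid_max - 1.
allows_7_digit_pids() {
    [ "$(( $1 - 1 ))" -ge 1000000 ]
}

pid_max_file=/proc/sys/kernel/pid_max
if [ -r "$pid_max_file" ]; then
    pid_max=$(cat "$pid_max_file")
    if allows_7_digit_pids "$pid_max"; then
        echo "pid_max=$pid_max: 7-digit PIDs are possible"
    else
        echo "pid_max=$pid_max: PIDs stay at 6 digits or fewer"
    fi
fi
```

With kernel.pid_max=999999, as suggested above, the check reports that PIDs stay at 6 digits or fewer.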

It appears our fix for this will be in our upcoming PSXE 2016 Update 4 release, tentatively scheduled for late August. For our upcoming major release later this year, PSXE 2017, it appears the fix will not make the initial release but will be in the first update. I'll confirm.

Kevin_D_Intel · ‎07-26-2016

I confirmed the fix will be in the upcoming PSXE 2016 Update 4 release (late-August timeframe) and PSXE 2017 Update 1 (mid-Q4 '16 timeframe), and not the initial release.

redirecting stdout from ifort 16.0.1 compiled program results in core dump