Intel® Fortran Compiler

Possible compiler bug?

Jacob_Williams
New Contributor III

I'm seeing some weird behavior on certain Linux systems with code that works fine on Windows (and also on other Linux systems). The example code is attached (reproducer.f90). It has a class containing a procedure pointer that is associated with a subroutine contained within another subroutine. My understanding is that this is valid?

My system is: HP DL360 G6, Intel(R) Xeon(R) X5570, CentOS 6.  I'm using Intel 16.0.1 20151021.

Compile with: ifort -g -traceback reproducer.f90 -o reproducer

Running it crashes (Not sure why I'm not getting the line numbers. What do I have to do to get those?):

$ ./reproducer
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
reproducer         0000000000477705  Unknown               Unknown  Unknown
reproducer         00000000004754C7  Unknown               Unknown  Unknown
reproducer         0000000000444DB4  Unknown               Unknown  Unknown
reproducer         0000000000444BC6  Unknown               Unknown  Unknown
reproducer         0000000000425CC6  Unknown               Unknown  Unknown
reproducer         00000000004032B0  Unknown               Unknown  Unknown
libpthread.so.0    0000003A0460F790  Unknown               Unknown  Unknown
Unknown            00007FFE9F1AD658  Unknown               Unknown  Unknown

However, running it in a debugger, it works fine:

$ gdb-ia ./reproducer

(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
   3.00000000000000
[Inferior 1 (process 14138) exited normally]
(gdb)

Also interesting is that if I use "set disable-randomization off" in the debugger, it crashes again:

$ gdb-ia ./reproducer

(gdb) set disable-randomization off
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Catchpoint -2 (signal SIGSEGV), 0x00007fff187ab458 in ?? ()

(gdb) where
#0  0x00007fff187ab458 in ?? ()
#1  0x00000000004030d5 in my_module::my_test () at reproducer.f90:42
#2  0x0000000000403175 in test () at reproducer.f90:71
#3  0x0000000000402e1e in main ()
#4  0x0000003a03a1ed5d in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000402d29 in _start ()
(gdb)

Any ideas? I think the code is valid, so maybe it's a compiler bug? However, since it does work on other similar Linux systems, I'm wondering if it could be some system-specific setting, but I don't know what would cause such behavior. (I normally work with the Windows compiler, so maybe more Linux-savvy folks can point me in the right direction.)

jimdempseyatthecove
Honored Contributor III

I think the issue is that a contained procedure must not be called from outside the procedure in which it is contained. This would be a requirement for the contained procedure to access variables in the procedure which contains it (proper stack framing).

I notice that your subroutine f does not reference variables within the containing procedure. Steve will be able to comment on this further, but you might be able to get by with it by making f PURE, though this may not work because a contained pure subroutine can still access the variables in the containing scope (which are not present when it is called via the pointer). I think it would be best to extract the contained subroutine f from my_test and place it in the contains section of the module itself, making it private (and potentially renaming it to my_test_f).

There was a different thread on one of the forums indicating that you cannot pass the address (pointer) of a private variable/procedure out of the module. In this case (after relocating f) you are not technically passing f out (excepting that the pointer f in instance of my_type is accessible).

Therefore the subroutine will have to be public (I assume).

This oop stuff is well beyond my expertise. I think I am more of an oops programmer.

Jim Dempsey

Steven_L_Intel1
Employee

I don't see a problem with the source code. As long as you haven't left the instance of the containing procedure, it's ok to store and use a pointer to the contained procedure.

I am at home right now and tried this on Windows - it ran fine. I will try on Linux tomorrow as I think there the method of calling contained procedures is different.

Steven_L_Intel1
Employee

I can't reproduce it on my Linux either, but as you note it works on some Linux systems and not others. Interesting point about ASLR, though that's supposed to change things run-to-run only.
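As a side note on the ASLR observation: gdb disables address space randomization for the program it launches by default, so a plain gdb run does not see the same stack addresses as a normal run. A minimal sketch for comparing the system-wide ASLR setting on the working and the failing machine, assuming a Linux /proc filesystem; the helper name aslr_describe is made up for illustration:

```shell
#!/bin/sh
# aslr_describe (hypothetical helper): translate a value of
# /proc/sys/kernel/randomize_va_space into a short description.
aslr_describe() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "partial (stack, mmap, vdso)" ;;
        2) echo "full (also heap)" ;;
        *) echo "unknown" ;;
    esac
}

# Report the setting of the current machine, if the file is readable.
if [ -r /proc/sys/kernel/randomize_va_space ]; then
    aslr_describe "$(cat /proc/sys/kernel/randomize_va_space)"
fi
```

Running this on both systems would show whether they differ in ASLR mode at all; it does not by itself explain the segfault.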

It would be interesting to see a gdb instruction trace (with ASLR enabled) of the call.

set disable-randomization off
break 42
run
display/i
stepi

and then repeat the stepi until you get to:

  callq  *%rax

print/x $rax

and then do more stepi until it fails. Paste the output here.

Jacob_Williams
New Contributor III

The display/i produces the message "Argument required (expression to compute)", and I never see the "callq ...".  But this is what I get:

$ gdb-ia ./reproducer

No symbol table is loaded.  Use the "file" command.
GNU gdb (GDB) 7.8-16.0.558
Copyright (C) 2014 Free Software Foundation, Inc; (C) 2013-2015 Intel Corp.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For information about how to find Technical Support, Product Updates,
User Forums, FAQs, tips and tricks, and other support information, please visit:
<http://www.intel.com/software/products/support/>.For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./reproducer...done.
(gdb) set disable-randomization off
(gdb) break 42
Breakpoint 1 at 0x402fb4: file reproducer.f90, line 42.
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, my_module::my_test () at reproducer.f90:42
42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) display/i
Argument required (expression to compute).
(gdb) stepi
0x0000000000402fbf      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fca      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fd5      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fe0      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fe7      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fee      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402ff5      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402ff9      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403000      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040300b      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403010      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403017      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040301c      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403023      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040302e      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403035      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403039      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403040      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040304b      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403056      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403061      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040306c      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403071      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403078      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403080      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403087      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040308d      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403094      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040309b      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040309f      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030a6      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030ad      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030b4      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030b9      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030be      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030c2      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030c5      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030c9      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030cc      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030cf      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030d3      42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00007ffe066d9fd8 in ?? ()
(gdb) stepi

Catchpoint -2 (signal SIGSEGV), 0x00007ffe066d9fd8 in ?? ()
(gdb) print/x $rax
$1 = 0x7ffe066d9fd8
(gdb) 

Steven_L_Intel1
Employee

Sorry, I meant "display/i $pc". But you showed enough to be interesting. That 7ffe... location is stack. When I run this I get this:

=> 0x4030d3 <__my_module_MOD_my_test+637>:      callq  *%rax
(gdb) print/x $rax
$1 = 0x7fffffffd6d8
(gdb) stepi
0x00007fffffffd6d8 in ?? ()
1: x/i $pc
=> 0x7fffffffd6d8:      movabs $0x403129,%r11
(gdb) 
0x00007fffffffd6e2 in ?? ()
1: x/i $pc
=> 0x7fffffffd6e2:      movabs $0x0,%r10
(gdb) 
0x00007fffffffd6ec in ?? ()
1: x/i $pc
=> 0x7fffffffd6ec:      rex.WB jmpq *%r11
(gdb) 
my_module::f (me=0x3e0e18ced8 <main_arena+88>, 
    a=<error reading variable: Cannot access memory at address 0x2567260c00000000>,
    b=<error reading variable: Cannot access memory at address 0x0>, 
    c=<error reading variable: Cannot access memory at address 0x2567260c64a422ac>) at reproducer.f90:48
48                  subroutine f(me,a,b,c)
1: x/i $pc
=> 0x403129 <__my_module_MOD_f>:        push   %rbp

So you see it first calls into a stack "thunk", which sets up the context and jumps to the real function. (The "error reading variable" messages are spurious and can be ignored.)

The address printed seems reasonable - but why you get a segfault there, I don't know. It could be there's something setting the stack as nonexecutable - I know that can be an issue. But why some Linux systems work and others don't, I don't know. I need to find someone here who understands this better than I do (not hard). 
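One way to test the non-executable-stack guess is to look at the permissions of the [stack] mapping of the process while it is stopped at the breakpoint. A minimal sketch, assuming a Linux /proc filesystem and awk; stack_perms is a hypothetical helper name, and the PID has to be filled in by hand:

```shell
#!/bin/sh
# stack_perms (hypothetical helper): print the permission field of the
# [stack] mapping from a /proc/<pid>/maps-style file.
# With no argument it reads /proc/self/maps, which describes the awk
# process doing the reading, not the program under test.
stack_perms() {
    awk '$NF == "[stack]" { print $2 }' "${1:-/proc/self/maps}"
}

# Demonstration on this machine, if /proc is available.
if [ -r /proc/self/maps ]; then
    stack_perms
fi
```

For the reproducer itself, the call would be `stack_perms /proc/<pid>/maps`, with the PID of the process stopped in gdb. A result of rwxp means the stack is executable; rw-p means code placed on the stack, such as the thunk, cannot run.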

Can you tell me the exact Linux you have installed? uname -a

 

 

Jacob_Williams
New Contributor III

$ uname -a
Linux ngc11 2.6.32-573.8.1.el6.x86_64 #1 SMP Tue Nov 10 18:01:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

 

Jacob_Williams
New Contributor III

For completeness, here is what I get when I run with the correct "display/i $pc" command:

$ gdb-ia ./reproducer

Reading symbols from ./reproducer...done.
(gdb) set disable-randomization off
(gdb) break 42
Breakpoint 1 at 0x402fb4: file ./reproducer.f90, line 42.
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, my_module::my_test () at ./reproducer.f90:42
42                      call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) display/i $pc
1: x/i $pc
=> 0x402fb4 <__my_module_MOD_my_test+350>:      movq   $0x0,-0xd8(%rbp)
(gdb) stepi
0x0000000000402fbf      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fbf <__my_module_MOD_my_test+361>:      movq   $0x8,-0xe8(%rbp)
(gdb) stepi
0x0000000000402fca      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fca <__my_module_MOD_my_test+372>:      movq   $0x0,-0xd0(%rbp)
(gdb) stepi
0x0000000000402fd5      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fd5 <__my_module_MOD_my_test+383>:      movq   $0x0,-0xe0(%rbp)
(gdb) stepi
0x0000000000402fe0      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fe0 <__my_module_MOD_my_test+394>:      lea    -0x190(%rbp),%rax
(gdb) stepi
0x0000000000402fe7      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fe7 <__my_module_MOD_my_test+401>:      mov    %rax,-0xf0(%rbp)
(gdb) stepi
0x0000000000402fee      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fee <__my_module_MOD_my_test+408>:      mov    -0xd8(%rbp),%rax
(gdb) stepi
0x0000000000402ff5      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402ff5 <__my_module_MOD_my_test+415>:      or     $0x1,%rax
(gdb) stepi
0x0000000000402ff9      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402ff9 <__my_module_MOD_my_test+419>:      mov    %rax,-0xd8(%rbp)
(gdb) stepi
0x0000000000403000      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403000 <__my_module_MOD_my_test+426>:      movq   $0x0,-0xe0(%rbp)
(gdb) stepi
0x000000000040300b      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40300b <__my_module_MOD_my_test+437>:      mov    $0x4921d0,%eax
(gdb) stepi
0x0000000000403010      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403010 <__my_module_MOD_my_test+442>:      mov    %rax,-0xc0(%rbp)
(gdb) stepi
0x0000000000403017      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403017 <__my_module_MOD_my_test+449>:      mov    $0x492218,%eax
(gdb) stepi
0x000000000040301c      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40301c <__my_module_MOD_my_test+454>:      mov    %rax,-0xb8(%rbp)
(gdb) stepi
0x0000000000403023      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403023 <__my_module_MOD_my_test+461>:      movq   $0x0,-0xb0(%rbp)
(gdb) stepi
0x000000000040302e      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40302e <__my_module_MOD_my_test+472>:      mov    -0xd8(%rbp),%rax
(gdb) stepi
0x0000000000403035      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403035 <__my_module_MOD_my_test+479>:      or     $0x2,%rax
(gdb) stepi
0x0000000000403039      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403039 <__my_module_MOD_my_test+483>:      mov    %rax,-0xd8(%rbp)
(gdb) stepi
0x0000000000403040      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403040 <__my_module_MOD_my_test+490>:      movq   $0x0,-0xa0(%rbp)
(gdb) stepi
0x000000000040304b      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40304b <__my_module_MOD_my_test+501>:      movq   $0x0,-0x90(%rbp)
(gdb) stepi
0x0000000000403056      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403056 <__my_module_MOD_my_test+512>:      movq   $0x0,-0x98(%rbp)
(gdb) stepi
0x0000000000403061      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403061 <__my_module_MOD_my_test+523>:      movq   $0x0,-0xa8(%rbp)
(gdb) stepi
0x000000000040306c      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40306c <__my_module_MOD_my_test+534>:      mov    $0x4921e0,%eax
(gdb) stepi
0x0000000000403071      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403071 <__my_module_MOD_my_test+539>:      mov    %rax,-0x88(%rbp)
(gdb) stepi
0x0000000000403078      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403078 <__my_module_MOD_my_test+546>:      movq   $0x0,-0x80(%rbp)
(gdb) stepi
0x0000000000403080      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403080 <__my_module_MOD_my_test+554>:      mov    -0xd8(%rbp),%rax
(gdb) stepi
0x0000000000403087      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403087 <__my_module_MOD_my_test+561>:      and    $0xffffffffffffff7f,%rax
(gdb) stepi
0x000000000040308d      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40308d <__my_module_MOD_my_test+567>:      mov    %rax,-0xd8(%rbp)
(gdb) stepi
0x0000000000403094      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403094 <__my_module_MOD_my_test+574>:      mov    -0xd8(%rbp),%rax
(gdb) stepi
0x000000000040309b      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40309b <__my_module_MOD_my_test+581>:      or     $0x2,%rax
(gdb) stepi
0x000000000040309f      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40309f <__my_module_MOD_my_test+585>:      mov    %rax,-0xd8(%rbp)
(gdb) stepi
0x00000000004030a6      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030a6 <__my_module_MOD_my_test+592>:      mov    -0x190(%rbp),%rax
(gdb) stepi
0x00000000004030ad      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030ad <__my_module_MOD_my_test+599>:      lea    -0xf0(%rbp),%rdx
(gdb) stepi
0x00000000004030b4      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030b4 <__my_module_MOD_my_test+606>:      mov    $0x492220,%ecx
(gdb) stepi
0x00000000004030b9      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030b9 <__my_module_MOD_my_test+611>:      mov    $0x492228,%ebx
(gdb) stepi
0x00000000004030be      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030be <__my_module_MOD_my_test+616>:      lea    -0x70(%rbp),%rsi
(gdb) stepi
0x00000000004030c2      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030c2 <__my_module_MOD_my_test+620>:      mov    %rdx,%rdi
(gdb) stepi
0x00000000004030c5      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030c5 <__my_module_MOD_my_test+623>:      mov    %rsi,-0x20(%rbp)
(gdb) stepi
0x00000000004030c9      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030c9 <__my_module_MOD_my_test+627>:      mov    %rcx,%rsi
(gdb) stepi
0x00000000004030cc      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030cc <__my_module_MOD_my_test+630>:      mov    %rbx,%rdx
(gdb) stepi
0x00000000004030cf      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030cf <__my_module_MOD_my_test+633>:      mov    -0x20(%rbp),%rcx
(gdb) stepi
0x00000000004030d3      42                      call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030d3 <__my_module_MOD_my_test+637>:      callq  *%rax
(gdb) print/x $rax
$1 = 0x7fff5bd576d8
(gdb) stepi
0x00007fff5bd576d8 in ?? ()
1: x/i $pc
=> 0x7fff5bd576d8:      movabs $0x403129,%r11
(gdb) stepi

Catchpoint -2 (signal SIGSEGV), 0x00007fff5bd576d8 in ?? ()
1: x/i $pc
=> 0x7fff5bd576d8:      movabs $0x403129,%r11
(gdb)

 

Steven_L_Intel1
Employee

Thanks - that tells me that the "thunk" is at least being branched to. My guess is that it's some sort of execution protection. Let me see what I can find out. Are the other Linux systems also EL6? (Mine is also EL6, but a slightly older kernel.)

Steven_L_Intel1
Employee

Oh, here's another question. If you change "initialize" to call the passed f there, rather than setting the pointer, does it work?

Steven_L_Intel1
Employee

Oh, and on the original program, what happens if you do:

execstack -s ./reproducer

and then run the program (not in gdb)?

Jacob_Williams
New Contributor III

It still crashes. Code attached.

$ gdb-ia ./reproducer2

(gdb) set disable-randomization off
(gdb) run
Starting program: reproducer2 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Catchpoint -2 (signal SIGSEGV), 0x00007ffd924b71c8 in ?? ()
(gdb) where
#0  0x00007ffd924b71c8 in ?? ()
#1  0x0000000000402e76 in my_module::initialize (me=0x7ffd924b7140) at ./reproducer2.f90:34
#2  0x0000000000402fd9 in my_module::my_test () at ./reproducer2.f90:45
#3  0x0000000000403027 in test () at ./reproducer2.f90:76
#4  0x0000000000402e1e in main ()
#5  0x0000003a03a1ed5d in __libc_start_main () from /lib64/libc.so.6
#6  0x0000000000402d29 in _start ()
(gdb) 

 

Steven_L_Intel1
Employee

Ok.

Also I would like to see the results of:

execstack -q ./reproducer

on the system where it fails and on a system where it works.

Jacob_Williams
New Contributor III

System where it works:

$ execstack -s ./reproducer
$ ./reproducer
   3.00000000000000     
$ 

System where it doesn't work:

$ execstack -s ./reproducer
$ ./reproducer
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
reproducer         0000000000477705  Unknown               Unknown  Unknown
reproducer         00000000004754C7  Unknown               Unknown  Unknown
reproducer         0000000000444DB4  Unknown               Unknown  Unknown
reproducer         0000000000444BC6  Unknown               Unknown  Unknown
reproducer         0000000000425CC6  Unknown               Unknown  Unknown
reproducer         00000000004032B0  Unknown               Unknown  Unknown
libpthread.so.0    0000003A0460F790  Unknown               Unknown  Unknown
Unknown            00007FFF672C89D8  Unknown               Unknown  Unknown
$ 

 

Steven_L_Intel1
Employee

Another test - if you copy the executable from the system where it works to the one where it doesn't, does it run ok?

Jacob_Williams
New Contributor III

It's a cluster environment, so my home directory is shared on all the systems. I compile on the system that works and then can just ssh to the other one and run it from there. Compiling on one system vs the other doesn't seem to make a difference either.

Steven_L_Intel1
Employee

So the same executable that works on one system fails on the other? What does execstack -q ./reproducer say on the two systems? It sure looks as if the issue is with the failing system overriding the flag in the executable saying to allow stack code to execute.
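Where execstack is not installed, the same marker can be read from the ELF program headers. A minimal sketch, assuming binutils readelf is available and that its wide (-W) output keeps each program header on one line; gnu_stack_flags is a hypothetical helper name:

```shell
#!/bin/sh
# gnu_stack_flags (hypothetical helper): read `readelf -lW` output on
# stdin and print the flags column of the GNU_STACK program header.
gnu_stack_flags() {
    awk '$1 == "GNU_STACK" { print $7 }'
}

# Query the executable if both the tool and the file are present.
if command -v readelf >/dev/null 2>&1 && [ -f ./reproducer ]; then
    readelf -lW ./reproducer | gnu_stack_flags
fi
```

RWE corresponds to the "X" that execstack -q prints (executable stack requested), and RW to "-". This only shows what the executable asks for, not what the kernel actually grants at run time.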

Jacob_Williams
New Contributor III

Both of them say:

X ./reproducer

 

Steven_L_Intel1
Employee

Ok. Then there's something on the failing system overriding this - it's not in the compiler or linker. I'm not sure where to go from here.

jimdempseyatthecove
Honored Contributor III

I notice that the crash occurred in relation to libpthread.so.0. In some of my earlier experiences with pthread, much of the documentation that you find online was written for 32-bit systems and freely substituted "unsigned" for arguments that required an analog for a handle, which on a 64-bit platform is 64 bits (e.g. uintptr_t). Using "unsigned" will (generally, but not always) cause failing code on 64-bit platforms.

Check your pthread return values and arguments, assuming you directly call pthread support routines. If you are not directly calling pthread routines (letting the compiler generate the calls for OpenMP), then you may have a compatibility issue with the installed pthread library and/or OpenMP library.

Jim Dempsey

Steven_L_Intel1
Employee

This really has nothing to do with threading. On the failing system, the kernel is not honoring the "ExecuteStack" flag in the ELF executable. Given that, calling contained procedures by passing them or storing pointers to them will not work.
