I'm seeing some weird behavior on certain Linux systems for code that works fine on Windows (and also on other Linux systems). The example code is attached (reproducer.f90). It has a class that contains a procedure pointer, which is associated with a subroutine that is contained within another subroutine. Is my understanding correct that this is valid?
My system is: HP DL360 G6, Intel(R) Xeon(R) X5570, CentOS 6. I'm using Intel 16.0.1 20151021.
Compile with: ifort -g -traceback reproducer.f90 -o reproducer
Running it crashes (Not sure why I'm not getting the line numbers. What do I have to do to get those?):
$ ./reproducer
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
reproducer         0000000000477705  Unknown               Unknown  Unknown
reproducer         00000000004754C7  Unknown               Unknown  Unknown
reproducer         0000000000444DB4  Unknown               Unknown  Unknown
reproducer         0000000000444BC6  Unknown               Unknown  Unknown
reproducer         0000000000425CC6  Unknown               Unknown  Unknown
reproducer         00000000004032B0  Unknown               Unknown  Unknown
libpthread.so.0    0000003A0460F790  Unknown               Unknown  Unknown
Unknown            00007FFE9F1AD658  Unknown               Unknown  Unknown
However, running it in a debugger, it works fine:
$ gdb-ia ./reproducer
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
3.00000000000000
[Inferior 1 (process 14138) exited normally]
(gdb)
Also interesting is that if I use "set disable-randomization off" in the debugger, it crashes again:
$ gdb-ia ./reproducer
(gdb) set disable-randomization off
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Catchpoint -2 (signal SIGSEGV), 0x00007fff187ab458 in ?? ()
(gdb) where
#0 0x00007fff187ab458 in ?? ()
#1 0x00000000004030d5 in my_module::my_test () at reproducer.f90:42
#2 0x0000000000403175 in test () at reproducer.f90:71
#3 0x0000000000402e1e in main ()
#4 0x0000003a03a1ed5d in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000402d29 in _start ()
(gdb)
Any ideas? I think the code is valid, so maybe it's a compiler bug? However, since it does work on other similar Linux systems, I'm wondering if it could be some system-specific setting, but I don't know what would cause such behavior. (I normally work with the Windows compiler, so maybe more Linux-savvy folks can point me in the right direction.)
I think the issue is that a contained procedure must not be called from outside the procedure in which it is contained. This would be a requirement for the contained procedure to access variables in the procedure which contains it (proper stack framing).
I notice that your subroutine f does not reference variables within the containing procedure. Steve will be able to comment on this further, but you might be able to get by with making f PURE, though this may not work because a contained pure subroutine could still access variables in the containing scope (which are not present when it is called via the pointer). I think it would be best to extract the contained subroutine f from my_test and place it in the contains section of the module itself, making it private (and potentially renaming it to my_test_f).
There was a different thread on one of the forums indicating that you cannot pass the address (pointer) of a private variable/procedure out of the module. In this case (after relocating f) you are not technically passing f out (excepting that the pointer f in instance of my_type is accessible).
Therefore the subroutine will have to be public (I assume).
This oop stuff is well beyond my expertise. I think I am more of an oops programmer.
Jim Dempsey
I don't see a problem with the source code. As long as you haven't left the instance of the containing procedure, it's ok to store and use a pointer to the contained procedure.
I am at home right now and tried this on Windows - it ran fine. I will try on Linux tomorrow as I think there the method of calling contained procedures is different.
I can't reproduce it on my Linux either, but as you note it works on some Linux systems and not others. Interesting point about ASLR, though that's supposed to change things run-to-run only.
It would be interesting to see a gdb instruction trace (with ASLR enabled) of the call.
set disable-randomization off
break 42
run
display/i
stepi
and then repeat the stepi until you get to:
callq *%rax
print/x $rax
and then do more stepi until it fails. Paste the output here.
The display/i produces the message "Argument required (expression to compute)", and I never see the "callq ...". But this is what I get:
$ gdb-ia ./reproducer
No symbol table is loaded. Use the "file" command.
GNU gdb (GDB) 7.8-16.0.558
Copyright (C) 2014 Free Software Foundation, Inc; (C) 2013-2015 Intel Corp.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For information about how to find Technical Support, Product Updates,
User Forums, FAQs, tips and tricks, and other support information, please visit:
<http://www.intel.com/software/products/support/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./reproducer...done.
(gdb) set disable-randomization off
(gdb) break 42
Breakpoint 1 at 0x402fb4: file reproducer.f90, line 42.
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, my_module::my_test () at reproducer.f90:42
42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) display/i
Argument required (expression to compute).
(gdb) stepi
0x0000000000402fbf  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fca  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fd5  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fe0  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fe7  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402fee  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402ff5  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000402ff9  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403000  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040300b  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403010  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403017  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040301c  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403023  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040302e  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403035  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403039  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403040  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040304b  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403056  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403061  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040306c  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403071  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403078  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403080  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403087  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040308d  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x0000000000403094  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040309b  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x000000000040309f  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030a6  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030ad  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030b4  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030b9  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030be  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030c2  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030c5  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030c9  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030cc  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030cf  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00000000004030d3  42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) stepi
0x00007ffe066d9fd8 in ?? ()
(gdb) stepi
Catchpoint -2 (signal SIGSEGV), 0x00007ffe066d9fd8 in ?? ()
(gdb) print/x $rax
$1 = 0x7ffe066d9fd8
(gdb)
Sorry, I meant "display/i $pc". But you showed enough to be interesting. That 7ffe... location is stack. When I run this I get this:
=> 0x4030d3 <__my_module_MOD_my_test+637>: callq *%rax
(gdb) print/x $rax
$1 = 0x7fffffffd6d8
(gdb) stepi
0x00007fffffffd6d8 in ?? ()
1: x/i $pc
=> 0x7fffffffd6d8: movabs $0x403129,%r11
(gdb)
0x00007fffffffd6e2 in ?? ()
1: x/i $pc
=> 0x7fffffffd6e2: movabs $0x0,%r10
(gdb)
0x00007fffffffd6ec in ?? ()
1: x/i $pc
=> 0x7fffffffd6ec: rex.WB jmpq *%r11
(gdb)
my_module::f (me=0x3e0e18ced8 <main_arena+88>,
a=<error reading variable: Cannot access memory at address 0x2567260c00000000>,
b=<error reading variable: Cannot access memory at address 0x0>,
c=<error reading variable: Cannot access memory at address 0x2567260c64a422ac>) at reproducer.f90:48
48 subroutine f(me,a,b,c)
1: x/i $pc
=> 0x403129 <__my_module_MOD_f>: push %rbp
So you see it first calls into a stack "thunk", which sets up the context and jumps to the real function. (The "error reading variable" messages are spurious and can be ignored.)
The address printed seems reasonable - but why you get a segfault there, I don't know. It could be there's something setting the stack as nonexecutable - I know that can be an issue. But why some Linux systems work and others don't, I don't know. I need to find someone here who understands this better than I do (not hard).
Can you tell me the exact Linux you have installed? uname -a
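One way to test the nonexecutable-stack hypothesis above is to look at the permissions of the process's [stack] mapping while it is stopped in gdb. The helper below is a minimal sketch rather than anything taken from the thread: the function name stack_perms is hypothetical, and it assumes a Linux /proc filesystem that labels the main thread's stack mapping as [stack].

```shell
#!/bin/sh
# Hypothetical helper: print the permission field of the [stack] mapping
# from /proc/<pid>/maps text supplied on stdin (for example "rw-p" or "rwxp").
stack_perms() {
    # Field 2 of a maps line is the permission string; the last field is the label.
    awk '$NF == "[stack]" { print $2 }'
}

# Usage sketch, where PID is a placeholder for the process id of the stopped program:
#   stack_perms < /proc/PID/maps
```

If the result contains no "x", the thunk on the stack cannot be executed, which would be consistent with a SIGSEGV on its first instruction.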
$ uname -a
Linux ngc11 2.6.32-573.8.1.el6.x86_64 #1 SMP Tue Nov 10 18:01:38 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
For completeness, here is what I get when I run with the correct "display/i $pc" command:
$ gdb-ia ./reproducer
No symbol table is loaded. Use the "file" command.
GNU gdb (GDB) 7.8-16.0.558
Copyright (C) 2014 Free Software Foundation, Inc; (C) 2013-2015 Intel Corp.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For information about how to find Technical Support, Product Updates,
User Forums, FAQs, tips and tricks, and other support information, please visit:
<http://www.intel.com/software/products/support/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./reproducer...done.
(gdb) set disable-randomization off
(gdb) break 42
Breakpoint 1 at 0x402fb4: file ./reproducer.f90, line 42.
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, my_module::my_test () at ./reproducer.f90:42
42  call blah%f(1.0_wp, 2.0_wp, xp)
(gdb) display/i $pc
1: x/i $pc
=> 0x402fb4 <__my_module_MOD_my_test+350>: movq $0x0,-0xd8(%rbp)
(gdb) stepi
0x0000000000402fbf  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fbf <__my_module_MOD_my_test+361>: movq $0x8,-0xe8(%rbp)
(gdb) stepi
0x0000000000402fca  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fca <__my_module_MOD_my_test+372>: movq $0x0,-0xd0(%rbp)
(gdb) stepi
0x0000000000402fd5  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fd5 <__my_module_MOD_my_test+383>: movq $0x0,-0xe0(%rbp)
(gdb) stepi
0x0000000000402fe0  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fe0 <__my_module_MOD_my_test+394>: lea -0x190(%rbp),%rax
(gdb) stepi
0x0000000000402fe7  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fe7 <__my_module_MOD_my_test+401>: mov %rax,-0xf0(%rbp)
(gdb) stepi
0x0000000000402fee  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402fee <__my_module_MOD_my_test+408>: mov -0xd8(%rbp),%rax
(gdb) stepi
0x0000000000402ff5  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402ff5 <__my_module_MOD_my_test+415>: or $0x1,%rax
(gdb) stepi
0x0000000000402ff9  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x402ff9 <__my_module_MOD_my_test+419>: mov %rax,-0xd8(%rbp)
(gdb) stepi
0x0000000000403000  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403000 <__my_module_MOD_my_test+426>: movq $0x0,-0xe0(%rbp)
(gdb) stepi
0x000000000040300b  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40300b <__my_module_MOD_my_test+437>: mov $0x4921d0,%eax
(gdb) stepi
0x0000000000403010  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403010 <__my_module_MOD_my_test+442>: mov %rax,-0xc0(%rbp)
(gdb) stepi
0x0000000000403017  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403017 <__my_module_MOD_my_test+449>: mov $0x492218,%eax
(gdb) stepi
0x000000000040301c  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40301c <__my_module_MOD_my_test+454>: mov %rax,-0xb8(%rbp)
(gdb) stepi
0x0000000000403023  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403023 <__my_module_MOD_my_test+461>: movq $0x0,-0xb0(%rbp)
(gdb) stepi
0x000000000040302e  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40302e <__my_module_MOD_my_test+472>: mov -0xd8(%rbp),%rax
(gdb) stepi
0x0000000000403035  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403035 <__my_module_MOD_my_test+479>: or $0x2,%rax
(gdb) stepi
0x0000000000403039  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403039 <__my_module_MOD_my_test+483>: mov %rax,-0xd8(%rbp)
(gdb) stepi
0x0000000000403040  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403040 <__my_module_MOD_my_test+490>: movq $0x0,-0xa0(%rbp)
(gdb) stepi
0x000000000040304b  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40304b <__my_module_MOD_my_test+501>: movq $0x0,-0x90(%rbp)
(gdb) stepi
0x0000000000403056  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403056 <__my_module_MOD_my_test+512>: movq $0x0,-0x98(%rbp)
(gdb) stepi
0x0000000000403061  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403061 <__my_module_MOD_my_test+523>: movq $0x0,-0xa8(%rbp)
(gdb) stepi
0x000000000040306c  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40306c <__my_module_MOD_my_test+534>: mov $0x4921e0,%eax
(gdb) stepi
0x0000000000403071  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403071 <__my_module_MOD_my_test+539>: mov %rax,-0x88(%rbp)
(gdb) stepi
0x0000000000403078  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403078 <__my_module_MOD_my_test+546>: movq $0x0,-0x80(%rbp)
(gdb) stepi
0x0000000000403080  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403080 <__my_module_MOD_my_test+554>: mov -0xd8(%rbp),%rax
(gdb) stepi
0x0000000000403087  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403087 <__my_module_MOD_my_test+561>: and $0xffffffffffffff7f,%rax
(gdb) stepi
0x000000000040308d  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40308d <__my_module_MOD_my_test+567>: mov %rax,-0xd8(%rbp)
(gdb) stepi
0x0000000000403094  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x403094 <__my_module_MOD_my_test+574>: mov -0xd8(%rbp),%rax
(gdb) stepi
0x000000000040309b  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40309b <__my_module_MOD_my_test+581>: or $0x2,%rax
(gdb) stepi
0x000000000040309f  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x40309f <__my_module_MOD_my_test+585>: mov %rax,-0xd8(%rbp)
(gdb) stepi
0x00000000004030a6  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030a6 <__my_module_MOD_my_test+592>: mov -0x190(%rbp),%rax
(gdb) stepi
0x00000000004030ad  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030ad <__my_module_MOD_my_test+599>: lea -0xf0(%rbp),%rdx
(gdb) stepi
0x00000000004030b4  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030b4 <__my_module_MOD_my_test+606>: mov $0x492220,%ecx
(gdb) stepi
0x00000000004030b9  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030b9 <__my_module_MOD_my_test+611>: mov $0x492228,%ebx
(gdb) stepi
0x00000000004030be  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030be <__my_module_MOD_my_test+616>: lea -0x70(%rbp),%rsi
(gdb) stepi
0x00000000004030c2  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030c2 <__my_module_MOD_my_test+620>: mov %rdx,%rdi
(gdb) stepi
0x00000000004030c5  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030c5 <__my_module_MOD_my_test+623>: mov %rsi,-0x20(%rbp)
(gdb) stepi
0x00000000004030c9  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030c9 <__my_module_MOD_my_test+627>: mov %rcx,%rsi
(gdb) stepi
0x00000000004030cc  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030cc <__my_module_MOD_my_test+630>: mov %rbx,%rdx
(gdb) stepi
0x00000000004030cf  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030cf <__my_module_MOD_my_test+633>: mov -0x20(%rbp),%rcx
(gdb) stepi
0x00000000004030d3  42  call blah%f(1.0_wp, 2.0_wp, xp)
1: x/i $pc
=> 0x4030d3 <__my_module_MOD_my_test+637>: callq *%rax
(gdb) print/x $rax
$1 = 0x7fff5bd576d8
(gdb) stepi
0x00007fff5bd576d8 in ?? ()
1: x/i $pc
=> 0x7fff5bd576d8: movabs $0x403129,%r11
(gdb) stepi
Catchpoint -2 (signal SIGSEGV), 0x00007fff5bd576d8 in ?? ()
1: x/i $pc
=> 0x7fff5bd576d8: movabs $0x403129,%r11
(gdb)
Thanks - that tells me that the "thunk" is at least being branched to. My guess is that it's some sort of execution protection. Let me see what I can find out. Are the other Linux systems also EL6? (Mine is also EL6, but a slightly older kernel.)
Oh, here's another question. If you change "initialize" to call the passed f there, rather than setting the pointer, does it work?
Oh, and on the original program, what happens if you do:
execstack -s ./reproducer
and then run the program (not in gdb)?
It still crashes. Code attached.
$ gdb-ia ./reproducer2
(gdb) set disable-randomization off
(gdb) run
Starting program: reproducer2
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Catchpoint -2 (signal SIGSEGV), 0x00007ffd924b71c8 in ?? ()
(gdb) where
#0 0x00007ffd924b71c8 in ?? ()
#1 0x0000000000402e76 in my_module::initialize (me=0x7ffd924b7140) at ./reproducer2.f90:34
#2 0x0000000000402fd9 in my_module::my_test () at ./reproducer2.f90:45
#3 0x0000000000403027 in test () at ./reproducer2.f90:76
#4 0x0000000000402e1e in main ()
#5 0x0000003a03a1ed5d in __libc_start_main () from /lib64/libc.so.6
#6 0x0000000000402d29 in _start ()
(gdb)
Ok.
Also I would like to see the results of:
execstack -q ./reproducer
on the system where it fails and on a system where it works.
System where it works:
$ execstack -s ./reproducer
$ ./reproducer
3.00000000000000
$
System where it doesn't work:
$ execstack -s ./reproducer
$ ./reproducer
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
reproducer         0000000000477705  Unknown               Unknown  Unknown
reproducer         00000000004754C7  Unknown               Unknown  Unknown
reproducer         0000000000444DB4  Unknown               Unknown  Unknown
reproducer         0000000000444BC6  Unknown               Unknown  Unknown
reproducer         0000000000425CC6  Unknown               Unknown  Unknown
reproducer         00000000004032B0  Unknown               Unknown  Unknown
libpthread.so.0    0000003A0460F790  Unknown               Unknown  Unknown
Unknown            00007FFF672C89D8  Unknown               Unknown  Unknown
$
Another test - if you copy the executable from the system where it works to the one where it doesn't, does it run ok?
It's a cluster environment, so my home directory is shared on all the systems. I compile on the system that works and then can just ssh to the other one and run it from there. Compiling on one system vs the other doesn't seem to make a difference either.
So the same executable that works on one system fails on the other? What does execstack -q ./reproducer say on the two systems? It sure looks as if the issue is with the failing system overriding the flag in the executable saying to allow stack code to execute.
Both of them say:
X ./reproducer
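The "X" reported by execstack -q means the executable is marked as requiring an executable stack. The same marking can be cross-checked with readelf from binutils, which prints the flags of the GNU_STACK program header. The helper below is only a sketch: the function name gnu_stack_flags is hypothetical, and it assumes the wide (-W) output format, where each program header occupies one line with the flags in the seventh column.

```shell
#!/bin/sh
# Hypothetical helper: print the flags of the GNU_STACK program header
# (typically RW or RWE) from `readelf -lW` output supplied on stdin.
gnu_stack_flags() {
    # Columns: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
    awk '$1 == "GNU_STACK" { print $7 }'
}

# Usage sketch, assuming binutils is installed:
#   readelf -lW ./reproducer | gnu_stack_flags
```

A result of RWE would agree with the "X" above, while RW would correspond to a stack that is not marked executable.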
Ok. Then there's something on the failing system overriding this - it's not in the compiler or linker. I'm not sure where to go from here.
I notice that the crash occurred in relation to libpthread.so.0. In some of my earlier experiences with pthreads, much of the documentation found online was written for 32-bit systems and freely substituted "unsigned" for arguments that required an analog for a handle, which on a 64-bit platform is 64 bits wide (e.g. uintptr_t). Using "unsigned" will generally (though not always) cause the code to fail on 64-bit platforms.
Check your pthread return values and arguments, assuming you directly call pthread support routines. If you are not directly calling pthread routines (letting the compiler-generated OpenMP code make the calls), then you may have a compatibility issue with the installed pthread library and/or OpenMP library.
Jim Dempsey
This really has nothing to do with threading. On the failing system, the kernel is not honoring the "ExecuteStack" flag in the ELF executable. Given that, calling contained procedures by passing them or storing pointers to them will not work.
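The conclusion above amounts to a mismatch between what the executable requests and what the running process is given. As a rough sketch of that comparison, the hypothetical function below takes the GNU_STACK flags of the executable (for example RWE) and the permissions of the process's [stack] mapping (for example rw-p) and reports whether an executable stack was requested and granted. Both arguments are assumed to have been collected separately, for instance from readelf output and from /proc/<pid>/maps.

```shell
#!/bin/sh
# Hypothetical helper: compare the ELF request with the runtime mapping.
#   $1 = GNU_STACK flags of the executable, e.g. RWE or RW
#   $2 = permissions of the [stack] mapping, e.g. rwxp or rw-p
execstack_verdict() {
    flags=$1
    perms=$2
    case $flags in
        *E*) requested=yes ;;
        *)   requested=no ;;
    esac
    case $perms in
        *x*) granted=yes ;;
        *)   granted=no ;;
    esac
    if [ "$requested" = yes ] && [ "$granted" = no ]; then
        echo "requested but not granted"
    elif [ "$requested" = yes ]; then
        echo "requested and granted"
    else
        echo "not requested"
    fi
}

# Usage sketch: execstack_verdict RWE rw-p
```

On a system behaving like the failing one described here, the expected verdict would be "requested but not granted".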
