I'm seeing some weird behavior on certain Linux systems for code that works fine on Windows (and also on other Linux systems). The example code is attached (reproducer.f90). It has a class containing a function pointer that is associated with a subroutine contained within another subroutine. My understanding is that this is valid?
My system is: HP DL360 G6, Intel(R) Xeon(R) X5570, CentOS 6. I'm using Intel 16.0.1 20151021.
Compile with: ifort -g -traceback reproducer.f90 -o reproducer
Running it crashes (Not sure why I'm not getting the line numbers. What do I have to do to get those?):
$ ./reproducer
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
reproducer         0000000000477705  Unknown               Unknown  Unknown
reproducer         00000000004754C7  Unknown               Unknown  Unknown
reproducer         0000000000444DB4  Unknown               Unknown  Unknown
reproducer         0000000000444BC6  Unknown               Unknown  Unknown
reproducer         0000000000425CC6  Unknown               Unknown  Unknown
reproducer         00000000004032B0  Unknown               Unknown  Unknown
libpthread.so.0    0000003A0460F790  Unknown               Unknown  Unknown
Unknown            00007FFE9F1AD658  Unknown               Unknown  Unknown
However, running it in a debugger, it works fine:
$ gdb-ia ./reproducer
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
3.00000000000000
[Inferior 1 (process 14138) exited normally]
(gdb)
Also interesting is that if I use "set disable-randomization off" in the debugger, it crashes again:
$ gdb-ia ./reproducer
(gdb) set disable-randomization off
(gdb) run
Starting program: reproducer
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Catchpoint -2 (signal SIGSEGV), 0x00007fff187ab458 in ?? ()
(gdb) where
#0 0x00007fff187ab458 in ?? ()
#1 0x00000000004030d5 in my_module::my_test () at reproducer.f90:42
#2 0x0000000000403175 in test () at reproducer.f90:71
#3 0x0000000000402e1e in main ()
#4 0x0000003a03a1ed5d in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000402d29 in _start ()
(gdb)
Any ideas? I think the code is valid, so maybe it's a compiler bug? However, since it does work on other similar Linux systems, I'm wondering if it could be some system-specific setting, but I don't know what would cause such behavior. (I normally work with the Windows compiler, so maybe more Linux-savvy folks can point me in the right direction.)
Maybe it's some sort of "security" feature on the system? I'll check with my IT guy and report back if we figure anything out.
That's what I'm thinking. Since you can take the executable from a system where it works and run it on another system where it fails, that tells me it isn't the compiler.
It is exec-shield causing this.
This was definitely due to us having the sysctl flag kernel.exec-shield=3. This flag turns on use of the processor's XD (Execute Disable) bit, generically called the NX (no-execute) bit.
We ignored this as a problem initially because our old systems with Xeon 5300 processors had it set this way too; however, I do not think Linux recognizes this feature on that processor. So the problem only showed up on the newer Xeon 5400 and 5500 processors.
We will set kernel.exec-shield=1 as a workaround. I suggest your compiler team get together with the people on the processor team who added the XD bit and come up with a solution that does not make your compiled programs behave the way malicious code would.
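For anyone checking their own machine, here is a minimal sketch of inspecting this setting. It assumes a RHEL/CentOS 6-era kernel that exposes kernel.exec-shield under /proc/sys; the helper name exec_shield_status is made up for illustration, and changing the value requires root.

```shell
# Hypothetical helper: print the Exec Shield mode read from a sysctl file.
# The default path assumes a kernel that exposes kernel.exec-shield.
exec_shield_status() {
  f="${1:-/proc/sys/kernel/exec-shield}"
  if [ -r "$f" ]; then
    echo "kernel.exec-shield=$(cat "$f")"
  else
    echo "kernel.exec-shield not available"
  fi
}

# Report the current setting on this machine.
exec_shield_status

# To apply the workaround at runtime (as root):
#   sysctl -w kernel.exec-shield=1
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   kernel.exec-shield = 1
```

On kernels that do not have this key, the helper simply reports that it is not available.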
Happy Holidays
I thought it was something like that. gcc has a similar issue with what it calls "trampolines". Interestingly, I had thought we did have a method that avoided the issue of executing stack code, but it seems not.
Thanks for getting back to us with the resolution - but I can see that this is a potential issue going forward and will see what I can do to raise the visibility. Maybe we can come up with something else (though it will probably be slower.)
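On the point about executing stack code: one way to see whether a given binary asks for an executable stack is to look at its GNU_STACK program header. The sketch below assumes readelf (from binutils) and awk are installed; the helper name gnu_stack_flags is hypothetical, and this has not been checked against the ifort-built reproducer.

```shell
# Hypothetical helper: extract the flags column of the GNU_STACK program
# header from `readelf -lW` output supplied on stdin.
gnu_stack_flags() {
  awk '/GNU_STACK/ { print $(NF-1) }'
}

# Usage (assumes readelf from binutils is available):
#   readelf -lW ./reproducer | gnu_stack_flags
# "RW" means a non-executable stack is requested; "RWE" means the
# binary requests an executable stack.
```

If the flags include E, the binary is asking for an executable stack, which is what trampoline-style code relies on.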
