Eric_O_ · ‎05-25-2015

It appears that the compiler option -mfpmath=387 causes an immediate floating point exception on Intel architecture processors in functions spawned with cilk_spawn when compiled with gcc-5.1's Cilk Plus support. This seems to be a regression, as gcc-4.9 with the Cilk Plus patches works fine. Note that -mfpmath=sse works on 64-bit machines; however, this option is not available for 32-bit Intel machines that lack SSE2. As far as I can tell, most floating point code is affected. Does anyone know of patches or workarounds for this, especially since this appears to be a show-stopper on 32-bit Intel?

Eric_O_ · ‎05-26-2015

I've done a little checking. The problem with -mfpmath=387 appears for me using the gcc-4.9.2 mainline and gcc-5.1.0 mainline compilers but not with the svn cilkplus and cilkplus-4_8-branch. I'm compiling on Debian Wheezy, if that makes any difference.

Eric_O_ · ‎05-28-2015

I've created a simple test program to illustrate this bug:

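#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <cilk/cilk.h>

// One-second delay: keeps each spawned call busy long enough that an
// idle worker is likely to steal the continuation of main().
const struct timespec onesec={1,0};

double f(double x){
    nanosleep(&onesec,0);
    return 2*x*sin(x);
}
double g(double x){
    nanosleep(&onesec,0);
    return x*cos(x);
}

int main(){
    double a,b;
    a=cilk_spawn f(1);   // the continuation (spawn of g) may run on another worker
    b=cilk_spawn g(2);
    cilk_sync;           // wait for both spawned calls to finish
    printf("f(1)+g(2)=%g\n",a+b);
    return 0;
}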

When I compile this program with gcc-5.1 mainline and the cilkplus-4_8-branch, I obtain:

$ /usr/local/gcc-5.1/bin/gcc -fcilkplus -Wall \
        -mfpmath=387 fpbug.c -o fpbug-gcc510-387 -lcilkrts -lm
$ ./fpbug-gcc510-387
Floating point exception

$ /usr/local/gcc-5.1/bin/gcc -fcilkplus -Wall \
        -mfpmath=sse fpbug.c -o fpbug-gcc510-sse -lcilkrts -lm
$ ./fpbug-gcc510-sse
f(1)+g(2)=0.850648

$ /usr/local/cilk-4.8/bin/gcc -fcilkplus -Wall \
        -mfpmath=387 fpbug.c -o fpbug-cilk48-387 -lcilkrts -lm
$ ./fpbug-cilk48-387
f(1)+g(2)=0.850648

$ /usr/local/cilk-4.8/bin/gcc -fcilkplus -Wall \
        -mfpmath=sse fpbug.c -o fpbug-cilk48-sse -lcilkrts -lm
$ ./fpbug-cilk48-sse
f(1)+g(2)=0.850648

Can anyone else reproduce this bug with gcc-5.1 mainline? What can I do to fix this?

Jim_S_Intel · ‎05-28-2015

Thanks for the example that reproduces the problem. I don't personally have immediate access to that compiler/OS combination, but hopefully someone who does can look into it.

Quick question: does the error occur even if the environment variable CILK_NWORKERS=1 is set, or is CILK_NWORKERS >=2 required?
If the program fails even when running on a single thread, then that suggests a different kind of error than if 2 or more threads are required.

Any chance you can reproduce the error in gdb or another debugger and generate a stack trace, or identify the basic block in which the error is tripped? Is it tripped in the computation of f, the computation of g, or the sum after the cilk_sync?

Cheers,

Jim

Eric_O_ · ‎05-28-2015

The error seems to be caused by a lack of FPU initialization in one of the worker threads; in particular, it does not happen with CILK_NWORKERS=1. The gdb backtrace looks like this:

$ CILK_NWORKERS=2 gdb ./fpbug-gcc510-387
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /x/jesa/ejolson/code/cilk/fpbug/fpbug-gcc510-387...done.
(gdb) run
Starting program: /x/jesa/ejolson/code/cilk/fpbug/fpbug-gcc510-387 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6bec700 (LWP 22698)]

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff6bec700 (LWP 22698)]
0x00000000004009be in g ()
(gdb) backtrace
#0  0x00000000004009be in g ()
#1  0x0000000000400f71 in _cilk_spn_1.4026 ()
#2  0x0000000000400a87 in main ()
(gdb)

This backtrace seems to support that theory. My impression is that this is a general problem with the gcc-5.1.0 Cilk implementation and not with Debian, as I get exactly the same behavior on CentOS and in CLE/SUSE. At the same time, I find it odd that nobody has reported this error before, since the same problem seems to be present in gcc-4.9.2 mainline.

Jim_S_Intel · ‎05-28-2015

Hm... there is code in the Cilk Plus runtime that is supposed to restore the floating-point control state after resuming the continuation (i.e., the code after the first spawn of f()). The compiler is also supposed to generate code on a spawn to save that state. If that state is somehow not saved or restored correctly, that could be related to the error you are seeing.

I don't recall exactly which version of the runtime source you are building from, but in runtime source revision 4345 on http://cilkplus.org, the functions "sysdep_save_fp_ctrl_state" and "restore_x86_fp_state" are the relevant functions that the compiler/runtime would call to save and restore that state. I see that there is a "config/x86" version and a "config/generic" version; I'm not sure which one you are building with.
One possibility is that somehow the "generic" version is getting built/called where the "x86" version was supposed to be called?

If the compiler is generating inline code for sysdep_save_fp_ctrl_state, then the function in the runtime might not be called at all. I also see some inline assembly in those functions with some checks related to "sse"... Another possibility is that those functions use the wrong implementation for those flags? I'm not an expert on this section of the code, so I don't know exactly what is supposed to happen on every platform, but it does seem like a possible place for a bug...

Hansang_B_Intel · ‎05-29-2015

I did further investigation based on Jim's thoughts and found the following.

1. "gcc-5-branch" (5.x mainline) is building libcilkrts with "config/x86" (correct one)

2. The binary compiled with "gcc-5-branch" calls __cilkrts_save_fp_ctrl_state, whereas the binary compiled with "gcc-cilkplus" does not.

3. After dead-coding sysdep_save_fp_ctrl_state/restore_x86_fp_state for "gcc-5-branch", the program runs fine (Eric, could you please check if you can reproduce this?)

My impression is that those (save/restore) functions may need some modification for the flag if possible (Jim's last thought/suggestion).

Eric_O_ · ‎05-29-2015

Woohoo! That worked. I commented out the #define of RESTORE_X86_FP_STATE at the top of libcilkrts/runtime/config/x86/os-unix-sysdep.c, which gives the following:

// On x86 processors (but not MIC processors), the compiler generated code to
// save the FP state (rounding mode and the like) before calling setjmp.  We
// will need to restore that state when we resume.
#ifndef __MIC__
# if defined(__i386__) || defined(__x86_64)
//#   define RESTORE_X86_FP_STATE
# endif // defined(__i386__) || defined(__x86_64)
#endif  // __MIC__

Recompiling and installing the run-time library yields a compiler that works for the test case as well as for real code. This message is to confirm the fix. Are there any obvious side effects to this patch that I need to consider? If I don't explicitly fiddle with the FPU using the functions in fenv.h, is it reasonable to assume everything will be okay?

Eric_O_ · ‎05-29-2015

This message is to confirm that floating point now works on 32-bit Intel with this patch as well. Thanks for the help. Is there anything more that needs to be done to ensure appropriate patches make it into gcc and the run-time library?

Jim_S_Intel · ‎05-31-2015

I might be remembering the history incorrectly, but I think those particular save/restore functions were added relatively late in development, in response to a bug in code that actually did care about using different floating-point rounding modes. So commenting out the restore function might break code that does change the floating-point rounding mode. For example, if you changed the rounding mode at the beginning of main and expected g to use the changed mode as well, the restore function would set that rounding mode correctly if g gets stolen and executed on a different worker. But that use case was probably rare and subtle enough that it took a while for anyone to notice, since I think most users don't change the defaults. Perhaps that fix introduced a different bug, i.e., the one you are seeing now?

Anyway, I believe Hansang and others are taking a closer look at this issue. If you don't explicitly change the FPU modes, etc., I think that the workaround you have should be ok? But they will be able to provide more definitive answers than I can.
Cheers,

Jim

Barry_T_Intel · ‎06-01-2015

Jim's memory is correct. The Cilk Plus runtime saves the floating point state (including the floating point rounding mode) at each spawn and restores it on a steal using these functions. We didn't consider any floating point mode other than using the SSE instructions. I'm not sure how we could handle x87 mode.

- Barry (who implemented the change)

Eric_O_ · ‎08-14-2015

The Cilkplus floating-point exception bug is still present in gcc-5.2. In particular, gcc with Cilkplus extensions still gives a floating point exception on 32-bit Intel architectures and 64-bit Intel with -mfpmath=387. The fix of commenting out RESTORE_X86_FP_STATE at the top of libcilkrts/runtime/config/x86/os-unix-sysdep.c still solves this problem. Unless a better solution is found, I would suggest implementing this known fix in mainline.

Hansang_B_Intel · ‎08-17-2015

Don't worry about this workaround; there is a better fix. The bug was caused by using the wrong instruction when saving the FP state: fnstsw (which stores the x87 status word) instead of fnstcw (which stores the x87 control word).

gcc 5.1 -mfpmath=387 Floating point exception