Oleg_M_Intel
Employee

PCM V2.6 on Xeon E5 2667 v2, Fedora 20: floating point exception

Hi, I am running PCM just fine on my workstation (Core i7-3930K, Scientific Linux, a clone of RHEL). However, on my server board with a Xeon E5-2667 v2 running Fedora 20, I get an exception:

root@node02 IntelPerformanceCounterMonitorV2.6# ./pcm.x 1

 Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100 ID=db05e43)

 Copyright (c) 2009-2013 Intel Corporation

Floating point exception
root@node02 IntelPerformanceCounterMonitorV2.6# 

Please advise. Thanks.

Bernard
Black Belt

Do you know the exact type of exception? I mean x87 type or SIMD type?

Can you run pcm under GDB? It should probably catch the exception and show the IP of faulting code.

Emre_Eraltan
Novice

Hi,

I hit the same issue today on a quad-socket Ivy Bridge system (E7-8891 v2 @ 3.20GHz) running RHEL 6.5. I don't think it is related to the OS.

pcm.x[19170] trap divide error ip:40975c sp:7fffdf6431b0 error:0 in pcm.x[400000+30000]

Here is the end of the strace log:

read(3, " 2\nsiblings\t: 20\ncore id\t\t: 4\ncp"..., 1024) = 1024
read(3, " yes\nfpu_exception\t: yes\ncpuid l"..., 1024) = 1024
read(3, "e mce cx8 apic sep mtrr pge mca "..., 1024) = 1024
read(3, "t tm pbe syscall nx pdpe1gb rdts"..., 1024) = 1024
read(3, "ology nonstop_tsc aperfmperf pni"..., 1024) = 1024
read(3, "e3 cx16 xtpr pdcm pcid dca sse4_"..., 1024) = 1024
read(3, "x f16c rdrand lahf_lm ida arat x"..., 1024) = 1024
read(3, "pid fsgsbase smep erms\nbogomips\t"..., 1024) = 1024
read(3, "ss sizes\t: 46 bits physical, 48 "..., 1024) = 1024
read(3, "r_id\t: GenuineIntel\ncpu family\t:"..., 1024) = 1024
read(3, "91 v2 @ 3.20GHz\nstepping\t: 7\ncpu"..., 1024) = 1024
read(3, "\nsiblings\t: 20\ncore id\t\t: 7\ncpu "..., 1024) = 1024
read(3, "es\nfpu_exception\t: yes\ncpuid lev"..., 1024) = 1024
read(3, "mce cx8 apic sep mtrr pge mca cm"..., 1024) = 1024
read(3, "tm pbe syscall nx pdpe1gb rdtscp"..., 1024) = 1024
read(3, "ogy nonstop_tsc aperfmperf pni p"..., 1024) = 1024
read(3, " cx16 xtpr pdcm pcid dca sse4_1 "..., 1024) = 1024
read(3, " f16c rdrand lahf_lm ida arat xs"..., 1024) = 1024
read(3, "pid fsgsbase smep erms\nbogomips\t"..., 1024) = 1024
read(3, "ess sizes\t: 46 bits physical, 48"..., 1024) = 1024
read(3, "dor_id\t: GenuineIntel\ncpu family"..., 1024) = 1024
read(3, "-8891 v2 @ 3.20GHz\nstepping\t: 7\n"..., 1024) = 1024
read(3, "\t: 2\nsiblings\t: 20\ncore id\t\t: 11"..., 1024) = 1024
read(3, "u\t\t: yes\nfpu_exception\t: yes\ncpu"..., 1024) = 1024
read(3, "sr pae mce cx8 apic sep mtrr pge"..., 1024) = 1024
read(3, "2 ss ht tm pbe syscall nx pdpe1g"..., 1024) = 1024
read(3, "od xtopology nonstop_tsc aperfmp"..., 1024) = 1024
read(3, " tm2 ssse3 cx16 xtpr pdcm pcid d"..., 1024) = 330
read(3, "", 1024)                       = 0
close(3)                                = 0
munmap(0x7f5745645000, 4096)            = 0
--- SIGFPE (Floating point exception) @ 0 (0) ---
+++ killed by SIGFPE (core dumped) +++
Floating point exception (core dumped)

I will try to debug this, but I would appreciate any information on this issue, which I have not seen on Sandy Bridge processors.

Thanks,

Emre

 

Bernard
Black Belt

Are you sure that the floating point exception is thrown by PCM code?

hilgeman
Beginner

I am having the same issue with E5-2667 v2 processors on RHEL 6.5. Strangely enough, other processors such as the E5-2680 v2 and E5-2697 v2 are fine.

I attached a debugger and got the following trace:

(gdb) r /bin/sleep 2
Starting program: /home/dell-guest/src/IntelPerformanceCounterMonitorV2.6/pcm-power.x /bin/sleep 2
[Thread debugging using libthread_db enabled]


 Intel(r) Performance Counter Monitor V2.6 (2013-11-04 13:43:31 +0100 ID=db05e43)

 Power Monitoring Utility
 Copyright (c) 2011-2012 Intel Corporation

Program received signal SIGFPE, Arithmetic exception.
0x000000000040826c in PCM::PCM (this=0x623080) at cpucounters.cpp:785
785         std::cout << "Number of physical cores: " << (num_cores/threads_per_core) << std::endl;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64 libgcc-4.4.7-3.el6.x86_64 libstdc++-4.4.7-3.el6.x86_64
(gdb)

The problem is a divide by zero: threads_per_core is 0 when this line executes. We are not using HT, so threads_per_core should be 1. When I set it to 1, it works fine.

regards,

-Martin
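
A note on the failing line: on x86 Linux an integer division by zero is reported as SIGFPE, which is why the shell prints "Floating point exception" even though no floating-point math is involved. A minimal sketch of a defensive version of that division follows. The helper names physical_cores and describe_physical_cores are hypothetical and are not part of PCM; returning -1 for "unknown" is an assumption made for illustration.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of PCM): returns the number of physical
// cores, or -1 when threads_per_core was never detected and is still 0.
int physical_cores(int num_cores, int threads_per_core)
{
    if (threads_per_core <= 0)
        return -1;  // dividing here would raise SIGFPE on x86 Linux
    return num_cores / threads_per_core;
}

// Builds the same message as cpucounters.cpp:785, but reports an
// undetected thread count instead of dividing by zero.
std::string describe_physical_cores(int num_cores, int threads_per_core)
{
    std::ostringstream out;
    out << "Number of physical cores: ";
    const int cores = physical_cores(num_cores, threads_per_core);
    if (cores < 0)
        out << "unknown (threads_per_core not detected)";
    else
        out << cores;
    return out.str();
}
```

A guard like this only turns the crash into a readable message; the topology detection still has to produce a correct threads_per_core.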

Bernard
Black Belt

I had also started to suspect a division instruction, and now you have confirmed it.

Emre_Eraltan
Novice

I believe Martin found the issue.

Indeed, I had also disabled HT on my Dell platform. It seems that PCM initializes threads_per_core to 0:

PCM::PCM() :
    UnsupportedMessage("Error: unsupported processor. Only Intel(R) processors are supported (Atom(R) and microarchitecture codename Nehalem, Westmere, Sandy Bridge and Ivy Bridge)."),
    cpu_family(-1),
    cpu_model(-1),
    original_cpu_model(-1),
    threads_per_core(0),
    ...

and then cannot get the information properly from /proc/cpuinfo, i.e. ++threads_per_core is never executed.

Initializing threads_per_core to 1 fixes the issue, but this is just a workaround. Another workaround is to enable HT.

Thanks,
Emre
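
For readers unfamiliar with the data involved: each logical CPU in /proc/cpuinfo has a block starting with a "processor" line, followed by fields such as "physical id" (the socket) and "core id". The sketch below shows one way to pull those three fields out of cpuinfo-style text. It is not PCM's actual parser; CpuInfoEntry, read_field and parse_cpuinfo are hypothetical names introduced for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one logical CPU; -1 marks a field not seen.
struct CpuInfoEntry
{
    int processor;
    int physical_id;
    int core_id;
};

// If `line` starts with `key` and has an integer after the ':',
// stores that integer in `value` and returns true.
static bool read_field(const std::string& line, const std::string& key, int& value)
{
    if (line.compare(0, key.size(), key) != 0)
        return false;
    const std::string::size_type colon = line.find(':', key.size());
    if (colon == std::string::npos)
        return false;
    std::istringstream in(line.substr(colon + 1));
    int parsed = 0;
    if (!(in >> parsed))
        return false;
    value = parsed;
    return true;
}

// Reads cpuinfo-style text and returns one entry per "processor" block.
std::vector<CpuInfoEntry> parse_cpuinfo(std::istream& in)
{
    std::vector<CpuInfoEntry> entries;
    std::string line;
    int v = 0;
    while (std::getline(in, line))
    {
        if (read_field(line, "processor", v))
        {
            CpuInfoEntry e = { v, -1, -1 };
            entries.push_back(e);
        }
        else if (!entries.empty() && read_field(line, "physical id", v))
            entries.back().physical_id = v;
        else if (!entries.empty() && read_field(line, "core id", v))
            entries.back().core_id = v;
    }
    return entries;
}
```

Passing a std::ifstream opened on /proc/cpuinfo to parse_cpuinfo would give one entry per logical CPU. With HT disabled, every (physical_id, core_id) pair should then appear exactly once.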

Roman_D_Intel
Employee

Thanks for reporting this.

Could you please share your /proc/cpuinfo file (for example, attach it to your reply) so that we can fix this properly?

Thank you

Roman

Emre_Eraltan
Novice

Hi Roman,

You can find the cpuinfo attached.

Regards,

Emre

Roman_D_Intel
Employee

Thanks for the data.

Could you try this fix? Put it above the line that throws the exception (cpucounters.cpp:785):

    // Fallback: count the logical CPUs that share socket and core with CPU 0.
    if(threads_per_core == 0)
    {
        for (int i = 0; i < num_cores; ++i)
        {
            if(topology[i].socket == topology[0].socket && topology[i].core_id == topology[0].core_id)
                ++threads_per_core;
        }
    }

thanks,

Roman
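
To see why this fallback never leaves threads_per_core at 0, here is a self-contained sketch of the same counting loop, indexing each topology entry with the loop variable. TopologyEntry is a hypothetical stand-in for PCM's real topology record and models only the two fields used by the fix; count_threads_per_core is likewise a name chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PCM's per-logical-CPU topology record.
struct TopologyEntry
{
    int socket;
    int core_id;
};

// Counts the logical CPUs that share socket and core_id with entry 0.
// Entry 0 always matches itself, so a non-empty topology yields at
// least 1; an empty topology yields 0 without touching topology[0].
int count_threads_per_core(const std::vector<TopologyEntry>& topology)
{
    int threads_per_core = 0;
    for (std::size_t i = 0; i < topology.size(); ++i)
    {
        if (topology[i].socket == topology[0].socket &&
            topology[i].core_id == topology[0].core_id)
            ++threads_per_core;
    }
    return threads_per_core;
}
```

With HT disabled, each (socket, core_id) pair occurs once, so the count is 1. With HT enabled, two logical CPUs share each pair and the count is 2.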

Emre_Eraltan
Novice

Hi Roman,

Thanks for the fix. I have attached it as a patch file.

I confirm that it now works on my quad-socket Ivy Bridge platform with HT disabled.

Regards,

Emre

Roman_D_Intel
Employee

Emre, thanks a lot for testing.
