New Contributor I
133 Views

Two different FPEs with the Data Fitting library

Hi there,

I've encountered two different floating-point exceptions (FPEs) with the MKL Data Fitting library, and they have been driving me a little insane!

I previously mentioned the first FPE in another thread here (https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/840833), but at the time I couldn't produce a small reproducer to demonstrate the problem. This FPE occurs during construction of the Akima spline when multiple repeated y-values are provided for a spline. It can be reproduced consistently using the attached reproducer and the compile/link lines below.
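For readers wondering why repeated y-values might matter for an Akima spline, here is a minimal sketch. It uses the textbook Akima node-derivative formula and is not MKL's implementation; whether this is what actually happens inside MKL is an assumption that nothing in this thread confirms. The function names are hypothetical. When consecutive segment slopes are equal (as they are for repeated y-values), both Akima weights are zero and the unguarded formula evaluates 0/0, which raises FE_INVALID. The usual guard falls back to the mean of the two adjacent slopes.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Textbook Akima derivative at node i from four neighbouring segment slopes:
// m0 = m_{i-2}, m1 = m_{i-1}, m2 = m_i, m3 = m_{i+1}. No zero-weight guard.
// (Illustrative only; NOT taken from MKL.)
double akima_slope_unguarded(double m0, double m1, double m2, double m3) {
    // volatile keeps the compiler from folding the arithmetic at compile time
    volatile double w1 = std::fabs(m3 - m2);
    volatile double w2 = std::fabs(m1 - m0);
    volatile double den = w1 + w2;
    return (w1 * m1 + w2 * m2) / den;  // 0/0 when all slopes are equal
}

// Same formula with the conventional fallback for a zero denominator.
double akima_slope_guarded(double m0, double m1, double m2, double m3) {
    const double w1 = std::fabs(m3 - m2);
    const double w2 = std::fabs(m1 - m0);
    const double den = w1 + w2;
    if (den == 0.0) {
        return 0.5 * (m1 + m2);  // mean of the two adjacent slopes
    }
    return (w1 * m1 + w2 * m2) / den;
}

using SlopeFn = double (*)(double, double, double, double);

// Returns true if evaluating f sets the FE_INVALID status flag.
bool raises_invalid(SlopeFn f, double m0, double m1, double m2, double m3) {
    assert(f != nullptr);
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double result = f(m0, m1, m2, m3);
    (void)result;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

Calling `raises_invalid(akima_slope_unguarded, 0.0, 0.0, 0.0, 0.0)` models a run of repeated y-values (every segment slope is zero). If the application traps on invalid operations, a flag like this becomes a fatal FPE rather than a silent NaN.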

The second FPE occurs during construction of the Natural spline and seems to occur only on certain machines within our cluster. I'll provide the kernel release and CPU info for one of those machines below, but multiple machines reproduce the FPE. This FPE can be worked around, for now, by enabling the MKL CNR (Conditional Numerical Reproducibility) compatibility mode. Again, the attached reproducer consistently produces this FPE on the affected machines.
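For anyone wanting to try the same workaround, a minimal sketch follows. The documented ways to request CNR compatibility mode are to export `MKL_CBWR=COMPATIBLE` in the shell before running, or to call `mkl_cbwr_set(MKL_CBWR_COMPATIBLE)` from `mkl.h` before any other MKL function. The sketch below sets the environment variable from inside the process using POSIX `setenv`, so it compiles without MKL. It assumes the call happens before the first MKL call and that MKL reads the variable at that point; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <stdlib.h>  // setenv (POSIX)

// Requests MKL's CNR compatibility mode through the MKL_CBWR environment
// variable. Assumption: this runs before the first MKL call in the process.
// Returns true if the variable now holds "COMPATIBLE".
bool request_cnr_compatible() {
    if (setenv("MKL_CBWR", "COMPATIBLE", /*overwrite=*/1) != 0) {
        return false;
    }
    const char* value = std::getenv("MKL_CBWR");
    assert(value != nullptr);  // setenv succeeded, so the variable exists
    return std::strcmp(value, "COMPATIBLE") == 0;
}
```

Calling `request_cnr_compatible()` at the very top of `main`, ahead of any Data Fitting call, is the intended use. Exporting the variable in the shell avoids any question about when the library reads it.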

I compile using GCC 7.3 and use MKL 2019u5. I'm compiling on a CentOS 7 box, and running on other CentOS 7 machines.

After sourcing the Intel PSXE script 'psxevars.sh', I compile and link the reproducer using the following commands:

g++ -m64 -I${MKLROOT}/include -c CubicSpline_MKL.cc
g++ -m64 -I${MKLROOT}/include -c test_MKLSpline.cc
g++ -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl CubicSpline_MKL.o test_MKLSpline.o -o test_MKLSpline

If there are any further questions, or if you want more specific information, I'm happy to help. It has taken me a while to get this reproducer, so I'm very keen to resolve these issues :)

Thanks,

Ewan


Details of a machine that exhibits the second FPE with the attached testcase:

  •     uname -a:
    • Linux haggis41 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  •     summarised cat /proc/cpuinfo:
    •         processor       : 23
    •         vendor_id       : GenuineIntel
    •         cpu family      : 6
    •         model           : 85
    •         model name      : Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz
    •         stepping        : 4
    •         microcode       : 0x200005e
    •         cpu MHz         : 2301.000
    •         cache size      : 16896 KB
    •         physical id     : 1
    •         siblings        : 12
    •         core id         : 13
    •         cpu cores       : 12
    •         apicid          : 58
    •         initial apicid  : 58
    •         fpu             : yes
    •         fpu_exception   : yes
    •         cpuid level     : 22
    •         wp              : yes
    •         flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_pt spec_ctrl ibpb_support tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
    •         bogomips        : 4606.57
    •         clflush size    : 64
    •         cache_alignment : 64
    •         address sizes   : 46 bits physical, 48 bits virtual
    •         power management:
7 Replies
Moderator

Thanks for the reproducer, Ewan. We will investigate the case.

Moderator

Edward, we reproduced both of these issues and will keep this thread updated with our progress.

New Contributor I

I'm not sure who Edward is, but I'm glad the reproducer works on your systems too.

Ewan

New Contributor I

Has there been any update on this issue?

Moderator

Ewan, there is no update on this issue so far. Do you have a deadline by which you need this fix?

New Contributor I

On my end, I've missed the boat for our upcoming release. However, I'm hoping to get a fix for these issues within our next cycle.

That would mean ideally a tested fix by June/July. Is this a reasonable expectation?

Moderator

Yes, we are planning to do that, but these plans are subject to change and we cannot guarantee that the fix will be in the next update (update 2). We will keep you informed of our progress.

Gennady

 
