I have a similar problem

JJK · ‎08-21-2014

hi all,

I'm porting some software to the Xeon Phi that's using gsl. I've downloaded gsl 1.16 and configured and compiled it using

  ./configure --host=x86_64-unknown-linux-gnu CC=icc CXX=icpc CFLAGS="-mmic -O2"

(using icc 14.0.3 20140422)

The code compiles OK but the test code coredumps on the Xeon Phi itself; there are multiple components of gsl that coredump, one of them is 'vector':

mic0> gdb ./test
GNU gdb (GDB) 7.5+mpss3.2.3
[...]

(gdb) r
Starting program: /home/janjust/src/gsl-1.16/vector/test

Program received signal SIGSEGV, Segmentation fault.
0x000000000040f026 in test_complex_func (stride=16, N=32) at test_complex_source.c:121
121            if (v->data[2*i*stride] != (ATOMIC) (i) || v->data[2*i*stride + 1] != (ATOMIC) (i + 1234))

The weird thing is that the function where it crashes is never called with stride=16, N=32, so it seems the optimizer altered something.

If I remove the "-O2" then the code runs OK. The same code with CC=icc CFLAGS="-O2" runs fine on the host CPU (Xeon E5). Is this a compiler optimisation error? How do I 'downgrade' the compiler optimisation for a particular piece of code? How can I troubleshoot this further?

TIA,

JJK

Kevin_D_Intel · ‎08-27-2014

There should be no difference between "-mmic -O2" and just "-mmic" since -O2 is the default.

You can optimize at the routine level with a #pragma optimize documented here.
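
A minimal sketch of what routine-level control could look like with the Intel compiler's `#pragma optimize("", off)` / `#pragma optimize("", on)` pair. The function `sum_strided` is a hypothetical stand-in and is not part of gsl; compilers that do not know this pragma will typically ignore it with a warning.

```c
#include <stddef.h>

/* Hypothetical stand-in for a routine that misbehaves at -O2. */

/* Turn optimization off for the functions that follow.
   The pragma must appear outside of any function body. */
#pragma optimize("", off)
double sum_strided(const double *data, size_t n, size_t stride)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += data[i * stride];
    return total;
}
/* Restore the optimization level given on the command line. */
#pragma optimize("", on)
```

Only the code between the two pragmas is affected, so the rest of the file can still be built with -O2.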

The symptoms of SegV on the coprocessor vs. success on the host CPU suggest a possible unaligned access on the coprocessor. I’ve seen an ASSERT used in other cases to help detect unaligned addresses but I don't know whether this works for structure members. It might help determine if one or more accesses of v are unaligned. I will inquire w/others. Maybe something like this:

/* needs <assert.h> and <stdint.h> */
uintptr_t V_addr1 = (uintptr_t) &(v->data[2*i*stride]);
assert(V_addr1 % 64 == 0);

uintptr_t V_addr2 = (uintptr_t) &(v->data[2*i*stride + 1]);
assert(V_addr2 % 64 == 0);
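
As a self-contained illustration of this kind of check, here is a sketch that counts how many of the strided accesses fall off a given alignment boundary instead of aborting on the first one. The helper names `is_aligned` and `count_misaligned` are hypothetical, and the `2*i*stride` indexing simply mirrors the line from the gsl test.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if ptr is a multiple of `alignment` bytes, 0 otherwise. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t) ptr % alignment) == 0;
}

/* Counts the elements i in [0, n) whose real part, data[2*i*stride],
   does not start on an `alignment`-byte boundary. */
size_t count_misaligned(const double *data, size_t n, size_t stride,
                        size_t alignment)
{
    size_t i, bad = 0;

    for (i = 0; i < n; i++) {
        if (!is_aligned(&data[2 * i * stride], alignment))
            bad++;
    }
    return bad;
}
```

Calling `count_misaligned(v->data, N, stride, 64)` reports how many accesses miss a 64-byte boundary, while passing `sizeof(double)` as the last argument checks only element-wise alignment.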

Kevin_D_Intel · ‎08-28-2014

Here is some guidance from Development regarding the details of your earlier post:

Normally, in plain C code without intrinsics the addresses are not required to be 64-byte aligned – only element-wise alignment is required (for example, pointer to ‘double’ must have 8-byte alignment). I believe the ‘data’ array is allocated using ‘malloc’ so it should be already aligned properly.

This segV might be due to a bug in the vectorizer, so I suggest trying a newer compiler (15.0) or disabling vectorization of the particular loop around the problematic line:

121            if (v->data[2*i*stride] != (ATOMIC) (i) || v->data[2*i*stride + 1] != (ATOMIC) (i + 1234))

To disable vectorization of the loop, #pragma novector can be used, as follows:

#pragma novector
for (i = 0; i < N; i++)
…
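
For illustration, a fuller sketch of where the pragma goes, using a hypothetical `complex_vec` struct that only imitates the interleaved (real, imaginary) layout from the failing line; it is not the actual gsl type or test source. `#pragma novector` is Intel-specific and has to sit directly in front of the loop it applies to.

```c
#include <stddef.h>

/* Hypothetical stand-in for a strided complex vector:
   element i lives at data[2*i*stride] (real) and data[2*i*stride + 1] (imag). */
typedef struct {
    size_t size;
    size_t stride;
    double *data;
} complex_vec;

/* Fills element i with the pair (i, i + 1234). */
void fill_complex(complex_vec *v)
{
    size_t i;

#pragma novector /* ask the Intel compiler not to vectorize the next loop */
    for (i = 0; i < v->size; i++) {
        v->data[2 * i * v->stride] = (double) i;
        v->data[2 * i * v->stride + 1] = (double) (i + 1234);
    }
}

/* Returns the number of elements that do not hold (i, i + 1234). */
size_t count_mismatches(const complex_vec *v)
{
    size_t i, bad = 0;

#pragma novector
    for (i = 0; i < v->size; i++) {
        if (v->data[2 * i * v->stride] != (double) i
            || v->data[2 * i * v->stride + 1] != (double) (i + 1234))
            bad++;
    }
    return bad;
}
```

Each pragma affects only the loop that follows it, so other loops in the same file are still vectorized as usual.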

Please let me know whether this is still reproducible with 15.0 and if so whether I can get a reproducer to provide Development for further investigation and development of any associated fix.

JJK · ‎09-02-2014

I've upgraded mpss to 3.3, installed icc v15.0 and reran my test. The coredumps are now gone, but there are some new failing tests (not coredumps, just wrong results). I will open a new ticket for them.

By adding a few '#pragma novector' lines and a few hacks to the gsl test scripts I am now able to run all gsl tests successfully on a Xeon Phi!

Valjean__Jean · ‎04-16-2020

I have a similar problem using gsl-2.6.

The gsl library is compiled correctly for -mmic. I've tried a few examples from the gsl documentation and they are working fine.

As for testing gsl: from the gsl folder I tried a script called ltmain.sh (but I didn't manage to run the other one, called test-gsl-histogram.sh; I think I must find the right parameters...)

The problem is similar: when I compile my project with -O2 or -O3 and run it, it gives straight away "segmentation fault".

When I compile with -O1 it is running well. Also the gdb under mic0 works fine and gives no errors.

Please be so kind as to give me a bit more detail about the hacks and about where to write '#pragma novector'.

I hope the same solution works for this problem with the latest versions (mpss 3.8.6 and compiler).

thank you

JJK · ‎04-17-2020

gsl 2.6 is a bit different from the version I tried 6 yrs ago but I just managed to get it running on my old trusty MIC. I configured gsl 2.6 using

./configure --host=x86_64-unknown-linux-gnu CC=icc CXX=icpc CFLAGS=-mmic -fp-model precise --disable-shared

and then applied the attached patch. Then, after compiling, a 'make check' ran with all tests passing.

Valjean__Jean · ‎04-19-2020

Thank you for your answer. I configured without "-fp-model precise" because it was not recognized by "./configure". Maybe for this reason I passed around 13 tests, but not the "ieee" one. I will continue to investigate why -fp-model is not recognized.

Because a problem always comes with its "brothers" and never alone, I've split my project into two parts: one that uses gsl, runs on the host and saves its results in txt files, and a second that uses intrinsics, reads these txt files and runs on mic0. I found out that gsl was not the only reason I got "segmentation fault". So I compiled with -report-phase=vec and added #pragma novector to every "for" loop for which "loop was parallelized" was reported, except the hotspot intrinsics area. I managed to make it work, but the running time grew from 7 minutes on a 6-core AVX2 CPU to 65 minutes on mic0. My code is "omp + intrinsics native mic" with no offload, so there is no PCI communication that could affect the time so much. It is also carefully optimized to be L1-bound on the host.

Anyway, thank you for the time you spent helping me. It looks like I must learn a new chapter about prefetching.

JJK · ‎04-20-2020

It seems some quotes were dropped in my post. The correct ./configure line is

./configure --host=x86_64-unknown-linux-gnu CC=icc CXX=icpc CFLAGS="-mmic -fp-model precise" --disable-shared

as the "-fp-model precise" thing is a C compiler flag, not a configure flag.

And yes, on the original Xeon Phis, prefetching and data ordering are the name of the game. It is possible to hit 1 TFlops double precision on it, but it is very hard.

HTH,

JJK

gsl library compiled using -mmic -O2 coredumps