Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

[Solved] Segfault when using a dynamic library compiled with -axMIC-AVX512 on a non KNL machine

Corey_P_
Beginner
1,563 Views

Hi,

I am developing a dynamic library, compiled with icc V17, that is intended to run on both KNL and other Intel architectures.

I am building the dynamic library with the -axMIC-AVX512 flag. The library (see source code below) contains a single function, "test_method". After it is built, the symbol table shows several internal variants, which I assume are selected at run time depending on whether or not the code is running on AVX512 compatible hardware:

    16: 0000000000000810    64 FUNC    GLOBAL DEFAULT   12 test_method
    47: 0000000000000850   304 FUNC    LOCAL  DEFAULT   12 test_method.Z
    48: 0000000000000980   192 FUNC    LOCAL  DEFAULT   12 test_method.A
    62: 0000000000000810    64 FUNC    GLOBAL DEFAULT   12 test_method
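
To illustrate the kind of run-time selection the poster assumes is happening, here is a minimal hand-written sketch. It is not what icc generates for -axMIC-AVX512. The names test_method_generic, test_method_avx512 and test_method_dispatch are hypothetical, and the CPU check uses the GCC/Clang builtin __builtin_cpu_supports rather than Intel's own dispatch mechanism.

```c
/* Generic variant: stands in for the baseline code path. */
static int test_method_generic(int c) {
    int rc = 0;
    for (int i = 0; i < c; i++) {
        rc += c + (int)sizeof(int);
    }
    return rc;
}

/* Stand-in for a variant tuned for AVX-512 hardware. It is plain C here so
   it is safe to run anywhere; a real build would compile this body with
   AVX-512 code generation enabled. It computes the same sum in closed form. */
static int test_method_avx512(int c) {
    return c > 0 ? c * (c + (int)sizeof(int)) : 0;
}

/* Hypothetical dispatcher: the single exported entry point picks a variant
   based on what the CPU reports at run time. */
int test_method_dispatch(int c) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        return test_method_avx512(c);
    }
#endif
    /* Fallback for CPUs (or compilers) without the AVX-512 check. */
    return test_method_generic(c);
}
```

Whichever branch is taken, the caller sees one symbol and one result; only the code path behind it differs.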

The code that calls test_method runs fine on KNL hardware, but on some non-KNL machines it segfaults. A debugger shows that a garbage function parameter is already in use on entry to the function, which is worrying. As you can see in the source code, the parameter passed was c=9. However, the debugger shows c=404743 right before the failure:

#0  0x00007f1e37d90792 in test_method (c=404743) at avxdynamiclib.c:4

If I modify the function so that the compiler no longer optimises it and splits it into variants (as seen in the symbol table), everything works fine.

Any wisdom would be much appreciated.

The library code, along with the compilation command is:

// File: avxdynamiclib.c
// icc -axMIC-AVX512 -g -fPIC -O2 -o avxdynamiclib.o -c avxdynamiclib.c; readelf -a avxdynamiclib.o | grep 'test_method'; icc -m64 -shared -fPIC -static-libgcc avxdynamiclib.o -o libavxtest.so
int test_method(int c) {
        int i = 0, rc = 0;
        for (i = 0; i < c; i++) {
                rc += c + sizeof(int);
        }
        return rc;
}

The driver code, along with the compilation and execution command is:

// File: libtest.c
// > icc -axMIC-AVX512 -g -fPIC -O2 -L./ -lavxtest libtest.c -o libtest
// > LD_LIBRARY_PATH=$PWD:$LD_LIBRARY_PATH ./libtest

// Prototype for the function exported by libavxtest.so
int test_method(int c);

int main(int argc, char *argv[]) {
        test_method(9);
        return 0;
}

 

7 Replies
SergeyKostrov
Valued Contributor II

>>...int test_method(char a, void *b, int c)...

1. There is No reference in the test_method function to b, the array of 2048 elements of type char.

>> char *mem = (char *) malloc (2048);
>> test_method(9, mem, 2251);

2. The CRT function malloc allocates 2048 elements of type char, and then test_method runs 2251 iterations, but b is Not used. It would be a classic out-of-bounds error if b were used.

>>...The code which calls test_method runs fine on KNL hardware, but on some non KNL machines the code segfaults.

3. Since b is Not initialized, it contains garbage. This is a simple implementation problem and is Not related to KNL or any other architecture.

jimdempseyatthecove
Honored Contributor III

Sergey,

b is referenced in the fprintf statement (pointer value is printed).

Corey,

In optimized code, the first few arguments (~4) are generally passed in registers. Stack space is reserved by the caller and filled in for a Debug build, but not in Release mode. Symbolic debugging of registerized arguments can be erroneous when you do not step into the function via the debugger.
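
One way to cross-check a suspicious debugger value, sketched here under the assumption that the library can be rebuilt with a temporary print, is to have the callee report what it actually received. test_method_logged is a hypothetical instrumented copy of the function from the original post, not part of the poster's code.

```c
#include <stdio.h>

/* Hypothetical instrumented copy of test_method. */
int test_method_logged(int c) {
    /* Report the argument as the callee sees it. Unlike the debugger's view
       of an optimized frame, this reads the live value of c. */
    fprintf(stderr, "test_method entered with c=%d\n", c);

    int rc = 0;
    for (int i = 0; i < c; i++) {
        rc += c + (int)sizeof(int);
    }
    return rc;
}
```

If the printed value is correct while the debugger shows garbage, the argument passing itself is probably fine and the fault lies elsewhere.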

I seem to recall the IPO had some issues with arguments such as char, short, etc... Maybe this is one of those cases. The temporary solution was to exclude those files from IPO.

Jim Dempsey

SergeyKostrov
Valued Contributor II

>>...b is referenced in the fprintf statement (pointer value is printed).

It is Not used in processing and I suspect it should be.

Corey_P_
Beginner

> It is Not used in processing and I suspect it should be.

Sergey, the function shown is not used practically, it is just a very cut down version of an actual function to illustrate the error.

Corey_P_
Beginner

> I seem to recall the IPO had some issues with arguments such as char, short, etc... Maybe this is one of those cases. The temporary solution was to exclude those files from IPO.

Hi Jim. Thanks for the response. I did try switching all function arguments to ints and the segfault still occurred. To remove all ambiguity, I have edited the source in the original post to an even more cut-down version.

SergeyKostrov
Valued Contributor II

>>>> It is Not used in processing and I suspect it should be.
>>
>>Sergey, the function shown is not used practically, it is just a very cut down version of an actual function to illustrate the error.

That version does Not help to understand what is wrong and a more detailed version is needed.

Corey_P_
Beginner

The version demonstrated the issue perfectly well; it leads to a reproducible segfault with minimal code.

I finally tracked down the issue. It turns out there was a version mismatch in the runtime libraries that the library was loading at execution time. With that fixed, the code runs fine without a segfault.
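
For anyone chasing a similar problem, a small diagnostic like the following can show which shared objects a process actually loaded. It is only a sketch: it assumes Linux (it reads /proc/self/maps), print_loaded_shared_objects is a hypothetical helper name, and the thread does not say which runtime libraries were mismatched.

```c
#include <stdio.h>
#include <string.h>

/* Prints the path of each shared object mapped into the current process,
   collapsing consecutive mappings of the same file. Returns the number of
   paths printed, or -1 if /proc/self/maps cannot be opened (for example on
   a non-Linux system). */
int print_loaded_shared_objects(FILE *out) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        return -1;
    }

    char line[4096];
    char last[4096] = "";
    int count = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        /* The file path, when present, is the only field containing '/'. */
        char *path = strchr(line, '/');
        if (path == NULL) {
            continue;
        }
        path[strcspn(path, "\n")] = '\0';

        /* Keep only shared objects and skip repeats of the previous path. */
        if (strstr(path, ".so") == NULL || strcmp(path, last) == 0) {
            continue;
        }
        strcpy(last, path); /* fits: path is a substring of line */
        fprintf(out, "%s\n", path);
        count++;
    }

    fclose(maps);
    return count;
}
```

Calling print_loaded_shared_objects(stderr) from the driver on a working machine and on a failing one, then comparing the two lists, is one way to spot a runtime library being picked up from an unexpected location.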

 

Thanks for your help folks!
