
My daemon crashed, probably because of a bug in the Intel compiler.

Hello there, 

I ran into an interesting bug, or maybe it is not a bug.

I have a daemon that has to use OpenMP for parallelism, and IPP.

My daemon crashed. I wrote a minimal application that reproduces the problem. The crash happens only if I link ippcore.

[cpp]

#include <fstream>
#include <stdlib.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h> // pid_t
#include <unistd.h>    // fork, setsid, close

using namespace std;

int main(int argc, char *argv[]) {
    
    printf("DAEMONIZING...\n");
    
    // Daemonization: create a child process.
    pid_t pid = ::fork();
    
    if (pid < 0) {
        // Failed to create the child process; exit.
        printf("First ::fork failed.\n");
        // Logging is not possible here; the reason is described above.
        ::exit(EXIT_FAILURE);
    }

    
    if (pid > 0) {
        // Exit the parent process once the child has been created successfully.
        ::exit(EXIT_SUCCESS);
    }
    
    pid_t sid = ::setsid(); // Now the process group leader.
    
    if (sid < 0) {
        // Failed; exit.
        printf("First ::setsid failed.\n");
        // Logging is not possible here; the reason is described above.
        ::exit(EXIT_FAILURE);
    }
    
    // Daemonization initialization.
    for (int n = 0; n < 64; n++) {
        ::close(n); // Close all descriptors.
    }
    
    #pragma omp critical
    {
        ofstream fileof("./output.txt");
        fileof << "opaopa" << endl; 
    }
}

[/cpp]

"icc 
source.cpp -openmp -lippcore"

I am using icc version 13.1.2 (gcc version 4.6.0 compatibility) on openSUSE 12.1.

I tried to diagnose the problem with valgrind and got this message:

Process terminating with default action of signal 11 (SIGSEGV)
==24572==  Access not within mapped region at address 0x8
==24572==    at 0x5348EBB: __intel_ssse3_rep_memcpy (in /opt/intel/composer_xe_2013.4.183/compiler/lib/intel64/libiomp5.so)
==24572==    by 0x533C955: _intel_fast_memcpy.P (in /opt/intel/composer_xe_2013.4.183/compiler/lib/intel64/libiomp5.so)
==24572==    by 0x52FAED5: __kmp_user_lock_allocate (kmp_lock.cpp:3025)
==24572==    by 0x52E143F: __kmpc_critical (kmp_csupport.c:852)
==24572==    by 0x400DD8: main (in /home/komarov.a.s/exampleForIntel/a.out)

I found a workaround for this bug: enter an "omp critical" section once before daemonizing. After that, the omp critical section after daemonizing works.
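The workaround can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `warm_up_omp_critical`, `daemonize`, and `write_in_critical` are hypothetical, and why the ordering helps has not been established here. The sketch relies on the OpenMP rule that all unnamed critical constructs share one name, so the empty section entered before the fork uses the same lock as the later one.

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: enter an empty unnamed critical section once, so the
// OpenMP runtime sets up the lock for unnamed critical sections before the
// process daemonizes.
void warm_up_omp_critical() {
    #pragma omp critical
    {
    }
}

// Hypothetical helper repeating the daemonization steps of the minimal
// example. It returns only in the daemonized child process.
void daemonize() {
    pid_t pid = ::fork();
    if (pid < 0) ::exit(EXIT_FAILURE);   // fork failed
    if (pid > 0) ::exit(EXIT_SUCCESS);   // parent exits
    if (::setsid() < 0) ::exit(EXIT_FAILURE);
    for (int n = 0; n < 64; n++) {
        ::close(n);                      // close all descriptors
    }
}

// Writes one line inside an unnamed critical section.
// Returns true if the stream is still in a good state after the write.
bool write_in_critical(const char *path) {
    bool ok = false;
    #pragma omp critical
    {
        std::ofstream fileof(path);
        fileof << "opaopa" << std::endl;
        ok = fileof.good();
    }
    return ok;
}
```

In `main`, the calls would go in this order: `warm_up_omp_critical();`, then `daemonize();`, then `write_in_critical("./output.txt");`.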

I also ran the program in Intel Inspector with the "Analyze stack accesses" option enabled. It reported an "uninitialized memory access" error in libippcore.so.7.1, in the function ippGetCpuFeatures (source libippcore.so.7.1:0x3f03).

That's all. I hope somebody can help me.

 

18 Replies
Bernard
Black Belt
101 Views

>>>==24572==    at 0x5348EBB: __intel_ssse3_rep_memcpy >>>

It seems that __intel_ssse3_rep_memcpy() caused the segfault. Can you provide more information about the register context and the full call stack?

101 Views

Thanks for the answer.

I created a core dump for my app.

Is that enough?

Bernard
Black Belt
101 Views

Hi

Tomorrow I will look at that file.

Bernard
Black Belt
101 Views

@Alexander

Can you extract the call stack and register context from the file you uploaded and upload them as a text file? I am using Windows and cannot open your file.

101 Views

I extracted everything you need.

Bernard
Black Belt
101 Views

@Alexander

Thank you.

jimdempseyatthecove
Black Belt
101 Views

#pragma omp critical is intended to be issued from within a parallel region, though it should be benign when issued outside one. As depicted in your sketch code, main is intended to be the process entry point. I suppose you could, after starting the process and entering a parallel region, then issue a function call to main (recursively as well as reentrantly).

The omp critical sections are not inter-process critical sections. It is unclear why you would have an omp critical section in the main-line code. Is this an omission in your sketch code?

Jim Dempsey
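For illustration, here is a minimal sketch of the intended usage: a critical section inside a parallel region, serializing access to shared state among the threads of one process. The function name `collect_squares` is hypothetical and not taken from the code in this thread.

```cpp
#include <vector>

// Hypothetical example: threads of one process append to a shared vector.
// The critical section serializes push_back among those threads only;
// it gives no protection against other processes.
std::vector<int> collect_squares(int n) {
    std::vector<int> out;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        int sq = i * i;          // independent work, may run in parallel
        #pragma omp critical
        {
            out.push_back(sq);   // shared state, one thread at a time
        }
    }
    return out;
}
```

The order of the elements depends on thread scheduling, but with or without OpenMP enabled the vector ends up holding exactly `n` values.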

101 Views

Thanks for the answer Jim. 

I just wrote a minimal program that reproduces this error, and in fact I forgot to add "#pragma omp parallel" before "#pragma omp critical". But the error occurs in both cases, so it makes no real difference. It is, however, more correct to use "omp critical" inside "omp parallel".

Thanks for the correction.

Bernard
Black Belt
101 Views

I suppose this register could hold the culprit of the segfault: "RSI 0x8". As I do not know the implementation of __intel_ssse3_rep_memcpy, I cannot be sure whether the rsi register contains the source address of the memory buffer that is about to be copied by that function.


 
Bernard
Black Belt
101 Views

Btw, it would help if you could somehow provide the call stack together with the arguments of the called functions.

jimdempseyatthecove
Black Belt
101 Views

Have you tested to see if the ctor for ofstream fileof("./output.txt"); opened the file?

Jim Dempsey
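One way to make that check explicit is sketched below. The helper name `try_open_for_write` is hypothetical. Note that `errno` is usually set by the underlying open call, but the C++ standard does not guarantee it, so the error text is only a hint.

```cpp
#include <cerrno>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: tries to open `path` for writing and reports the outcome.
// Returns an empty string on success, otherwise a description of the failure.
std::string try_open_for_write(const char *path) {
    errno = 0;
    std::ofstream fileof(path);
    if (!fileof.is_open()) {
        // The constructor does not throw by default; it only sets failbit.
        return std::string("open failed: ") + std::strerror(errno);
    }
    fileof << "opaopa" << std::endl;
    return fileof.good() ? std::string() : std::string("write failed");
}
```

In a daemon that has closed its standard descriptors, the returned message would have to be sent somewhere other than stdout or stderr to be visible.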

101 Views

jimdempseyatthecove wrote:

Have you tested to see if the ctor for ofstream fileof("./output.txt"); opened the file?

Jim Dempsey

Yes, I have checked. My program does not create this file. If I delete the line "#pragma omp critical", the file is created.

101 Views

Jim, I have checked for this file. My program does not create it. If I delete the line "#pragma omp critical", the file is created.

@iliyapolak

What do you mean? As far as I understand, I gave you all the information I could get. Maybe I am wrong.

I think it would be interesting to hear Intel's thoughts about this problem. I don't know how to reach them.

Do you know whether Intel has a bug report system?

Bernard
Black Belt
101 Views

Hi Alexander

You are right, but one crucial piece of information is missing: the function parameters. I do not know which command you must issue to GDB in order to collect those parameters. At this stage it is very hard to blame the Intel compiler for the segfault, although an Intel library is probably generating the access violation. As I said earlier, the arguments passed to __intel_ssse3_rep_memcpy() are not displayed by the debugger, which makes further investigation nearly impossible.

>>>I think It would be interesting to hear Intel's thoughts about this problem. I don't know how to reach them>>>

For now, letting the IDZ community and Intel engineers know about your problem is probably sufficient. Meanwhile, if you have the possibility to test your code on a different computer, I recommend doing so.

 

101 Views

Hi, @iliyapolak

I understand you. I will wait. 

Of course, I have tested my program on different computers. The error reproduces there too.

I don't understand how OpenMP and IPP are connected in my program. As far as I understand, they are two separate parts.

Bernard
Black Belt
101 Views

When running your code on different machines, do you see the same error?

101 Views

iliyapolak wrote:

When running your code on different machines do You see the same error?

Yes, I see the same error.

Bernard
Black Belt
101 Views

I hope that Intel devs will investigate that issue.
