Intel® oneAPI Math Kernel Library

Memory / data race problems in DGETRI

andysim
Beginner

I have a problem that was originally detected within a large software package, where a reasonably large (2224x2224) matrix inversion is catastrophically wrong with multiple threads.  To eliminate the influence of other errors in that code, I wrote a simple inversion code that reads the offending input (from binary) and calls the pertinent LAPACK routines; in this standalone code the answer with multiple threads is correct, but Intel Inspector shows a number of memory and data race errors.  The same errors in Inspector can be found when inverting an identity matrix, so it is easy to reproduce with the following code:

 

#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Fortran LAPACK routines. They are subroutines, so they return nothing;
// the status is reported through the final info argument.
extern "C" {
    void dgetrf_(int*, int*, double*, int*, int*, int*);
    void dgetri_(int*, double*, int*, int*, double*, int*, int*);
}

// C BLAS/LAPACK wrappers
int C_DGETRF(int m, int n, double* a, int lda, int* ipiv) {
    int info;
    dgetrf_(&m, &n, a, &lda, ipiv, &info);
    return info;
}

int C_DGETRI(int n, double* a, int lda, int* ipiv, double* work, int lwork) {
    int info;
    dgetri_(&n, a, &lda, ipiv, work, &lwork, &info);
    return info;
}

std::vector<double> invert(const std::vector<double> &matrix, int dim) {
    int lwork = dim * dim;
    std::vector<double> work(lwork);
    std::vector<int> ipiv(dim);
    std::vector<double> inverse(matrix);

    int err = C_DGETRF(dim, dim, inverse.data(), dim, ipiv.data());

    if (err != 0) {
        if (err < 0) {
            printf("invert: C_DGETRF: argument %d has invalid parameter.\n", -err);
            abort();
        }

        if (err > 0) {
            printf(
                "invert: C_DGETRF: the (%d,%d) element of the factor U or L is "
                "zero, and the inverse could not be computed.\n",
                err, err);
            abort();
        }
    }

    err = C_DGETRI(dim, inverse.data(), dim, ipiv.data(), work.data(), lwork);
    if (err != 0) {
        if (err < 0) {
            printf("invert: C_DGETRI: argument %d has invalid parameter.\n", -err);
            abort();
        }

        if (err > 0) {
            printf(
                "invert: C_DGETRI: the (%d,%d) element of the factor U or L is "
                "zero, and the inverse could not be computed.\n",
                err, err);
            abort();
        }
    }
    return inverse;
}

int main(int argc, char* argv[]) {
    const double TOL=1e-14;

    if (argc != 2) {
        printf("\nUsage:\n\n\t%s matrix_dim\n\n", argv[0]);
        exit(1);
    }
    const int N = std::atoi(argv[1]);
    printf("N = %d\n", N);

    std::vector<double> matrix(N*N, 0.0);
    for (int row = 0; row < N; ++row) {
        matrix[row * N + row] = 1.0;
    }

    auto inverse = invert(matrix, N);

    for (int row = 0; row < N; ++row) {
        assert( std::abs(inverse[row * N + row] - 1.0) < TOL );
    }
    return 0;
}

 

If I link this code against the Netlib LAPACK implementation, it runs through Inspector without any of the memory / data race errors.  So if the memory and data race errors that I see for this simple case are genuine problems, rather than false alerts, they may be the cause of the problem in the larger production code; can anybody on this forum offer me any advice about how to determine whether this is a genuine problem inside DGETRI?  I have been running the minimal code with a 2224x2224 identity matrix, but the same errors can be observed with much smaller matrices.  The above code is set up to allow linking against LAPACK libraries with the original Fortran conventions, but I see similar problems if I try the MKL C interfaces instead.
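One way to judge whether a report corresponds to a wrong answer is to check the whole product rather than only the diagonal, as the reproducer above does. The following is a minimal sketch, not part of the original post: the helper name inverse_residual is hypothetical, and it assumes both matrices are stored contiguously with leading dimension n, as in the reproducer. It returns the largest deviation of A times its computed inverse from the identity, and returns NaN as soon as it meets one so that a corrupted result cannot hide.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): returns
// max over (i, j) of |(A * Ainv)(i, j) - delta(i, j)| for two n x n matrices
// stored contiguously with leading dimension n. Whether the storage is read
// as row-major or column-major does not matter, as long as both arguments
// use the same convention.
double inverse_residual(const std::vector<double>& a,
                        const std::vector<double>& a_inv, int n) {
    assert(n >= 0);
    const std::size_t dim = static_cast<std::size_t>(n);
    assert(a.size() == dim * dim && a_inv.size() == dim * dim);

    double worst = 0.0;
    for (std::size_t i = 0; i < dim; ++i) {
        for (std::size_t j = 0; j < dim; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < dim; ++k) {
                sum += a[i * dim + k] * a_inv[k * dim + j];
            }
            const double expected = (i == j) ? 1.0 : 0.0;
            const double err = std::abs(sum - expected);
            if (std::isnan(err)) {
                return err;  // propagate NaN instead of silently dropping it
            }
            worst = std::max(worst, err);
        }
    }
    return worst;
}
```

In the reproducer this could replace the diagonal-only loop, for example by asserting that inverse_residual(matrix, inverse, N) stays below a tolerance chosen for the matrix at hand. The triple loop is O(n^3) and unoptimized, so it is slow for a 2224x2224 matrix; it is meant as a check, not as production code.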

 

The number of memory errors is constant with respect to OMP_NUM_THREADS and MKL_NUM_THREADS, but the number of data races observed does change depending on whether each of those variables is defined or not.  I did not see any changes with setting/unsetting MKL_DISABLE_FAST_MM.
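Because the reports vary with the thread settings, another check is to compare the inverse computed with one thread against the inverse computed with several. The sketch below is an illustration under stated assumptions, not something taken from the thread: the name max_relative_difference is hypothetical, and the two vectors are assumed to hold the results of two runs of the same inversion (how they are saved and reloaded is left out).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the thread): largest elementwise difference
// between two results, divided by the largest magnitude in the reference.
// If the reference is all zeros, the absolute difference is returned.
double max_relative_difference(const std::vector<double>& reference,
                               const std::vector<double>& candidate) {
    assert(reference.size() == candidate.size());

    double worst = 0.0;
    double scale = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double diff = std::abs(reference[i] - candidate[i]);
        if (std::isnan(diff)) {
            return diff;  // a NaN in either result is reported as NaN
        }
        worst = std::max(worst, diff);
        scale = std::max(scale, std::abs(reference[i]));
    }
    return scale > 0.0 ? worst / scale : worst;
}
```

Threaded runs can legitimately differ from a single-threaded run in the last few bits, since the order of floating-point operations may change, so the result should be compared against a small tolerance rather than against exact zero. A difference many orders of magnitude above rounding error would point to a genuine problem rather than a tool artifact.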

 

The problem has been reproduced with a number of MKL versions and Linux platforms, including

OneAPI 2021.2.0 20210228, on Linux kernel 3.10.0-1160.36.2.el7.x86_64

(compiled by calling icpc -mkl minimal_example.cc -std=c++14 -g)

 

ArpitaP_Intel
Moderator

Hi Andrew,

 

Thanks for reaching out to us.

 

>>Intel Inspector shows a number of memory and data race errors.

 

Could you please provide us with the exact errors that Intel Inspector shows?

 

Also, please help us with the exact steps you followed to run your reproducer code and check it with Intel Inspector.

 

It would be better if you also let us know the number of threads you used for executing this reproducer.

 

Lastly, could you also try executing your code in the latest version of MKL?

 

Thanks!

 

 

 

andysim
Beginner

Thank you very much for the rapid response.  I have upgraded OneAPI and am using the "classic" icpc version 2021.3.0 20210609 for the latest tests.  If I compile as stated above, I run Inspector in the following way (the actual choice of number of threads is arbitrary, and the problem can be reproduced with any choice).

> export OMP_NUM_THREADS=8

> export MKL_NUM_THREADS=8

> inspxe-cl -collect=mi3 -r report_mi3_omp8_mkl8_dim2224 -- ./a.out 2224                                                                                                   
N = 2224
  
8 new problem(s) found 
    4 Invalid memory access problem(s) detected 
    1 Missing allocation problem(s) detected 
    2 Uninitialized memory access problem(s) detected 
    1 Uninitialized partial memory access problem(s) detected 

> inspxe-cl -collect=ti3 -r report_ti3_omp8_mkl8_dim2224 -- ./a.out 2224                                                                                                   
N = 2224
Warning: One or more threads in the application accessed the stack of another thread. This may indicate one or more bugs in your application. Setting the Inspector to detect data races on stack accesses and running another analysis may help you locate these and other bugs.
  
1 new problem(s) found 
    1 Data race problem(s) detected 

 

I have only pasted the summaries here to avoid spamming the forum, but I am happy to provide the detailed logs if you would like to see them.  If I inspect the detailed report, I see

libmkl_avx2.so.1!mkl_blas_avx2_xdscal - libmkl_avx2.so.1:0x52cde1

at the top of the stack for one of the invalid memory access errors.  To ensure this is not an AVX-specific issue, I ran the same tests on an older pre-AVX CPU and found the following:

> export OMP_NUM_THREADS=8                                                                                                                                                 

> export MKL_NUM_THREADS=8                                                                                                                                                 

> inspxe-cl -collect=mi3 -r report_mi3_omp8_mkl8_dim2224_noavx -- ./a.out 2224
N = 2224
  
5 new problem(s) found 
    1 Missing allocation problem(s) detected 
    2 Uninitialized memory access problem(s) detected 
    2 Uninitialized partial memory access problem(s) detected 

> inspxe-cl -collect=ti3 -r report_ti3_omp8_mkl8_dim2224_noavx -- ./a.out 2224                                                                                             
N = 2224
Warning: One or more threads in the application accessed the stack of another thread. This may indicate one or more bugs in your application. Setting the Inspector to detect data races on stack accesses and running another analysis may help you locate these and other bugs.
  
2 new problem(s) found 
    2 Data race problem(s) detected 

 

While this indeed cleaned up some of the invalid memory accesses, it did not fix all of the problems.  Please let me know if I can provide any more information.

ArpitaP_Intel
Moderator

Hi Andrew,


Thanks for your response.


>>If I link this code against the Netlib LAPACK implementation, it runs through Inspector without any of the memory / data race errors.


Please let us know the steps you followed to link the reproducer against Netlib LAPACK implementation.


Thanks!


andysim
Beginner

After downloading the Netlib code from the link in my original post, I made a build subdirectory in the untarred top level LAPACK directory.  To build the library from that newly created "/path/to/build_dir" location, I ran

CC=icc CXX=icpc FC=ifort cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j 16

To then use that library in the reproducer code I ran

icpc minimal_example.cc  -std=c++14  -L/path/to/build_dir/lib -llapack -lblas /path/to/oneapi/compiler/linux/compiler/lib/intel64/libifcore.so  -g

Running Inspector showed no issues:

> inspxe-cl -collect=mi3 -r report_mi3_netlib_dim2224 -- ./a.out 2224                                                                                                   
N = 2224
  
0 new problem(s) found 

Please let me know if you need any more details. 

ArpitaP_Intel
Moderator

Hi Andrew,


Thanks for the information.

We are working on this internally. We will get back to you soon.


Thanks!


Khang_N_Intel
Employee

Hi Andrew,


I was able to reproduce the issues you mentioned using Intel Inspector. I tested your code in Ubuntu 20.04.3 LTS with oneMKL 2021.3.

Inspector showed the same 8 problems that you reported.

As for the data race analysis, Inspector found up to 41 problems.


Anyway, I am going to escalate this issue to the developers.


We will let you know when this issue gets fixed.


Best regards,

Khang


Khang_N_Intel
Employee

Hi Andrew,


I just want to let you know that our engineers are working hard on this issue.


They discovered some issues in the function DGETRF. They are doing a deeper analysis to root-cause them.


I will let you know when they finish the analysis and when they will provide the fixes.


Best regards,

Khang


Khang_N_Intel
Employee

Hi Andrew,


The engineer was able to identify the issues and is currently working on fixes for them.

I will let you know when they are available.


Best regards,

Khang



Khang_N_Intel
Employee

Hi Andrew,


After a thorough analysis of the issue, the team concluded that this is a false positive reported by Inspector, because no failure results from it. This is a known limitation of Inspector:

https://www.intel.com/content/www/us/en/developer/articles/troubleshooting/false-positive-diagnostic....


You can also read more at:

https://www.intel.com/content/www/us/en/develop/documentation/intel-inspector-2018-update-4-release-...

 

  • Intel Inspector may report false positives for analyzed applications using customized synchronization primitives. Use of __itt_notify to annotate your source code can reduce these false positives.
  • We have decided NOT to annotate our source code, because of the potential performance impact, given that there is no demonstrated FAILURE other than an Inspector report.
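To make "customized synchronization primitives" concrete, here is a minimal, self-contained sketch of one: a spin lock built directly on std::atomic_flag. It is an illustration only. The thread does not show MKL's internals, and whether Inspector flags this particular pattern is an assumption. The names SpinLock and count_with_spinlock are hypothetical. The comments mark where annotation calls would go; to my knowledge the ITT API in ittnotify.h provides __itt_sync_acquired and __itt_sync_releasing for this purpose, but they are left as comments so the example builds without that header.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical hand-rolled lock. It is correct, but it does not use pthread
// or OpenMP locking calls, so an analysis tool that relies on recognizing
// those calls has no direct way to see the ordering it establishes.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait until the current holder clears the flag
        }
        // An annotation such as __itt_sync_acquired(this) would go here.
    }

    void unlock() {
        // An annotation such as __itt_sync_releasing(this) would go here.
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a plain (non-atomic) counter from several threads, with every
// increment protected by the spin lock, and returns the final value.
long count_with_spinlock(int num_threads, int increments_per_thread) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&lock, &counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter;
}
```

The counter always reaches num_threads times increments_per_thread because the acquire and release orderings on the flag serialize the increments. The trade-off described in the post is that adding annotation calls on every lock and unlock costs time in a hot path, which is the performance impact cited as the reason for leaving the library unannotated.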


Unless you see an error in the output, we conclude that this is not an issue.


There will be no more communication on this thread. Please do not hesitate to open new threads should you have new questions/issues.


Best,

Khang

