Intel® oneAPI Math Kernel Library

Multi-thread MKL cblas_sgemm with g++ problem

Keren_Z_
Beginner

Here is an example sgemm program.

#include <mkl.h>
#include <iostream>
#include <cstdlib>
#define ITERATION 1

int main()
{
  int ra = 128;
  int lda = 75;
  int ldb = 55;
  float* left = (float*)calloc(ra * lda, sizeof(float));
  float* right = (float*)calloc(ldb * lda, sizeof(float));
  float* ans = (float*)calloc(ra * ldb, sizeof(float));
  std::cout << "left " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < lda; ++j) {
      left[i * lda + j] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
      std::cout << left[i * lda + j] << " ";
    }
    std::cout << std::endl;
  }

  std::cout << "right " << std::endl;
  for (int i = 0; i < lda; ++i) {
    for (int j = 0; j < ldb; ++j) {
      right[i * ldb + j] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
      std::cout << right[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  for (int i = 0; i < ITERATION; ++i) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ra, ldb, lda, 1.0f, left, lda,
      right, ldb, 0.0f, ans, ldb);
  }

  std::cout << "ans " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < ldb; ++j) {
      std::cout << ans[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  return 0;
}

I compile this program with g++ using the options `-fopenmp -lmkl_rt`, and run it with `OMP_NUM_THREADS` set to 16.

After running the program, I found that the answer is completely wrong compared to the MATLAB result. I would not call it wrong if there were only small precision differences. Furthermore, I observe that the program produces correct results under any of these conditions:

  1. Use icc instead of g++
  2. Remove the `-fopenmp` flag
  3. Use g++ with ATLAS instead of icc with MKL
  4. Set `OMP_NUM_THREADS=1`

Therefore, I guess the problem may lie in the `-fopenmp` flag. Can you help me figure out the problem? Thank you!

g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)

icc (ICC) 16.0.3 20160415

Linux core 2.6.32-279.el6.x86_64
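To quantify "wrong" without going through MATLAB, the MKL output can be compared against a naive reference product computed in the same program. The following is a minimal sketch, not part of the original post; the helper names `reference_gemm` and `max_rel_diff` are hypothetical, and it assumes the same row-major, non-transposed layout used in the `cblas_sgemm` call above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive row-major reference: C (m x n) = A (m x k) * B (k x n).
// Sums are accumulated in double so rounding in the reference stays small.
std::vector<float> reference_gemm(const float* a, const float* b,
                                  int m, int n, int k) {
  std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      double sum = 0.0;
      for (int p = 0; p < k; ++p) {
        sum += static_cast<double>(a[i * k + p]) * b[p * n + j];
      }
      c[i * n + j] = static_cast<float>(sum);
    }
  }
  return c;
}

// Largest difference between x and y, scaled by max(1, |y[i]|).
float max_rel_diff(const float* x, const float* y, std::size_t count) {
  float worst = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    const float scale = std::max(1.0f, std::fabs(y[i]));
    worst = std::max(worst, std::fabs(x[i] - y[i]) / scale);
  }
  return worst;
}
```

In the program above this would be called as `reference_gemm(left, right, ra, ldb, lda)` and then `max_rel_diff(ans, ref.data(), ra * ldb)`. Ordinary single-precision rounding should give a very small value, while a threading-related failure of the kind described here would be expected to show up as a large one. Any pass/fail threshold is a judgment call.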

Ying_H_Intel
Employee

Hi Keren,

Thanks for raising the question here. The problem looks to be the following.

mkl_rt is the Intel MKL Single Dynamic Library (SDL). As the MKL user guide explains, the SDL enables you to select the interface and threading library for Intel MKL at run time. By default, linking with SDL provides:
• Intel LP64 interface on systems based on the Intel® 64 architecture
• Intel interface on systems based on the IA-32 architecture
• Intel threading
To use other interfaces or change threading preferences, including use of the sequential version of Intel MKL, you need to specify your choices using functions or environment variables as explained in the section "Dynamically Selecting the Interface and Threading Layer".

So if you compile it with GNU g++ and GNU OpenMP threading, MKL's threading layer needs to match. Could you please try exporting:

export MKL_INTERFACE_LAYER=GNU

export MKL_THREADING_LAYER=GNU

(note that there must be no spaces around `=`), then run your executable and see if it gives the expected result.
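Because the setting lives in the environment, it is easy to forget on another machine or in a batch job. One option is a small guard at the top of `main` that checks the variable before any MKL call. This is only a sketch: the function names are hypothetical, the exact string comparison against `GNU` is an assumption, and the check is only meaningful when linking with `-lmkl_rt`. The MKL documentation also describes `mkl_set_threading_layer` for making the same choice from code.

```cpp
#include <cstdlib>
#include <cstring>

// True when `value` (the content of MKL_THREADING_LAYER, or nullptr if the
// variable is unset) asks for the GNU threading layer.
// Assumption: the value is compared as the exact string "GNU".
bool selects_gnu_threading(const char* value) {
  return value != nullptr && std::strcmp(value, "GNU") == 0;
}

// Hypothetical guard: when built by GCC with -fopenmp, require that the
// environment selects the GNU threading layer. For other compilers or
// builds without OpenMP, there is nothing to check here.
bool threading_layer_looks_consistent() {
#if defined(_OPENMP) && defined(__GNUC__) && !defined(__clang__) && \
    !defined(__INTEL_COMPILER)
  return selects_gnu_threading(std::getenv("MKL_THREADING_LAYER"));
#else
  return true;
#endif
}
```

A program could print a warning and exit early when `threading_layer_looks_consistent()` returns false, rather than silently producing wrong results.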

Best Regards,
Ying

Another related online article:

https://software.intel.com/en-us/articles/a-new-linking-model-single-dynamic-library-mkl_rt-since-intel-mkl-103/

Keren_Z_
Beginner

I tried exporting `MKL_INTERFACE_LAYER=GNU`, but it did not help.

Alternatively, I linked with `-lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -lpthread -ldl` instead of the single dynamic library, and that solved the problem.

Ying_H_Intel
Employee

Hi Keren, 

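Glad you tried them out.

Just adding a comment so others may understand. Either of the following should work with g++ and `-fopenmp`.

With `-lmkl_rt`, export both variables, not only the interface layer:

export MKL_INTERFACE_LAYER=GNU
export MKL_THREADING_LAYER=GNU

Or link the GNU threading layer explicitly instead of the single dynamic library:

-lmkl_intel_lp64 -lmkl_gnu_thread -lmkl_core -lpthread -ldl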

Best Regards,

Ying
