Multi-thread MKL cblas_sgemm with g++ problem

Keren_Z_ · ‎06-29-2016

Here's an example of sgemm program.

#include <mkl.h>
#include <iostream>
#include <cstdlib>
#define ITERATION 1

int main()
{
  int ra = 128;
  int lda = 75;
  int ldb = 55;
  float* left = (float*)calloc(ra * lda, sizeof(float));
  float* right = (float*)calloc(ldb * lda, sizeof(float));
  float* ans = (float*)calloc(ra * ldb, sizeof(float));
  std::cout << "left " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < lda; ++j) {
      left[i * lda + j] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
      std::cout << left[i * lda + j] << " ";
    }
    std::cout << std::endl;
  }

  std::cout << "right " << std::endl;
  for (int i = 0; i < lda; ++i) {
    for (int j = 0; j < ldb; ++j) {
      right[i * ldb + j] = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
      std::cout << right[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  for (int i = 0; i < ITERATION; ++i) {
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, ra, ldb, lda, 1.0f, left, lda,
      right, ldb, 0.0f, ans, ldb);
  }

  std::cout << "ans " << std::endl;
  for (int i = 0; i < ra; ++i) {
    for (int j = 0; j < ldb; ++j) {
      std::cout << ans[i * ldb + j] << " ";
    }
    std::cout << std::endl;
  }

  return 0;
}

I compile this program with g++ by options `-fopenmp -lmkl_rt`, where `OMP_NUM_THREADS` has been set to 16.

After running the program, I figure out that the answer is exactly wrong comparing to the matlab result. I wouldn't say wrong if there's only few accuracy errors. Further, I observe that the program performs well under these conditions:

Use icc instead of g++,
Remove -fopenmp flag,
Use g++&atlas instead of icc&mkl
Set OMP_NUM_THREADS=1

Therefore, I guess the problem may lay on the `-fopenmp` flag. Can you help me figure out the problem? Thank you!

g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)

icc (ICC) 16.0.3 20160415

Linux core 2.6.32-279.el6.x86_64

Ying_H_Intel · ‎07-04-2016

Hi Keren,

Thanks for raise the question here. the problem looks be here.

mkl_rt is Intel MKL Single Dynamic Library (SDL). As mkl user guide explained : the SDL enables you to select the interface and threading library for Intel MKL at run time. By default, linking
with SDL provides:
• Intel LP64 interface on systems based on the Intel® 64 architecture
• Intel interface on systems based on the IA-32 architecture
• Intel threading
To use other interfaces or change threading preferences, including use of the sequential version of Intel MKL,
you need to specify your choices using functions or environment variables as explained in section
Dynamically Selecting the Interface and Threading Layer.

So if you compiler it with GNU G++ and GNU openmp threading, Could you please try to export :

export MKL_INTERFACE_LAYER = GNU

MKL_THREADING_LAYER = GNU

then run your exe and see if it can get expected result.

Best Regards,
Ying