
cblas_daxpy gives wrong results with multiple threads

Hello,

I have installed the most recent version of MKL and am trying to use it in my application, which is built with VS 2015 C++. Certain parts of my code use OpenMP (OMP). However, I never call MKL routines from within an OMP block. My problem is that unless I disable OMP, or run on 1 thread, I keep getting wrong results. This was not the case in the past, when I was using an older version of MKL (from about 10 years ago) together with VS2008.

Is there any solution that allows my VS 2015 C++ application to keep OMP and still call MKL routines?

Thank you!

Hussein

 

 

13 Replies
Gennady_F_Intel
Moderator
125 Views

You may check whether cblas_daxpy returns the same results with 1 thread and with several threads, and let us know if the results differ. As an additional check, set MKL_VERBOSE to see some run-time details.

125 Views

Thank you for your reply. I am able to isolate and demonstrate the problem in the small piece of C code below, where I have two vectors of 1's. I compute the sum of the two vectors both by simple addition and by cblas_daxpy, and compare the two by printing the first 3 entries. I am using VS2015.

Building with /openmp, number of threads = 2, nn = 1000000, I get the following wrong result from cblas_daxpy:
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 3.20GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000283036FC38,000001F00F4C2080,1,000001F00ED1D080,1) 3.53ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:2
Blas:   3       3       3

If I build without /openmp, OR if I set the number of threads = 1, OR if I set nn <=4095, I get the correct results from cblas_daxpy:
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 3.20GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000E620EFFC78,000001E8DFC20080,1,000001E8DF475080,1) 2.27ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:1
Blas:   2       2       2

#include <stdio.h>
#include "mkl.h"

int main(void)
{
 int i;
 int nn = 1000000;
 int alignment = 64;
 double * temp1;
 double * temp2;
 double * temp3;
 double alpha = 1.0;
 int inc = 1;

 mkl_set_dynamic(0);
 mkl_set_num_threads(2);
 mkl_verbose(1);

 temp1 = mkl_malloc(sizeof(double)*nn, alignment);
 temp2 = mkl_malloc(sizeof(double)*nn, alignment);
 temp3 = mkl_malloc(sizeof(double)*nn, alignment);

 for (i = 0; i < nn; i++) {
  temp1[i] = 1;  temp2[i] = 1;
  temp3[i] = temp1[i] + temp2[i];
 }
 printf("Temp1:\t"); for (i = 0; i < 3; i++) { printf("%g\t", temp1[i]); } printf("\n");
 printf("Temp2:\t"); for (i = 0; i < 3; i++) { printf("%g\t", temp2[i]); } printf("\n");
 printf("Sum:\t"); for (i = 0; i < 3; i++) { printf("%g\t", temp3[i]); } printf("\n");

 // temp1 = alpha * temp2 + temp1 (alpha is 1, so temp1 = temp1 + temp2)
 cblas_daxpy(nn, alpha, temp2, 1, temp1, inc);
 printf("Blas:\t"); for (i = 0; i < 3; i++) { printf("%g\t", temp1[i]); } printf("\n");

 return 0;

}

 

 

 

Gennady_F_Intel
Moderator
125 Views

Thanks for the test case. That looks very strange; we will check how it behaves on our side.

Gennady_F_Intel
Moderator
125 Views

I don't see the problem on our side.

Compiling the example:    icl /Qopenmp /Qmkl test_axpy.cpp /Fe1.exe

>1.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000B019AFF8F8,00000207DC835080,1,00000207DC08D080,1) 3.26ms CNR:SSE4_2 Dyn:1 FastMM:1 TID:0  NThr:1
Blas:   2       2       2

 

>set MKL_NUM_THREADS=2

>1.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,00000072A9AFFD48,0000026431DCE080,1,000002643161A080,1) 3.48ms CNR:SSE4_2 Dyn:1 FastMM:1 TID:0  NThr:2
Blas:   2       2       2
 

125 Views

Thank you for the quick response.

I guess you are using the Intel compiler? Would you be able to try Visual C++? That is what I am using here, and the problem is probably there.

My suspicion is that OpenMP cannot be combined with MKL on Windows when using compilers other than Intel's (!?). I think I read a comment to that effect somewhere, but I can't recall where and I am not sure.

Gennady_F_Intel
Moderator
125 Views

OK, I will try the VS 2015 compiler available here. Please show exactly how you build your code with MKL. OpenMP code should work with MKL whether the Intel or the Microsoft compiler is used (in theory :)).

125 Views

Create a new project TestMKL (Win32 console application; check Empty project, uncheck Security Development Lifecycle (SDL)).
Copy the test.c code (provided earlier) into the TestMKL directory.
Create TestMKL\lib and copy these 4 libraries there: libiomp5md.lib, mkl_core.lib, mkl_intel_lp64.lib, mkl_intel_thread.lib.
Copy MKL's include folder (which contains mkl.h etc.) into TestMKL.
In Visual Studio, right-click Source Files and add the existing item test.c to the project.
Right-click the project TestMKL and select Properties.
--Select VC++ Directories and then Include Directories. Add TestMKL\include there.
--Select Linker->Input and go to Additional Dependencies. Add the above 4 libs located in the lib subfolder.
--Select C/C++ --> All Options. Set Open MP Support to Yes.
Now it is time to build. Select Release, x64 and build the solution.
Copy libiomp5md.dll to TestMKL\x64\Release to be able to run.
When you run it, you will see the wrong numbers.

If you go back to properties and set Open MP Support to No, you will see correct numbers.

Gennady_F_Intel
Moderator
125 Views

Building and running the code from the command line.

The Microsoft compiler is used: >cl /version
   Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64

Linking with:
..\Intel2019\compilers_and_libraries\windows\mkl\lib\intel64\mkl_intel_lp64.lib
..\Intel2019\compilers_and_libraries\windows\mkl\lib\intel64\mkl_intel_thread.lib
..\Intel2019\compilers_and_libraries\windows\mkl\lib\intel64\mkl_core.lib
..\Intel2019\compilers_and_libraries\windows\compiler\lib\intel64\libiomp5md.lib

 

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000B48B31F7E8,0000024E9C11E080,1,0000024E9B96B080,1) 3.97ms CNR:SSE4_2 Dyn:1 FastMM:1 TID:0  NThr:1
Blas:   2       2       2

>set MKL_NUM_THREADS=2

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,00000095B539FE68,0000025621D34080,1,0000025621586080,1) 3.93ms CNR:SSE4_2 Dyn:1 FastMM:1 TID:0  NThr:2
Blas:   2       2       2

 

125 Views

Thank you for the quick responses.

In the command line, did you have the /openmp option? If not, can you please try with this option included.

It would be great if you can send me the exact command line that you used.

Thanks again!

Gennady_F_Intel
Moderator
125 Views

Sure, I built with the /Qopenmp option as follows:

cl /Qopenmp /I"..\mkl\include"  test_daxpy.cpp  /Fe_64thr.exe  .\mkl_intel_lp64.lib ..\mkl_intel_thread.lib ..\mkl_core.lib ..\libiomp5md.lib
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:_64thr.exe
 

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000EA754FFCA8,00000260A43EB080,1,00000260A3C34080,1) 3.44ms CNR:SSE4_2 Dyn:1 FastMM:1 TID:0  NThr:1
Blas:   2       2       2

And with 2 threads:

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,00000069D38FFCA8,000001ABAE569080,1,000001ABADDB8080,1) 3.68ms CNR:SSE4_2 Dyn:1 FastMM:1 TID:0  NThr:2
Blas:   2       2       2

 

125 Views

Thanks again for continuing to help.

In the case of VS2015, the option is actually /openmp, not /Qopenmp. In your build, the /Qopenmp option was not recognized and was therefore ignored; cl prints a warning saying so. So both of your builds were without OpenMP. If you use /openmp rather than /Qopenmp, you will see the incorrect result. Please see below. So the ability to build with both MKL and OpenMP remains in question in the case of VS 2015.

----------------------------------

E:\TestMKL_2>cl /Qopenmp /I".\include" test.c /Fe_64thr.exe .\lib\libiomp5md.lib .\lib\mkl_core.lib .\lib\mkl_intel_lp64.lib .\lib\mkl_intel_thread.lib
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

cl : Command line warning D9002 : ignoring unknown option '/Qopenmp'
test.c
Microsoft (R) Incremental Linker Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:_64thr.exe
test.obj
.\lib\libiomp5md.lib
.\lib\mkl_core.lib
.\lib\mkl_intel_lp64.lib
.\lib\mkl_intel_thread.lib

E:\TestMKL_2>_64thr
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 3.20GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000085859FC68,000001D3400CB080,1,000001D33F911080,1) 1.87ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:2
Blas:   2       2       2

E:\TestMKL_2>cl /openmp /I".\include" test.c /Fe_64thr.exe .\lib\libiomp5md.lib .\lib\mkl_core.lib .\lib\mkl_intel_lp64.lib .\lib\mkl_intel_thread.lib
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

test.c
Microsoft (R) Incremental Linker Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:_64thr.exe
test.obj
.\lib\libiomp5md.lib
.\lib\mkl_core.lib
.\lib\mkl_intel_lp64.lib
.\lib\mkl_intel_thread.lib

E:\TestMKL_2>_64thr
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 3.20GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000574191FCC8,0000017D78E9C080,1,0000017D786ED080,1) 3.49ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:2
Blas:   3       3       3

 

 

 

 

 

 

 

 

Gennady_F_Intel
Moderator
125 Views

OK, after fixing the issue with /openmp and running the code on the SSE4_2 code path with 1, 2, 3 and 4 threads, I still cannot reproduce the problem you reported:

cl /openmp  test_daxpy.cpp  /Fe_64thr.exe  mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib

set MKL_NUM_THREADS=1

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,0000002EB15AF9B8,0000017F4A012080,1,0000017F49867080,1) 5.93ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:1
Blas:   2       2       2

set MKL_NUM_THREADS=2

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,0000007E411BF968,000001CA0D0C9080,1,000001CA0C912080,1) 7.69ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:2
Blas:   2       2       2

set MKL_NUM_THREADS=3

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000BBC6F6F818,0000021CAF304080,1,0000021CAEB5D080,1) 6.98ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:3
Blas:   2       2       2

set MKL_NUM_THREADS=4

>_64thr.exe
Temp1:  1       1       1
Temp2:  1       1       1
Sum:    2       2       2
MKL_VERBOSE Intel(R) MKL 2019.0 Update 4 Product build 20190411 for Intel(R) 64 architecture Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DAXPY(1000000,000000D22099F7E8,0000025DBFDF1080,1,0000025DBF63E080,1) 4.08ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:4
Blas:   2       2       2

125 Views

Thank you for continuing to help.

This is really puzzling, as I do not get the same results on my side. I am not sure whether the problem is in the build or in the run environment.

I have left my code (source, includes, libs as well as executable) here:

www.firmsofttech.com/MKL/TestMKL_2.zip

If you have the time and are still willing to help, please download and run it to see if the problem happens on your side. We could also do the opposite: if you can kindly make your build available to me, I can run it here. This could tell us where the problem might be.

Thanks a lot!
