I ran into a strange problem when using the Intel MKL library.
I built library A (which calls the cblas_sgemm function) against libmkl_intel_thread.so.
Steps that produce the wrong result:
I built application B, which depends on library A, with libmkl_intel_thread.so listed as a dependency in the makefile. When I ran application B, it did not produce the right result.
Steps that produce the right result:
I built application B, which depends on library A, with libmkl_sequential.so listed as a dependency in the makefile. Built this way, application B produces the right result, but I lose the benefit of multithreaded matrix computation.
Can you please elaborate on how you compare the two results, and how you conclude that one of them is incorrect?
That said, it is expected for threaded execution and sequential execution to give different results. This is because floating-point addition is not associative: (a+b)+c does not necessarily equal a+(b+c), since each intermediate sum is rounded. In a threaded execution, the partial sums may be grouped across threads differently and combined in a different order, which produces slightly different results. However, even when two results differ, both should be treated as correct as long as the difference is below a tolerance that you define, for example n*1.0E-6 for single-precision computations, where n is the matrix size. There are more advanced techniques to test whether two results are numerically equivalent, depending on factors such as the numerical characteristics of the input matrices, your application requirements, etc.
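
To make the two points above concrete, here is a minimal sketch in plain C. It does not call MKL. The helper names (sum_left, sum_right, max_abs_diff, results_match) are hypothetical and chosen only for illustration, and the tolerance is the n*1.0E-6 figure suggested above, applied as an absolute bound on each element. It assumes strict IEEE 754 single-precision evaluation (for example, no -ffast-math).

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Same three operands, two groupings. Each intermediate sum is rounded
 * to single precision, so the two functions can return different values. */
float sum_left(float a, float b, float c)  { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }

/* Largest absolute element-wise difference between two result buffers,
 * e.g. the C matrices produced by a threaded and a sequential sgemm.
 * Returns NAN if any element difference is NaN, so that a NaN in either
 * buffer is never silently accepted. */
float max_abs_diff(const float *x, const float *y, size_t count)
{
    assert(x != NULL && y != NULL);
    float worst = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float d = fabsf(x[i] - y[i]);
        if (isnan(d)) {
            return NAN;
        }
        if (d > worst) {
            worst = d;
        }
    }
    return worst;
}

/* Returns 1 if every element of x and y agrees within n * 1.0e-6,
 * where n is the matrix size, and 0 otherwise (including the NaN case,
 * because any comparison with NaN is false). */
int results_match(const float *x, const float *y, size_t count, size_t n)
{
    float tol = (float)n * 1.0e-6f;
    return max_abs_diff(x, y, count) <= tol;
}
```

With a = 1.0e8f, b = -1.0e8f and c = 1.0f, sum_left returns 1 while sum_right returns 0, because b + c rounds back to -1.0e8f in single precision. To check the two builds, you could store the output matrix from each run in a buffer and pass both buffers to results_match. Note that an absolute tolerance like this is sensitive to the magnitude of the matrix entries, so a bound relative to the size of the values may be more appropriate for your data.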