Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

incorrect results for dgeqrf / dorgqr

lytles
Beginner
1,171 Views

overview: we have a small matrix library. it uses mkl wrapped with c and called from java thru jni (but i've reproduced this problem with pure c, see below). i have it built for both x86.32 and x64 systems. i'm using dgeqrf and dorgqr to construct the qr decomposition


on a 32 bit pentium4, this works fine. on my 64 bit systems (both intel and amd) i'm getting incorrect results, ie norm (QR-A) is on the order of 1 (it should be near zero). this is only happening for matrices that are slightly wider than they are tall, but for those sizes, it happens most of the time with random-valued matrices. here's a graph of problem sizes - the first column is the number of rows, the rest are the number of columns at which the problem occurs


 98 102 103 104
 99 101 102 103 104
 100 101 102 103 104
 101 102 103 104
 102 103 104
 103 104
 104 
 105 
 106 107 108 109 110 111 112
 107 108 109 110 111 112
 108 109 110 111 112
 109 110 111 112
 110 111 112
 111 112
 112 
...

as the number of rows increase, the "triangles" get larger. in a few cases i've checked out the answer and compared with the "correct" QR (taken from acml or jlapack or 32 bit mkl). the error is in the lower right hand corner of R (in the portion that would be truncated for the "economy" representation, ie in the columns greater than the number of rows, as A = Q * [ R1 R2 ], R1 square, the problem is with R2)

i've reduced this to a simple test case in c that gives different results on 32 v 64 bit machines. can anyone verify that they've seen similar problems or tell what i'm doing wrong or test my code below.

here's the info on my systems and the simple test case in c that reproduces the error. the results will agree until the final four elements.

Intel Math Kernel Library 10.0 Update 3 for Linux*
32 bit: gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
64 bit: gcc (GCC) 4.2.3 (built by hand)

// test_dgeqrf.c
#include 
#include 
#include 
#include 

int main() {
 int rows = 100, cols = 101, nn = rows*cols;
 int vals[ nn ];
 for (int ii=0;ii
 int prime = 11713;
 for (int ii=10;ii
 double valo[nn], val2[nn];
 for (int ii=0,kk=0; ii
 for (int jj = 0; jj < cols; jj++,kk++) valo[ ii + jj*rows ] = 1.0*vals[ kk ]/prime;
 memcpy( val2, valo, nn*sizeof(double) );
 double tau[ rows ];
 int lw = rows * 64, info = 0;
 double work[ lw ];

dgeqrf( &rows, &cols, valo, &rows, tau, work, &lw, &info );
 printf( "dgeqrf status: %d
", info );
 for (int ii = 0; ii < nn; ii++) printf( "%5d %8.3f %8.3f
", ii, 
val2[ii], valo[ii] );

}

# here's the compile that i'm using
# 32 bit
gcc -I/home/lytles/install/apps/mkl/include -I/usr/java/jdk1.5.0_08/include -I/usr/java/jdk1.5.0_08/include/linux -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -fopenmp -L/home/lytles/install/apps/mkl/lib/em64t -lmkl_gf_lp64 -liomp5 /home/lytles/install/apps/mkl/lib/em64t/libmkl_lapack.a /home/lytles/install/apps/mkl/lib/em64t/libmkl_gnu_thread.a /home/lytles/install/apps/mkl/lib/em64t/libmkl_core.a -ldl -lm
# 64 bit
gcc -I/home/lytles/install/apps/mkl/include -I/usr/lib/jvm/java-6-sun/include -I/usr/lib/jvm/java-6-sun/include/linux -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -L/home/lytles/install/apps/mkl/lib/32 /home/lytles/install/apps/mkl/lib/32/libmkl_gf.a /home/lytles/install/apps/mkl/lib/32/libmkl_lapack.a /home/lytles/install/apps/mkl/lib/32/libmkl_gnu_thread.a /home/lytles/install/apps/mkl/lib/32/libmkl_core.a /home/lytles/install/apps/mkl/lib/32/libiomp5.a -lm


> diff t1.bryan.txt t1.proteus.txt
10098,10101c10098,10101
< 10096 -1.436 -0.235
< 10097 -0.832 0.202
< 10098 -0.710 0.266
< 10099 -0.592 -0.097
---
> 10096 -1.436 0.328
> 10097 -0.832 -0.218
> 10098 -0.710 0.092
> 10099 -0.592 -0.115

note: the errors are on the order of 1
note: the values from the 32 bit system (bryan) satisfy A=QR, the 64 bit results do not
note: i've tried to format this post sanely, but it may be ugly - apologies in advance

thanks
seth





0 Kudos
11 Replies
Todd_R_Intel
Employee
1,171 Views
Seth,

Could you try removing libmkl_lapack.a from the link line for your 64-bit configuration? That library is actually a compatibility script with only the following contents:
GROUP (libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a)

This will of course bring in the wrong interface and wrong thread layer. There are some differences between the GNU and Intel compiler interfaces on 64-bit platforms, and that may be the cause of your problem.

While you're at it, you might remove libmkl_lapack.a from your 32-bit link line as well. Any reason why you link libmkl_gf.a on 32-bit, but use "-lmkl_gf_lp64" on the 64-bit platform (which would default I think to libmkl_gf_lp64.so)?

-Todd
0 Kudos
lytles
Beginner
1,171 Views

ok - easy stuff first. the link lines that i gave were from the build for my java app. i'm using libmkl_gf.a on 32 bit, to simplify installation of the java app. i'd do the same thing on 64 bit but i get x86 relocation errors.

when i remove libmkl_lapack.a from the link line, i get:
/home/lytles/install/apps/mkl/lib/em64t/libmkl_core.a(_mc_mkl_dbsrsymv_lp64.o): In function `mkl_spblas_lp64_mc_mkl_dbsrsymv':
__tmp_mkl_dbsrsymv.c:(.text+0xa0): undefined reference to `mkl_spblas_lp64_dbsrmmsym'
...
[ snip - there are 100s of lines ]

here's the new link line ($mkl is the root of the install):

-lmkl_gf_lp64 -liomp5 $mkl/lib/em64t/libmkl_gnu_thread.a $mkl/lib/em64t/libmkl_core.a -ldl -lm
changing the link to ($mkll is $mkl/lib/em64t):
gcc -I $mkl/include -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -fopenmp -L$mkll -lmkl_gf_lp64 -liomp5 $mkll/libmkl_core.a $mkll/libmkl_gnu_thread.a $mkll/libmkl_core.a -lgfortran -ldl -lm
succeeds, but the answer from dgeqrf is still incorrect. switching to libmkl_sequential.a (ie removing libmkl_gnu_thread.a) seems to fix the problem.

Edit: realized something important. my 32 bit machine is an old pentium4 which mkl used to recognize as multicore. but after the last update, mkl treats it as single core. setting
export MKL_DYNAMIC=false
export OMP_NUM_THREADS=2
I can force mkl to run using openmp on 32 bits, and lol, i get the same error. so not 32 v 64 bit at all, but openmp v sequential.






0 Kudos
lytles
Beginner
1,171 Views
To summarize:

dgeqrf is giving incorrect results for R when run using openmp
  • occurs on both 32 bit and 64 bit machines
  • only for matrices that are slightly wider than they are tall, eg 100 x 103
  • plotting the effected sizes results in the "triangular" structure noted above
  • link line is:
    • gcc  -I $mkl/include  -pthread -std=c99 test_dgeqrf.c 
      • -lmkl_gf_lp64 -liomp5 -lmkl_gnu_thread -lmkl_core -lgfortran -ldl -lm
    • i've tried a bunch of variants of this, with and without -fopenmp and changing the order
  • whenever more than one thread is used, i get incorrect results
    • using -lmkl_sequential fixes the problem
    • export OMP_NUM_THREADS=1 # fixes the problem
  • the norm of the error is on the order of the magnitude of the elements, ie large

any confirmation, anti-confirmation, or suggestions on how i can change the link line much appreciated.


as a work-around, i'm thinking about using omp_set_num_threads()



0 Kudos
lytles
Beginner
1,171 Views
Has anyone verified or anti-verified this issue ? This is a major issue for us - we bought mkl largely because of the threading capability.
0 Kudos
TimP
Honored Contributor III
1,171 Views
Have you submitted your test case on premier.intel.com? According to the information you have provided, your case seems well off the beaten path. For example, even in my use of MKL with g++ and gfortran, I've never had the occasion to use gnu_threads or mkl_gf libraries.
0 Kudos
Todd_R_Intel
Employee
1,171 Views

I've brought this to the attention of some here, but haven't heard yet.

Submitting the problem to Premier support (http://premier.intel.com)as Tim suggests and highlighting it as a showstopper is a good idea as this is still the primary means for providing support for the product.

-Todd

0 Kudos
lytles
Beginner
1,171 Views
i submitted it yesterday (see the table below - not sure if anyone else can see the ticket), but still a bit worrisome that it wasn't addressed in the public forum by intel. i'd even be happy to hear "you're wrong, it works, RTFM"

why do you say off the beaten path ? i'm using mkl with a supported compiler, and threading is one of the key features of mkl - here's the first paragraph from Intel's Overview page (emphasis mine):
Intel Math Kernel Library (Intel MKL) offers highly optimized, extensively threaded math routines for scientific, engineering, and financial applications that require maximum performance.
is there something unusual about what i'm doing ? have you done something similar with a different link line ?


incorrect results for dgeqrf using openmp
Issue Number 511590 Issue Status Investigating
Originator seth lytle Submit Date 8/11/2008
Company event monitor Last Update 8/12/2008








0 Kudos
Todd_R_Intel
Employee
1,171 Views
It looks like you found a bug. If you had been doing something wrong then I suppose someone would have asked you to RTFM (or something to that effect).

For now, to work around this problem you will need to explicitly run serially in sections where these functions are used.

The engineer was able reproduce your problem based on your forum posts. But if in the future you're not getting the results or response you'd expect from Intel and the community you might choose premier support (our primary means of support the product) sooner. From the welcome:
While this is primarily a discussion forum, if a question is posted that appears to indicate a defect in the software or documentation, we will work with you to use our Intel Premier support site, which is in place to register and respond to bugs or other support issues you may have.
-Todd
0 Kudos
lytles
Beginner
1,171 Views
ok

premier.intel.com got back to me. long term they have a fix that they'll incorporate into mkl (i hope for the next update). and short-term they came up with a work-around. increase the width of the matrix that's being passed in (and the corresponding # of columns). here's their note:

Dear Seth,

The issue occurs if matrix dimensions are in the following constraints: m<=(m/nb+1)*nb,
The exact way to do it is (example in C tested on posted reproducer):
MKL_INT nb, nbold, ispec, mione, nthr;
ispec = 1; mione = -1; nthr = omp_num_threads();
nb = ilaenv( &ispec, "DGEQRF", " ", &rows, &cols, &nthr, &mione );
do{
if( rowsnbold = nb;
nb = ilaenv( &ispec, "DGEQRF", " ", &rows, &cols, &nthr, &mione );
} while ( nb != nbold );

It is necessary to repeat because with the columns number change the NB value also could increase.

Thanks,
Vipin


0 Kudos
leona_l_
Beginner
1,171 Views
0 Kudos
Ying_H_Intel
Employee
1,171 Views

Hi Leona, 

Do you hope to see the example code of java qr decomposition? 

You may first refer to example code of MKL java under MKL install directory.  It include several example, like dgemm, dgesv, dgesvd. So you can modify them to get  dgeqrf  sample. 

Best Regards,

Ying

0 Kudos
Reply