incorrect results for dgeqrf / dorgqr

lytles · ‎08-07-2008

overview: we have a small matrix library. it uses mkl wrapped with c and called from java thru jni (but i've reproduced this problem with pure c, see below). i have it built for both x86.32 and x64 systems. i'm using dgeqrf and dorgqr to construct the qr decomposition

on a 32 bit pentium4, this works fine. on my 64 bit systems (both intel and amd) i'm getting incorrect results, ie norm (QR-A) is on the order of 1 (it should be near zero). this is only happening for matrices that are slightly wider than they are tall, but for those sizes, it happens most of the time with random-valued matrices. here's a graph of problem sizes - the first column is the number of rows, the rest are the number of columns at which the problem occurs

 98 102 103 104

 99 101 102 103 104

 100 101 102 103 104

 101 102 103 104

 102 103 104

 103 104

 106 107 108 109 110 111 112

 107 108 109 110 111 112

 108 109 110 111 112

 109 110 111 112

 110 111 112

 111 112

...

as the number of rows increase, the "triangles" get larger. in a few cases i've checked out the answer and compared with the "correct" QR (taken from acml or jlapack or 32 bit mkl). the error is in the lower right hand corner of R (in the portion that would be truncated for the "economy" representation, ie in the columns greater than the number of rows, as A = Q * [ R1 R2 ], R1 square, the problem is with R2)

i've reduced this to a simple test case in c that gives different results on 32 v 64 bit machines. can anyone verify that they've seen similar problems or tell what i'm doing wrong or test my code below.

here's the info on my systems and the simple test case in c that reproduces the error. the results will agree until the final four elements.

Intel Math Kernel Library 10.0 Update 3 for Linux*
32 bit: gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
64 bit: gcc (GCC) 4.2.3 (built by hand)

// test_dgeqrf.c

#include

#include

#include

#include

int main() {

 int rows = 100, cols = 101, nn = rows*cols;

 int vals[ nn ];

 for (int ii=0;ii int prime = 11713;
 for (int ii=10;ii double valo[nn], val2[nn];
 for (int ii=0,kk=0; ii for (int jj = 0; jj < cols; jj++,kk++) valo[ ii + jj*rows ] = 1.0*vals[ kk ]/prime;
 memcpy( val2, valo, nn*sizeof(double) );
 double tau[ rows ];
 int lw = rows * 64, info = 0;
 double work[ lw ];

 dgeqrf( &rows, &cols, valo, &rows, tau, work, &lw, &info );
 printf( "dgeqrf status: %d
", info );
 for (int ii = 0; ii < nn; ii++) printf( "%5d %8.3f %8.3f
", ii, 
val2[ii], valo[ii] );

}

# here's the compile that i'm using
# 32 bit
gcc -I/home/lytles/install/apps/mkl/include -I/usr/java/jdk1.5.0_08/include -I/usr/java/jdk1.5.0_08/include/linux -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -fopenmp -L/home/lytles/install/apps/mkl/lib/em64t -lmkl_gf_lp64 -liomp5 /home/lytles/install/apps/mkl/lib/em64t/libmkl_lapack.a /home/lytles/install/apps/mkl/lib/em64t/libmkl_gnu_thread.a /home/lytles/install/apps/mkl/lib/em64t/libmkl_core.a -ldl -lm
# 64 bit
gcc -I/home/lytles/install/apps/mkl/include -I/usr/lib/jvm/java-6-sun/include -I/usr/lib/jvm/java-6-sun/include/linux -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -L/home/lytles/install/apps/mkl/lib/32 /home/lytles/install/apps/mkl/lib/32/libmkl_gf.a /home/lytles/install/apps/mkl/lib/32/libmkl_lapack.a /home/lytles/install/apps/mkl/lib/32/libmkl_gnu_thread.a /home/lytles/install/apps/mkl/lib/32/libmkl_core.a /home/lytles/install/apps/mkl/lib/32/libiomp5.a -lm 


> diff t1.bryan.txt t1.proteus.txt 
10098,10101c10098,10101
< 10096 -1.436 -0.235
< 10097 -0.832 0.202
< 10098 -0.710 0.266
< 10099 -0.592 -0.097
---
> 10096 -1.436 0.328
> 10097 -0.832 -0.218
> 10098 -0.710 0.092
> 10099 -0.592 -0.115

note: the errors are on the order of 1
note: the values from the 32 bit system (bryan) satisfy A=QR, the 64 bit results do not
note: i've tried to format this post sanely, but it may be ugly - apologies in advance

thanks
seth

Todd_R_Intel · ‎08-07-2008

Seth,

Could you try removing libmkl_lapack.a from the link line for your 64-bit configuration? That library is actually a compatibility script with only the following contents:
GROUP (libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a)

This will of course bring in the wrong interface and wrong thread layer. There are some differences between the GNU and Intel compiler interfaces on 64-bit platforms, and that may be the cause of your problem.

While you're at it, you might remove libmkl_lapack.a from your 32-bit link line as well. Any reason why you link libmkl_gf.a on 32-bit, but use "-lmkl_gf_lp64" on the 64-bit platform (which would default I think to libmkl_gf_lp64.so)?

-Todd

lytles · ‎08-07-2008

ok - easy stuff first. the link lines that i gave were from the build for my java app. i'm using libmkl_gf.a on 32 bit, to simplify installation of the java app. i'd do the same thing on 64 bit but i get x86 relocation errors.

when i remove libmkl_lapack.a from the link line, i get:

/home/lytles/install/apps/mkl/lib/em64t/libmkl_core.a(_mc_mkl_dbsrsymv_lp64.o): In function `mkl_spblas_lp64_mc_mkl_dbsrsymv':

__tmp_mkl_dbsrsymv.c:(.text+0xa0): undefined reference to `mkl_spblas_lp64_dbsrmmsym'
...
[ snip - there are 100s of lines ]

here's the new link line ($mkl is the root of the install):

-lmkl_gf_lp64 -liomp5 $mkl/lib/em64t/libmkl_gnu_thread.a $mkl/lib/em64t/libmkl_core.a -ldl -lm

changing the link to ($mkll is $mkl/lib/em64t):

gcc -I $mkl/include -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -fopenmp -L$mkll -lmkl_gf_lp64 -liomp5 $mkll/libmkl_core.a $mkll/libmkl_gnu_thread.a $mkll/libmkl_core.a -lgfortran -ldl -lm

succeeds, but the answer from dgeqrf is still incorrect. switching to libmkl_sequential.a (ie removing libmkl_gnu_thread.a) seems to fix the problem.

Edit: realized something important. my 32 bit machine is an old pentium4 which mkl used to recognize as multicore. but after the last update, mkl treats it as single core. setting

export MKL_DYNAMIC=false

export OMP_NUM_THREADS=2

I can force mkl to run using openmp on 32 bits, and lol, i get the same error. so not 32 v 64 bit at all, but openmp v sequential.

lytles · ‎08-07-2008

To summarize:

dgeqrf is giving incorrect results for R when run using openmp

occurs on both 32 bit and 64 bit machines
only for matrices that are slightly wider than they are tall, eg 100 x 103
plotting the effected sizes results in the "triangular" structure noted above
link line is:

gcc  -I $mkl/include  -pthread -std=c99 test_dgeqrf.c

-lmkl_gf_lp64 -liomp5 -lmkl_gnu_thread -lmkl_core -lgfortran -ldl -lm

i've tried a bunch of variants of this, with and without -fopenmp and changing the order

whenever more than one thread is used, i get incorrect results

using -lmkl_sequential fixes the problem

export OMP_NUM_THREADS=1 # fixes the problem

the norm of the error is on the order of the magnitude of the elements, ie large

any confirmation, anti-confirmation, or suggestions on how i can change the link line much appreciated.

as a work-around, i'm thinking about using omp_set_num_threads()

lytles · ‎08-11-2008

Has anyone verified or anti-verified this issue ? This is a major issue for us - we bought mkl largely because of the threading capability.

TimP · ‎08-11-2008

Have you submitted your test case on premier.intel.com? According to the information you have provided, your case seems well off the beaten path. For example, even in my use of MKL with g++ and gfortran, I've never had the occasion to use gnu_threads or mkl_gf libraries.

Todd_R_Intel · ‎08-11-2008

I've brought this to the attention of some here, but haven't heard yet.

Submitting the problem to Premier support (http://premier.intel.com)as Tim suggests and highlighting it as a showstopper is a good idea as this is still the primary means for providing support for the product.

-Todd

lytles · ‎08-12-2008

i submitted it yesterday (see the table below - not sure if anyone else can see the ticket), but still a bit worrisome that it wasn't addressed in the public forum by intel. i'd even be happy to hear "you're wrong, it works, RTFM"

why do you say off the beaten path ? i'm using mkl with a supported compiler, and threading is one of the key features of mkl - here's the first paragraph from Intel's Overview page (emphasis mine):

Intel Math Kernel Library (Intel MKL) offers highly optimized, extensively threaded math routines for scientific, engineering, and financial applications that require maximum performance.

is there something unusual about what i'm doing ? have you done something similar with a different link line ?

incorrect results for dgeqrf using openmp
Issue Number	511590	Issue Status	Investigating
Originator	seth lytle	Submit Date	8/11/2008
Company	event monitor	Last Update	8/12/2008

Todd_R_Intel · ‎08-15-2008

It looks like you found a bug. If you had been doing something wrong then I suppose someone would have asked you to RTFM (or something to that effect).

For now, to work around this problem you will need to explicitly run serially in sections where these functions are used.

The engineer was able reproduce your problem based on your forum posts. But if in the future you're not getting the results or response you'd expect from Intel and the community you might choose premier support (our primary means of support the product) sooner. From the welcome:

While this is primarily a discussion forum, if a question is posted that appears to indicate a defect in the software or documentation, we will work with you to use our Intel Premier support site, which is in place to register and respond to bugs or other support issues you may have.

-Todd

lytles · ‎08-26-2008

ok

premier.intel.com got back to me. long term they have a fix that they'll incorporate into mkl (i hope for the next update). and short-term they came up with a work-around. increase the width of the matrix that's being passed in (and the corresponding # of columns). here's their note:

Dear Seth,


The issue occurs if matrix dimensions are in the following constraints: m<=(m/nb+1)*nb,
The exact way to do it is (example in C tested on posted reproducer):

MKL_INT nb, nbold, ispec, mione, nthr;
ispec = 1; mione = -1; nthr = omp_num_threads();
nb = ilaenv( &ispec, "DGEQRF", " ", &rows, &cols, &nthr, &mione );
do{
if( rowsnbold = nb;
nb = ilaenv( &ispec, "DGEQRF", " ", &rows, &cols, &nthr, &mione );
} while ( nb != nbold );


It is necessary to repeat because with the columns number change the NB value also could increase.


Thanks,

Vipin

leona_l_ · ‎07-01-2014

code java qr decomposition

Ying_H_Intel · ‎07-01-2014

Hi Leona,

Do you hope to see the example code of java qr decomposition?

You may first refer to example code of MKL java under MKL install directory. It include several example, like dgemm, dgesv, dgesvd. So you can modify them to get dgeqrf sample.

Best Regards,

Ying