- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
overview: we have a small matrix library. it uses mkl wrapped with c and called from java thru jni (but i've reproduced this problem with pure c, see below). i have it built for both x86.32 and x64 systems. i'm using dgeqrf and dorgqr to construct the qr decomposition
on a 32 bit pentium4, this works fine. on my 64 bit systems (both intel and amd) i'm getting incorrect results, ie norm (QR-A) is on the order of 1 (it should be near zero). this is only happening for matrices that are slightly wider than they are tall, but for those sizes, it happens most of the time with random-valued matrices. here's a graph of problem sizes - the first column is the number of rows, the rest are the number of columns at which the problem occurs
98 102 103 104
99 101 102 103 104
100 101 102 103 104
101 102 103 104
102 103 104
103 104
104
105
106 107 108 109 110 111 112
107 108 109 110 111 112
108 109 110 111 112
109 110 111 112
110 111 112
111 112
112
...
as the number of rows increase, the "triangles" get larger. in a few cases i've checked out the answer and compared with the "correct" QR (taken from acml or jlapack or 32 bit mkl). the error is in the lower right hand corner of R (in the portion that would be truncated for the "economy" representation, ie in the columns greater than the number of rows, as A = Q * [ R1 R2 ], R1 square, the problem is with R2)
i've reduced this to a simple test case in c that gives different results on 32 v 64 bit machines. can anyone verify that they've seen similar problems or tell what i'm doing wrong or test my code below.
here's the info on my systems and the simple test case in c that reproduces the error. the results will agree until the final four elements.
Intel Math Kernel Library 10.0 Update 3 for Linux*
32 bit: gcc (GCC) 4.2.3 (Ubuntu 4.2.3-2ubuntu7)
64 bit: gcc (GCC) 4.2.3 (built by hand)
// test_dgeqrf.c
#include
#include
#include
#include
int main() {
int rows = 100, cols = 101, nn = rows*cols;
int vals[ nn ];
for (int ii=0;ii int prime = 11713;for (int ii=10;ii double valo[nn], val2[nn];for (int ii=0,kk=0; ii for (int jj = 0; jj < cols; jj++,kk++) valo[ ii + jj*rows ] = 1.0*vals[ kk ]/prime;memcpy( val2, valo, nn*sizeof(double) );double tau[ rows ];int lw = rows * 64, info = 0;double work[ lw ];
dgeqrf( &rows, &cols, valo, &rows, tau, work, &lw, &info );printf( "dgeqrf status: %d ", info );for (int ii = 0; ii < nn; ii++) printf( "%5d %8.3f %8.3f ", ii, val2[ii], valo[ii] );
}# here's the compile that i'm using# 32 bit
gcc -I/home/lytles/install/apps/mkl/include -I/usr/java/jdk1.5.0_08/include -I/usr/java/jdk1.5.0_08/include/linux -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -fopenmp -L/home/lytles/install/apps/mkl/lib/em64t -lmkl_gf_lp64 -liomp5 /home/lytles/install/apps/mkl/lib/em64t/libmkl_lapack.a /home/lytles/install/apps/mkl/lib/em64t/libmkl_gnu_thread.a /home/lytles/install/apps/mkl/lib/em64t/libmkl_core.a -ldl -lm
# 64 bit
gcc -I/home/lytles/install/apps/mkl/include -I/usr/lib/jvm/java-6-sun/include -I/usr/lib/jvm/java-6-sun/include/linux -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -L/home/lytles/install/apps/mkl/lib/32 /home/lytles/install/apps/mkl/lib/32/libmkl_gf.a /home/lytles/install/apps/mkl/lib/32/libmkl_lapack.a /home/lytles/install/apps/mkl/lib/32/libmkl_gnu_thread.a /home/lytles/install/apps/mkl/lib/32/libmkl_core.a /home/lytles/install/apps/mkl/lib/32/libiomp5.a -lm
> diff t1.bryan.txt t1.proteus.txt
10098,10101c10098,10101
< 10096 -1.436 -0.235
< 10097 -0.832 0.202
< 10098 -0.710 0.266
< 10099 -0.592 -0.097
---
> 10096 -1.436 0.328
> 10097 -0.832 -0.218
> 10098 -0.710 0.092
> 10099 -0.592 -0.115
note: the errors are on the order of 1
note: the values from the 32 bit system (bryan) satisfy A=QR, the 64 bit results do not
note: i've tried to format this post sanely, but it may be ugly - apologies in advance
thanks
seth
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Could you try removing libmkl_lapack.a from the link line for your 64-bit configuration? That library is actually a compatibility script with only the following contents:
GROUP (libmkl_intel_lp64.a libmkl_intel_thread.a libmkl_core.a)
This will of course bring in the wrong interface and wrong thread layer. There are some differences between the GNU and Intel compiler interfaces on 64-bit platforms, and that may be the cause of your problem.
While you're at it, you might remove libmkl_lapack.a from your 32-bit link line as well. Any reason why you link libmkl_gf.a on 32-bit, but use "-lmkl_gf_lp64" on the 64-bit platform (which would default I think to libmkl_gf_lp64.so)?
-Todd
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
ok - easy stuff first. the link lines that i gave were from the build for my java app. i'm using libmkl_gf.a on 32 bit, to simplify installation of the java app. i'd do the same thing on 64 bit but i get x86 relocation errors.
/home/lytles/install/apps/mkl/lib/em64t/libmkl_core.a(_mc_mkl_dbsrsymv_lp64.o): In function `mkl_spblas_lp64_mc_mkl_dbsrsymv':
__tmp_mkl_dbsrsymv.c:(.text+0xa0): undefined reference to `mkl_spblas_lp64_dbsrmmsym'
...
[ snip - there are 100s of lines ]
here's the new link line ($mkl is the root of the install):
-lmkl_gf_lp64 -liomp5 $mkl/lib/em64t/libmkl_gnu_thread.a $mkl/lib/em64t/libmkl_core.a -ldl -lmchanging the link to ($mkll is $mkl/lib/em64t):
gcc -I $mkl/include -fPIC -pthread -std=c99 -o test_dgeqrf.x test_dgeqrf.c -fopenmp -L$mkll -lmkl_gf_lp64 -liomp5 $mkll/libmkl_core.a $mkll/libmkl_gnu_thread.a $mkll/libmkl_core.a -lgfortran -ldl -lmsucceeds, but the answer from dgeqrf is still incorrect. switching to libmkl_sequential.a (ie removing libmkl_gnu_thread.a) seems to fix the problem.
Edit: realized something important. my 32 bit machine is an old pentium4 which mkl used to recognize as multicore. but after the last update, mkl treats it as single core. setting
export MKL_DYNAMIC=false
export OMP_NUM_THREADS=2I can force mkl to run using openmp on 32 bits, and lol, i get the same error. so not 32 v 64 bit at all, but openmp v sequential.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
dgeqrf is giving incorrect results for R when run using openmp
- occurs on both 32 bit and 64 bit machines
- only for matrices that are slightly wider than they are tall, eg 100 x 103
- plotting the effected sizes results in the "triangular" structure noted above
- link line is:
gcc -I $mkl/include -pthread -std=c99 test_dgeqrf.c
-lmkl_gf_lp64 -liomp5 -lmkl_gnu_thread -lmkl_core -lgfortran -ldl -lm
- i've tried a bunch of variants of this, with and without -fopenmp and changing the order
- whenever more than one thread is used, i get incorrect results
- using -lmkl_sequential fixes the problem
export OMP_NUM_THREADS=1 # fixes the problem
- the norm of the error is on the order of the magnitude of the elements, ie large
any confirmation, anti-confirmation, or suggestions on how i can change the link line much appreciated.
as a work-around, i'm thinking about using omp_set_num_threads()
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've brought this to the attention of some here, but haven't heard yet.
Submitting the problem to Premier support (http://premier.intel.com)as Tim suggests and highlighting it as a showstopper is a good idea as this is still the primary means for providing support for the product.
-Todd
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
why do you say off the beaten path ? i'm using mkl with a supported compiler, and threading is one of the key features of mkl - here's the first paragraph from Intel's Overview page (emphasis mine):
Intel Math Kernel Library (Intel MKL) offers highly optimized, extensively threaded math routines for scientific, engineering, and financial applications that require maximum performance.is there something unusual about what i'm doing ? have you done something similar with a different link line ?
incorrect results for dgeqrf using openmp | |||
Issue Number | 511590 | Issue Status | Investigating |
Originator | seth lytle | Submit Date | 8/11/2008 |
Company | event monitor | Last Update | 8/12/2008 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
For now, to work around this problem you will need to explicitly run serially in sections where these functions are used.
The engineer was able reproduce your problem based on your forum posts. But if in the future you're not getting the results or response you'd expect from Intel and the community you might choose premier support (our primary means of support the product) sooner. From the welcome:
While this is primarily a discussion forum, if a question is posted that appears to indicate a defect in the software or documentation, we will work with you to use our Intel Premier support site, which is in place to register and respond to bugs or other support issues you may have.-Todd
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
premier.intel.com got back to me. long term they have a fix that they'll incorporate into mkl (i hope for the next update). and short-term they came up with a work-around. increase the width of the matrix that's being passed in (and the corresponding # of columns). here's their note:
Dear Seth,
The issue occurs if matrix dimensions are in the following constraints: m<=(m/nb+1)*nb,
The exact way to do it is (example in C tested on posted reproducer):
MKL_INT nb, nbold, ispec, mione, nthr;
ispec = 1; mione = -1; nthr = omp_num_threads();
nb = ilaenv( &ispec, "DGEQRF", " ", &rows, &cols, &nthr, &mione );
do{
if( rowsnbold = nb;
nb = ilaenv( &ispec, "DGEQRF", " ", &rows, &cols, &nthr, &mione );
} while ( nb != nbold );
It is necessary to repeat because with the columns number change the NB value also could increase.
Thanks,
Vipin
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Leona,
Do you hope to see the example code of java qr decomposition?
You may first refer to example code of MKL java under MKL install directory. It include several example, like dgemm, dgesv, dgesvd. So you can modify them to get dgeqrf sample.
Best Regards,
Ying
![](/skins/images/4D2C80A4F3BA3A0D4303300CBEF817A4/responsive_peak/images/icon_anonymous_message.png)
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page