- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've been using LAPACK for several years. On Windows, I've been using a (now) rather old version of MKL, and on Linux I've been using ATLAS. Yesterday, I decided to try out MKL on a new Linux box. I'm investigating various inconsistencies, but a regression test failure is standing out:
When I take the SVD of the following complex 2x2 matrix...
( 1 0
0 -2i )
… I get a weights (0 1). This is obviously wrong in a number of ways: the weights are not in descending order, and the zero suggests that the matrix is singular, which it clearly isn't. u*w*ct(v) does not reconstitute the original matrix. If I change the -2i to -i the weights change from (0 1) to (1 0). All quite strange! (The weights should be (2 1).)
The problem occurs with both cgesvd() and zgesvd(). There of course might be issues in the wrapper code I'm using, but it's (mostly) quite old and other matrices work fine. Before I investigate further, should I be expecting any changes to how I should use these routines? To get this fixed, I presumably need to write it up properly - assuming I don't realise my error while doing this, do I then post here, or is there are more official channel?
This is a fresh Debian install (10.1) with the default MKL (2019.2.187-1, if I've found the right designation) on a shiny new Xeon W-2145. I'm using "A" for the job settings and workspace sizes seem to work out at 24 and 10. My old notes here seem to suggest there was some ambiguity around what workspace sizes should be used here - could that be relevant? In particular, the size of the second workspace isn't passed into the function. I could only find current MKL documentation for LAPACKE_zgesvd, not zgesvd itself. The general documentation describing the differences between the Fortran API and C LAPACK seem to imply there is just one workspace used here, but the /usr/include/mkl/mkl_lapack.h header appears to have the two that I would expect. But, I can't find any clear documentation of the full zgesvd() parameter set anywhere - am I missing something?
I'm guessing this isn't directly relevant, but valgrind reports a "Syscall param sched_setaffinity(mask) points to unaddressable byte(s)" from deep within the stack when I use SVD in my test pack for the first time. The stack goes DGESVD, mkl_lapack_dgesvd … mkl_serv_domain_get_max_threads, omp_get_num_procs … syscall. Valgrind doesn't report any other memory access issues.
Many thanks,
--Richard
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I've just tried LAPACKE_zgesvd() and that seems to give the same incorrect output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Okay - I've reduced this to a self-contained piece of code and it's still broken. Here it is:
#include <iostream> /* Setup for classic LAPACK: */ #ifdef USE_LAPACK struct doublecomplex { double real; double imag; doublecomplex(double r = 0, double i = 0) : real(r), imag(i) {} }; typedef int integer; typedef double doublereal; extern "C" { int zgesvd_(char *jobu, char *jobvt, integer *m, integer *n, doublecomplex *a, integer *lda, doublereal *s, doublecomplex *u, integer * ldu, doublecomplex *vt, integer *ldvt, doublecomplex *work, integer *lwork, doublereal *rwork, integer *info); #define zgesvd zgesvd_ } #endif /* ... or setup for MKL: */ #ifdef USE_MKL #include <mkl_lapack.h> typedef MKL_Complex16 doublecomplex; #endif /* Do the testing: */ int main() { try { /* Use large workspace sizes in case this was the issue. */ const int WORKSPACE_1_SIZE = 100; const int WORKSPACE_2_SIZE = 100; char jobu [] = "A"; char jobvt[] = "A"; int m = 2; int n = 2; double weights[2]; doublecomplex input[4] = { { 1, 0 }, { 0, 0 }, { 0, 0 }, { 0, -2 } }; doublecomplex range[4]; doublecomplex nullSpace[4]; doublecomplex workspace1[WORKSPACE_1_SIZE]; int workspace1Size = WORKSPACE_1_SIZE; double workspace2[WORKSPACE_2_SIZE]; int info; ::zgesvd(jobu, jobvt, &m, &n, input, &m, weights, range, &m, nullSpace, &n, workspace1, &workspace1Size, workspace2, &info); std::cout << "info = " << info << "\nweights = { " << weights[0] << " " << weights[1] << " }\ncorrect = { 2 1 }" << std::endl; return 0; } catch (std::exception &excep) { std::cerr << excep.what() << std::endl; return 1; } }
Because of the magic of alternatives on Linux, I can actually run the code with either USE_MKL or USE_LAPACK on the box with MKL installed, linking to -llapack or -lmkl_rt. Both binaries are effectively linking to the MKL and give the same incorrect result:
info = 0 weights = { 0 1 } correct = { 2 1 }
On another box with classic LAPACK (i.e. no MKL) I get the correct answer from ATLAS:
info = 0 weights = { 2 1 } correct = { 2 1 }
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sidenote: I made a couple of corrections to the second post above and the website has crashed on me several times. I've also seen this error a couple of times near the top of the page in a red box:
Warning: Cannot use a scalar value as an array in dmemcache_piece_cache_set() (line 909 of /home/idz/src/www/sites/all/modules/contrib/memcache/dmemcache.inc).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
thanks for the report. Do you have an opportunity to check if the problem exists with the latest version 2019u5?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
TLDR: ha - yes, that fixed it!
I had previously checked the bug fix list at https://software.intel.com/en-us/articles/intel-math-kernel-library-2019-bug-fixes - there are only nine items - and hadn't seen anything that looked relevant. Plus installing into user space rather than using the standard Debian installers looked a bit gnarly. But, actually it was very easy! The script declares that Debian 10 is unsupported, but I guess that's cosmetic and just to do with when v10 came out - the script was nice and easy to use and seems to have worked fine.
Is the bug fix list incomplete, or is this bug a knock-on from something else? Do we know if other things have changed?
This probably isn't one for this forum, but: I'm not a Debian apt expert - do we know how (or when!) I can get a clean install of 2019u5 into Debian through the main package installer, with alternatives set up correctly, future updates being integrated automatically etc? I do have root access, but this is a nice shiny new box and I don't want to make too much of a mess ;-)
Many thanks - I'm unstuck for the moment.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Follow-up:
Change history: although there are 9 bug fixes in 2019u5, there are actually a load more since 2019u2. But, I still can't work out which change fixed the issue. I'd be interested to know more about the fix!
Debian: okay, I understand now. 2019u2 is in the current "stable" branch (buster) and 2019u5 is in "testing" (bullseye). Rather than mix stable/testing on our boxes I guess we'll stick to ATLAS until a fixed MKL reaches the stable branch.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page