xin_xu
Beginner
145 Views

ZGESVD in multithreaded MKL 10.2.5.035 produces wrong results

Hello,

I think I have hit a bug in the MKL (10.2.5.035) subroutine ZGESVD. I am linking statically against the multithreaded version of MKL. ZGESVD gives wrong results when I use 32 threads and correct results when I use one thread. I have a simple non-OpenMP program that loads a matrix and carries out the SVD, and it reproduces the error consistently. The test program, makefile, and data file are all in the attachment.

The test program shows that the first SVD call produces the correct result. The second call is a workspace query to find the optimal work size. The third call produces a wrong result. This may indicate that the work size is causing the problem. However, for other matrices, wrong results are produced even by the first SVD call.
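For anyone reproducing this, one way to tell a correct decomposition from a wrong one without a reference run is to check the reconstruction residual ||A - U diag(s) V^H|| / ||A||. The following is only a minimal sketch in pure Python, assuming the factors have been exported from the test program as nested lists; the function name `svd_residual` is hypothetical and is not part of MKL. It relies on the fact that ZGESVD returns V^H (the VT argument) rather than V.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as nested lists of (complex) numbers."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def frobenius(m):
    """Frobenius norm of a matrix given as nested lists."""
    return math.sqrt(sum(abs(x) ** 2 for row in m for x in row))

def svd_residual(a, u, s, vh):
    """Relative residual ||A - U diag(s) V^H||_F / ||A||_F.

    a  : m x n matrix
    u  : m x k matrix of left singular vectors
    s  : k singular values
    vh : k x n matrix, i.e. V^H as returned in ZGESVD's VT argument
    """
    k = len(s)
    # Scale column j of U by singular value s[j].
    us = [[row[j] * s[j] for j in range(k)] for row in u]
    rec = matmul(us, vh)
    diff = [[a[i][j] - rec[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]
    return frobenius(diff) / frobenius(a)

# Tiny hand-built example (not the matrix from the attachment).
a = [[0, 2j], [3, 0]]
u = [[0, 1j], [1, 0]]
vh = [[1, 0], [0, 1]]
good = svd_residual(a, u, [3.0, 2.0], vh)   # consistent factors
bad = svd_residual(a, u, [3.0, 1.0], vh)    # deliberately wrong singular value
```

A residual near machine precision suggests the factors are consistent with the input, while a large value flags a bad result regardless of how many threads produced it. The residual alone does not check that U and V^H are unitary, so a full check would test that as well.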

When I use fewer threads, for example 16, 8, or 1, the result is correct for the attached matrix.
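The thread-count dependence can be checked without rebuilding, because MKL reads the `MKL_NUM_THREADS` environment variable at run time. Below is a rough sketch, assuming the test program prints its singular values to standard output; the helper names `run_with_threads` and `sweep` are hypothetical.

```python
import os
import subprocess

def run_with_threads(cmd, num_threads):
    """Run cmd with MKL_NUM_THREADS set to num_threads and return its stdout."""
    env = dict(os.environ)
    env["MKL_NUM_THREADS"] = str(num_threads)
    result = subprocess.run(cmd, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout

def sweep(cmd, thread_counts):
    """Run cmd once per thread count; map each count to the captured output."""
    return {n: run_with_threads(cmd, n) for n in thread_counts}
```

For example, `sweep(["./svd_test"], [1, 8, 16, 32])`, where `./svd_test` stands in for the attached test executable, would collect one output per thread count so that the 32-thread result can be compared against the single-thread one.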

This test was run on a Linux node with 32 cores.

Any suggestions?

Thanks & Regards,

Xin

15 Replies
Gennady_F_Intel
Moderator

Xin,
Could you please check whether the problem persists with version 10.2 Update 7?
--Gennady
xin_xu
Beginner

Gennady,

Thanks for your reply!

I need our system administrator's assistance to try a different version of MKL. 10.2.5.035 is the latest one installed for now. They are planning to install 10.3 some time this week.

By the way, I downloaded an evaluation version of MKL 10.3 for Linux and tried to install it as a regular user (not root). But the installation seems to stall after the EULA appears and I type 'accept' and press Enter. Any ideas on this problem? I can try 10.3 right away if I can install it successfully.

Regards,
Xin
Gennady_F_Intel
Moderator

Xin,
I have no idea about the installation issue (:-. I will ask the install engineer to help you with this problem.
--Gennady
Gennady_F_Intel
Moderator

Hello Xin,
We reproduced the problem with all the latest versions, including 10.3.x. The issue is caused by internal threading. We will update you when the fix is available.
Regards, Gennady
Nikolay_L_Intel
Employee

Hello Xin,

The installation issue you have described looks like an activation problem. The installer reads all Intel license keys registered on your system to determine your current activation level. If your system (or a shared location) has a significant number of licenses, this process could take some time.

Could you kindly start the installation one more time and let it scan your system for a long period, such as 30-40 minutes? If it still freezes, please interrupt it and, if possible, send us the log files /tmp/*.issa*.log and /tmp/*.pset*.log (please select the correct ones by sorting by modification time).

We will investigate the issue and get back to you with instructions.

As a temporary workaround, you could try to back up and then clean up the folders /opt/intel/licenses and $HOME/intel/licenses (please ask for root assistance if you do not have sufficient permissions) and restart the installation.

Waiting for your reply.
Thank you,
- Nikolay

xin_xu
Beginner

Hello Nikolay,

Thanks for your reply! I started the installation yesterday and it is still frozen. Since Gennady has tested with all the new versions of MKL, I am not going to test it again myself for now. Also, our system administrator is going to install the latest MKL, so I will just wait for that.

However, I would like to know what the problem is in case I need to install again. Attached are the log files. There are a few similar log files, possibly from my other attempts.



Gennady, Thanks for your info! Hope the fix is a simple one and comes soon.

Thanks all!

Xin
Nikolay_L_Intel
Employee

Hello Xin,

Thank you for the information.
The log file shows the freeze at the activation check, so the initial assumption was correct.

Could you kindly run three commands on your system and send the output, please?

1) ldd "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic"

2) ls -ls /share/apps/intel/ict/Compiler/11.1/072/licenses

3) export LD_LIBRARY_PATH=/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e; "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic" -f"MKernL" -f"MKern" -p"i86_r" -p"i86_re" -p"it64_lr" -p"it64_re" -p"amd64_re" -c"/share/apps/intel/ict/Compiler/11.1/072/licenses"

Thank you very much for your time,

- Nikolay

xin_xu
Beginner

Hello, Nikolay,

Here is the output:

[xxu2@dlxlogin2 ~]$ ldd "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic"
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003734a00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003734200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003740c00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003733e00000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003734600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003733a00000)
[xxu2@dlxlogin2 ~]$ ls -ls /share/apps/intel/ict/Compiler/11.1/072/licenses
16 -rw-r--r-- 1 root root 551 Jul 21 2010 /share/apps/intel/ict/Compiler/11.1/072/licenses
[xxu2@dlxlogin2 ~]$ export LD_LIBRARY_PATH=/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e; "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic/32e/chklic" -f"MKernL" -f"MKern" -p"i86_r" -p"i86_re" -p"it64_lr" -p"it64_re" -p"amd64_re" -c"/share/apps/intel/ict/Compiler/11.1/072/licenses"
-bash: /home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic/32e/chklic: Not a directory
[xxu2@dlxlogin2 ~]$

Regards,
Xin
xin_xu
Beginner

Hello Gennady,

I am wondering whether there is more information about this bug, which is related to the internal threading. My main concern is whether the same threading bug is present in other subroutines such as matrix multiply, QR, LU, and matrix inverse. My work relies on these libraries, and I do observe strange results when I use SVD with one thread and more threads for the others.

I would like to know whether I should avoid using the threaded library for now, or what the known safe number of threads is for the MKL library.

Any information would be appreciated!

Regards,

Xin

Gennady_F_Intel
Moderator

Hi Xin,
We don't expect the problem in any routines except the *svd routines, which you've already reported. If you have other problems, please let us know.
--Gennady

Gennady_F_Intel
Moderator

Hi Xin,
Could you please check whether this problem still occurs with the latest 10.3 Update 3, and let us know if you see any further problems? 10.3.3 was released yesterday and is available at the Intel Registration Center.
/gf

xin_xu
Beginner

Hello, Gennady,

Thanks for the info. I will try it as soon as I get one installed on our cluster. It would have been much easier if I could install an evaluation version in my own directory. Unfortunately, the installation issue I brought up in the last few messages has not been solved.

Regards
Xin
Nikolay_L_Intel
Employee

Hello Xin,

Unfortunately, we are still trying to figure out the root cause of the activation problem.

Did you try this workaround?
As a temporary workaround, you could try to back up and then clean up the folders /opt/intel/licenses and $HOME/intel/licenses (please ask for root assistance if you do not have sufficient permissions) and restart the installation.

If that is also unsuccessful, please try the following steps:
1) Go to /rpms
2) Invoke command: #> rpm -ivh --nodeps --ignorearch --prefix "location for installation" *.rpm

I'm monitoring this topic, so please contact me if you have any questions.

Thank you,
- Nikolay
xin_xu
Beginner

Hello, Nikolay,

Thanks for your reply!

I went to check the folder /opt/intel/... but there is no /intel/ folder under /opt. The folder $HOME/intel/licenses is empty.

I tried the second way by invoking the command 'rpm ...'. I got the following message:

error: can't create transaction lock on /var/lib/rpm/__db.000

I guess it is a permission issue. I am sending your suggestions to our system administrator.

Thanks!
Xin
xin_xu
Beginner

Hello Gennady,

MKL 10.3 Update 3 produced correct results for the small test case that I posted here. I will run some more cases with it. I hope it works well!

Thanks very much!

Xin