I think this is a threading bug. If I have missed something, please let me know. Thanks.
When performing a dense-sparse level 3 operation (mkl_dcsrmm), I get inconsistent results depending on the number of cores used. Please see the attached C file (dcsrmm.c).
B is a 1001-by-1001 dense matrix with only one nonzero, B(0,0) = 1.0.
A is a 1001-by-1001 CSR sparse matrix, also with only one nonzero, A(0,0) = 1.0.
For C = B * A, I should get C(0,0) = 1.0, and that is what happens when I use 1 core or more than 2 cores. However, if I set OMP_NUM_THREADS=2, I get C(0,0) = 0.0, which is incorrect.
This seems to be reproducible on all platforms with more than 2 cores.
I'm using MKL 10.2.2.
Jaewon
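For readers without the attachment, the expected value can be cross-checked with a plain reference loop that does not use MKL at all. This is only a minimal sketch: the function name `dense_times_csr`, the row-major layout of B and C, and the zero-based CSR arrays (`val`, `col_ind`, `row_ptr`) are assumptions made for illustration, since dcsrmm.c is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (non-MKL) product C = B * A. Hypothetical helper, not an MKL API.
 * B: m-by-k dense, row-major.
 * A: k-by-n sparse, zero-based CSR (val, col_ind, row_ptr with k+1 entries).
 * C: m-by-n dense, row-major, overwritten with the result. */
void dense_times_csr(size_t m, size_t k, size_t n,
                     const double *B,
                     const double *val, const int *col_ind, const int *row_ptr,
                     double *C)
{
    assert(B && val && col_ind && row_ptr && C);

    /* Start from an all-zero result. */
    for (size_t i = 0; i < m * n; i++)
        C[i] = 0.0;

    for (size_t i = 0; i < m; i++) {
        for (size_t p = 0; p < k; p++) {
            double b = B[i * k + p];
            if (b == 0.0)
                continue; /* skip zero entries of B (assumes finite values in A) */
            /* Row p of A contributes b * A(p, j) to C(i, j). */
            for (int q = row_ptr[p]; q < row_ptr[p + 1]; q++)
                C[i * n + (size_t)col_ind[q]] += b * val[q];
        }
    }
}
```

With the matrices described above (a single 1.0 at position (0,0) in both A and B), this loop yields C(0,0) = 1.0 and zeros everywhere else, so it can serve as a baseline when comparing runs with different OMP_NUM_THREADS settings.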
Thanks for the test.
An easy way to check whether this is a threading bug is to link the test with the sequential libraries.
Can you check it on your side?
--Gennady
Unable to reproduce with sequential libraries.
Jaewon
Jaewon,
Yes, I got the same result. We will investigate the problem and get back to you as soon as we have any news.
--Gennady
Any update on this issue?
Jaewon
I have tried the following simple program to generate random numbers for parallel threads using Cilk.
#include "mkl.h"
#include <cilk/cilk.h>

#define NUM_THREADS 2

/* One VSL stream per spawned task. */
VSLStreamStatePtr stream[NUM_THREADS];

/* Fill a local buffer with uniform doubles in [-1, 1) from stream sid. */
void cal(int sid) {
    double buff[1024];
    vdRngUniform(VSL_METHOD_DUNIFORM_STD, stream[sid], 1024, buff, -1.0, 1.0);
}

int main() {
    /* Create each stream through its own array element. */
    for (int i = 0; i < NUM_THREADS; i++) {
        vslNewStream(&stream[i], VSL_BRNG_WH, 1);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        cilk_spawn cal(i);
    }
    cilk_sync;
    for (int i = 0; i < NUM_THREADS; i++)
        vslDeleteStream(&stream[i]);
    return 0;
}
Then I compile it with icpc (ICC) 14.0.1 20131008:
icpc -std=c++11 -fopenmp -mkl -I/opt/intel/include/ -g -o pi_mont pi_mont.cpp
When I run it under cilkscreen, it reports a race condition:
$ cilkscreen ./pi_mont
Cilkscreen Race Detector V2.0.0, Build 4225
Race condition on location 0x7fc000ee8d00
write access at 0x7fc000b877e3: (vdRngUniform+0xf3)
read access at 0x7fc000b87761: (vdRngUniform+0x71)
called by 0x40172c: (/localstore/theorie/amirsol/paralleluct/pi_mont.cpp:10, cal+0x60)
called by 0x401a3a: (/localstore/theorie/amirsol/paralleluct/pi_mont.cpp:20, main+0x2fc)
1 error found by Cilkscreen
Cilkscreen suppressed 1 duplicate error messages
Ali
The topic is discussed in another thread, https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/283349#comment-1846156