I am trying to use Cholesky factorization on the Intel MIC, but I am not getting the expected performance.
This is how I run the code, using all 240 threads:
[root@bunsen-mic0 /tmp]# env USE_2MB_BUFFERS=3000 MKL_NUM_THREADS=240 KMP_AFFINITY=proclist=[1-240],granularity=fine,explicit ./testing_native_dpotrf -N 9600 -L 5
time 1.201130, gflops 245.567177 >>>>>>>>> this is warm up
time 0.865080, gflops 340.960515
time 0.865288, gflops 340.878499
time 0.864819, gflops 341.063349
time 0.864337, gflops 341.253577
time 0.863623, gflops 341.535639
The correct performance should be about 500 gflops for a 9600 x 9600 matrix.
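For reference, the gflops figures above are consistent with the usual flop count for a double-precision Cholesky factorization, n^3/3 + n^2/2 + n/6. The following is only a minimal sketch under that assumption, not the attached program's actual code; the function names `potrf_flops` and `potrf_gflops` are made up for illustration.

```c
#include <assert.h>

/* Approximate flop count for a Cholesky factorization of an n x n
 * matrix. The formula n^3/3 + n^2/2 + n/6 is an assumption here; the
 * attached test program may count flops differently. */
double potrf_flops(double n)
{
    return n * n * n / 3.0 + n * n / 2.0 + n / 6.0;
}

/* Convert a measured wall-clock time (in seconds) into GFLOP/s. */
double potrf_gflops(double n, double seconds)
{
    assert(seconds > 0.0);  /* a non-positive time would be a measurement bug */
    return potrf_flops(n) / seconds / 1e9;
}
```

With N = 9600 and the measured 0.865 s, this gives roughly 341 gflops, matching the output above. Reaching 500 gflops at this size would require a run time of roughly 0.59 s.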
Here are the strange results when I use only 1 core (4 threads):
[root@bunsen-mic0 /tmp]# env USE_2MB_BUFFERS=3000 MKL_NUM_THREADS=4 KMP_AFFINITY=proclist=[1-4],granularity=fine,explicit ./testing_native_dpotrf -N 9600 -L 5
time 0.902745, gflops 326.734658
time 0.871131, gflops 338.592037
time 0.870778, gflops 338.729428
time 0.868808, gflops 339.497416
time 0.866140, gflops 340.543143
time 0.864064, gflops 341.361391
This is clearly not correct: 4 threads give the same performance as 240. It looks like the program is messing up the core assignment. Can anyone help me figure out where the problem is?
My code is attached. There is really nothing in it; it just calls LAPACKE_dpotrf.
BTW, the following is how I compile my code:
testing_native_dpotrf: testing_native_dpotrf.c
	icc -O3 -mmic -mkl $< -o $@