#include <iostream>
#include <cstdlib>
#include <omp.h>
#include "mkl.h"
using namespace std;
int main(){
    int len = 1500;
    double* m1;
    double* m2;
    double* m3;
    double t0, tf, tm1, time;
    int i, procs;
    // Repeat the benchmark with 1 to 4 threads.
    for (procs = 1; procs < 4+1; procs++){
        if (procs%1==0 || procs==1){
            omp_set_num_threads(procs);
            m1 = (double*)malloc(len*len*sizeof(double));
            m2 = (double*)malloc(len*len*sizeof(double));
            m3 = (double*)malloc(len*len*sizeof(double));
            // Fill the input matrices and zero the result matrix.
            #pragma omp parallel for
            for (i = 0; i < len*len; i++){
                m1[i] = (i%10)-5;
                m2[i] = (i%7)-3.5;
                m3[i] = 0;
            }
            // Time only the MKL matrix multiplication m3 = m1 * m2.
            t0 = omp_get_wtime();
            cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, len, len, len, 1.0, m1, len, m2, len, 0.0, m3, len);
            tf = omp_get_wtime();
            time = tf-t0;
            // The single-thread run is the baseline for the ratio.
            if (procs == 1) { tm1 = time; }
            cout << "Elapsed time: " << time << " - " << procs << " threads loop ratio:" << time/tm1 << endl;
            free(m1);
            free(m2);
            free(m3);
        }
    }
    return 0;
}
To compile:
/opt/intel/cc/9.0/bin/icc -openmp mklTest2.cxx -lmkl -L /opt/intel/mkl/8.0/lib/32/ -I /opt/intel/mkl/8.0/include/
Timings I got on a 32p machine:
./a.out
Elapsed time: 1.19079 - 1 threads loop ratio:1
Elapsed time: 1.18762 - 2 threads loop ratio:0.997338
Elapsed time: 1.18804 - 3 threads loop ratio:0.997687
Elapsed time: 1.21605 - 4 threads loop ratio:1.02121
3 Replies
It looks like the code is giving good performance for 1p, but it doesn't scale at all after that.
I was wondering if there is any switch that I need to enable so that MKL will be multithreaded. If there isn't, is there something simple I am missing in my code?
Thanks,
Joan
What settings are you using for OMP_NUM_THREADS and KMP_SERIAL?
Are you asking all the threads you created to share the same memory regions, and asking MKL to create as many additional threads as possible?
Hi Tim, thanks for your reply; it pointed me to what I needed to change to make it all work. Now, I think this might be an MKL bug:
I don't set OMP_NUM_THREADS.
My code uses omp_set_num_threads().
It seems, though, that unless OMP_NUM_THREADS is set to something at the beginning of the program, MKL won't honor any later calls to omp_set_num_threads().
Now, if I take out the call to the MKL function, the plain OpenMP for loop is actually parallelized well.
[jpuig@altix jpuig]$ export -n OMP_NUM_THREADS
[jpuig@altix jpuig]$ ./a.out
Elapsed time: 1.8064 - 1 threads loop ratio:1
Elapsed time: 1.79981 - 2 threads loop ratio:0.996353
Elapsed time: 1.85461 - 3 threads loop ratio:1.02669
Elapsed time: 1.82016 - 4 threads loop ratio:1.00762
[jpuig@altix jpuig]$ export OMP_NUM_THREADS=4
[jpuig@altix jpuig]$ ./a.out
Elapsed time: 1.84285 - 1 threads loop ratio:1
Elapsed time: 0.929641 - 2 threads loop ratio:0.504457
Elapsed time: 0.62401 - 3 threads loop ratio:0.338611
Elapsed time: 0.476085 - 4 threads loop ratio:0.258341
[jpuig@altix jpuig]$
Message Edited by joan.puig@gmail.com on 04-03-2006 02:58 PM
