I am attempting to parallelise calls to MKL within a parallel OMP region to test whether the code executes faster. Parallelising only part of the code does not yield a linear increase in performance, so a mixed approach makes sense. An outline of the code is as follows:
#pragma omp parallel for
for (int i = 0; i < N; i += 2) {
    some_function(i); // calls zgesvd internally
}
where some_function calls zgesvd. To start with, I would like the OMP region to run on 2 threads and the zgesvd calls inside it to run on 2 threads as well (4 active threads in total). To achieve this, I make the following calls at the beginning of the program:
omp_set_num_threads(2);
mkl_set_num_threads(2);
mkl_set_dynamic(false);
omp_set_nested(true);
omp_set_max_active_levels(2);
I have also tried setting the number of OMP threads to 4 and then adding num_threads(2) to the pragma, with no success. Currently, the program creates only 3 threads (?) on both Windows and Linux, using the latest MKL and Intel compilers. Changing the value passed to omp_set_max_active_levels to 3 produces 4 threads on Windows and 3 threads on Linux. However, I don't know exactly what these threads are doing; I can only see how many there are.
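For reference, the thread count this setup is aiming for can be written down as a small model. The sketch below is a hypothetical helper (expected_threads is not an MKL or OpenMP function). It assumes OpenMP's rule that a parallel region nested deeper than max-active-levels is inactive and runs on a single thread, and it treats MKL's internal threading as the second level, which is only a fair assumption when MKL's dynamic adjustment is switched off with mkl_set_dynamic(false).

```cpp
#include <cassert>

// Hypothetical model (not an MKL/OpenMP API) of how many threads a two-level
// "OMP loop calling threaded MKL" configuration keeps busy.
//   outer             - threads in the OMP parallel-for team
//   inner             - threads each MKL call is allowed to use
//   max_active_levels - value passed to omp_set_max_active_levels
// Assumption: a region nested deeper than max_active_levels is inactive and
// runs on one thread, and MKL's own threading counts as the second level.
int expected_threads(int outer, int inner, int max_active_levels) {
    if (max_active_levels < 1) {
        return 1;               // even the outer region is inactive
    }
    if (max_active_levels < 2) {
        return outer;           // MKL calls run single-threaded
    }
    return outer * inner;       // every outer thread leads its own inner team
}
```

With the values from the question, expected_threads(2, 2, 2) is 4 (the target), while expected_threads(2, 2, 1) is 2, i.e. what you get when the inner level is not allowed to become active.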
Best regards
P.S. I noticed that by default MKL only tries to use 4 threads on a quad-core CPU with hyperthreading enabled. According to top (which should be reliable? I'm not sure), those 4 threads are not always scheduled one per core (though that may be up to the OS), so why the limit?
Dear customer,
Based on your description, your program should probably be structured like this:
#include "mkl.h"
#include <omp.h>
#include <stdio.h>

/* Placeholder for the MKL call(s) each inner thread makes, e.g. zgesvd. */
static void some_mkl_function(void)
{
}

void report_num_threads(int level)
{
    #pragma omp parallel num_threads(2)
    {
        // 2 sub-threads for each outer thread ("level" is the outer thread number)
        printf("level: %d, number of threads in the team - %d, thread: %d\n",
               level, omp_get_num_threads(), omp_get_thread_num());
        some_mkl_function();
    }
}

int main()
{
    omp_set_dynamic(0);
    omp_set_num_threads(4);
    int N = 4;
    printf("total threads: %d\n", omp_get_max_threads());
    omp_set_nested(1);
    #pragma omp parallel num_threads(2)
    {
        // outer OMP region: 2 threads
        report_num_threads(omp_get_thread_num());
    }
    return 0;
}
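When run, it should print output along these lines (the order of the last four lines may vary between runs):
total threads: 4
level: 0, number of threads in the team - 2, thread: 0
level: 0, number of threads in the team - 2, thread: 1
level: 1, number of threads in the team - 2, thread: 0
level: 1, number of threads in the team - 2, thread: 1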
mkl_set_num_threads(2) has the same effect as omp_set_num_threads, so your program actually sets 2 threads in total, not 4. Also, may I ask the value of N? If N equals mkl_get_max_threads(), then N is probably 2, not 4, so some_function() actually runs only once for each OMP level.
Dear Fiona,
Thank you for your reply. If omp_set_num_threads and mkl_set_num_threads share the same functionality, how would one go about using threaded MKL from within a threaded OMP region?
As for the code, what I actually want is threaded MKL functions (BLAS and LAPACK routines using 2 or more threads) called from within an OMP-parallelised for loop.
N is a parameter, but in general N >= 80.
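For the loop outlined at the start of the thread (i running from 0 to N in steps of 2), the amount of work each OMP thread gets can be worked out directly. The sketch below uses hypothetical helpers (loop_trip_count and static_split are not library functions) and assumes the loop is divided like schedule(static) without a chunk size: one contiguous block per thread, block sizes differing by at most one. The OpenMP specification leaves the default schedule implementation-defined, so treat the split as an assumption about typical compilers.

```cpp
#include <cassert>
#include <vector>

// Number of iterations executed by: for (int i = 0; i < N; i += step), step > 0
int loop_trip_count(int N, int step) {
    if (N <= 0) {
        return 0;
    }
    return (N + step - 1) / step;   // ceiling of N / step
}

// Hypothetical sketch of a static schedule without a chunk size: one
// contiguous block per thread, with the first (trips % threads) threads
// taking one extra iteration. Assumes the default schedule behaves this way.
std::vector<int> static_split(int trips, int threads) {
    std::vector<int> per_thread(threads, trips / threads);
    for (int t = 0; t < trips % threads; ++t) {
        ++per_thread[t];
    }
    return per_thread;
}
```

With N = 80 the loop runs loop_trip_count(80, 2) = 40 iterations, so 2 OMP threads would get 20 calls to some_function each; with N = 2 the loop body runs only once in total.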
Best regards
marko l. wrote:
what I actually want is threaded MKL functions (BLAS and LAPACK to use 2 or more threads) but called from within an OMP parallelised for loop.
I'm also looking for a way to do this. Is it possible, and if so, how would you go about setting up and binding the threads?
It is possible. An example would be:
#include <mkl.h>
#include <omp.h>
#include <iostream>
#include <random>

int main(void) {
    int ompth = 4; // Number of OMP threads for the for loop
    int mklth = 2; // Number of MKL threads for the MKL calls
    // These two parameters need not be constant (e.g. you can read them as arguments)
    mkl_set_dynamic(false);       // let MKL use the requested thread count instead of adjusting it
    omp_set_nested(true);
    omp_set_max_active_levels(2); // outer OMP loop + MKL's own threads
    #pragma omp parallel for num_threads(ompth) // Set the number of threads for this loop manually
    for (int i = 0; i < 10; i++) {
        mkl_set_num_threads_local(mklth); // Number of threads MKL may use on this OMP thread
        // Now run some MKL routine
        std::mt19937_64 gen(i * 12345);
        std::uniform_real_distribution<double> dist(-1, 1);
        int matsize = 10000;
        // 64-byte alignment suits AVX-512 loads
        MKL_Complex16* A = (MKL_Complex16*)mkl_calloc(matsize * matsize, sizeof(MKL_Complex16), 64);
        MKL_Complex16* B = (MKL_Complex16*)mkl_calloc(matsize * matsize, sizeof(MKL_Complex16), 64);
        MKL_Complex16* C = (MKL_Complex16*)mkl_calloc(matsize * matsize, sizeof(MKL_Complex16), 64);
        for (int j = 0; j < matsize * matsize; j++) { // Fill A and B with random numbers
            A[j].real = dist(gen);
            A[j].imag = dist(gen);
            B[j].real = dist(gen);
            B[j].imag = dist(gen);
        }
        MKL_Complex16 scale{1, 0};
        MKL_Complex16 zero{0, 0};
        // C = 1 * A * B + 0 * C
        cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, matsize, matsize, matsize,
                    &scale, A, matsize, B, matsize, &zero, C, matsize);
        std::cout << "Iteration " << i << " completed by OMP thread " << omp_get_thread_num() << ". " << std::endl;
        mkl_free(A); // release the buffers allocated with mkl_calloc
        mkl_free(B);
        mkl_free(C);
    }
    return 0;
}
On my machine (Linux) this uses 8 threads, i.e. 4 OMP threads each running MKL on 2 threads. Do make sure the MKL calls warrant additional threads: if, for instance, your matrices are too small, MKL may use only 1 thread regardless of how many you assigned, because creating extra threads for small jobs simply doesn't pay off.
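It is also worth checking the memory side before running the example as written, since each iteration allocates three matsize × matsize complex matrices and up to ompth iterations are in flight at once. The hypothetical helpers below (not MKL functions) do that arithmetic, assuming MKL_Complex16 has its default layout of two doubles (16 bytes) and that all concurrent iterations hold their buffers at the same time.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for one matsize x matsize matrix of MKL_Complex16, assuming the
// default layout of two doubles (16 bytes) per element.
std::uint64_t complex16_matrix_bytes(std::uint64_t matsize) {
    const std::uint64_t bytes_per_element = 2 * sizeof(double);
    return matsize * matsize * bytes_per_element;
}

// Peak bytes if every concurrently running OMP iteration holds its matrices at once.
std::uint64_t peak_bytes(std::uint64_t matsize, std::uint64_t matrices_per_iteration,
                         std::uint64_t omp_threads) {
    return complex16_matrix_bytes(matsize) * matrices_per_iteration * omp_threads;
}
```

For matsize = 10000, one matrix is 1.6 GB (10^8 elements × 16 bytes), so A, B and C on each of the 4 OMP threads add up to about 19.2 GB. Lowering matsize or ompth reduces that, with the caveat above that matrices which are too small may not keep 2 MKL threads busy.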
The compile and link commands are:
icpc -static -std=c++14 -Wall -O3 -qopenmp -ip -xHOST -use-intel-optimized-headers -fma -qoverride-limits -c test.cpp -o test.o
icpc test.o -lm -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -qopenmp -lpthread -o test
Hope this helps.
Best Regards