- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I am attempting to parallelise calls to mkl within a parallel omp region to test whether or not the code executes faster. Simply parallelising part of the code does not yield linear increase in performance, hence a mixed approach makes sense. An outline of the code is as follows:
#pragma omp parallel for for (int i = 0; i < N; i+=2) { some_function(i); }
where some_function will make calls to zgesvd. For starters I would like the omp region to run on 2 threads and the calls to zgesvd inside to also run on 2 threads (for a total of 4 active threads). To achieve this I make the following calls in the begining of the program
omp_set_num_threads(2); mkl_set_num_threads(2); mkl_set_dynamic(false); omp_set_nested(true); omp_set_max_active_levels(2);
I have also tried setting omp threads to 4 and then adding threads(2) to the pragma with no success. Currently, the program creates >>3<< (??) threads on both Windows and Linux using the latest MKL & Intel compilers. Changing the value of omp_set_max_active_levels to 3 produces 4 threads on Windows and 3 threads on Linux. However, I don't exactly know what these threads are doing, I can just see their number.
Best regards
P.S. I noticed that by default the MKL will only try to use 4 threads on a quad-core CPU with hyperthreading enabled but according to top (which should be reliable? I don't really know.) the 4 threads are not always run 1/core (though that might be up to the OS), so why the limit?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear customer,
According to your description, your program probably should be written like:
#include "mkl.h" #include <omp.h> #include <stdio.h> void report_num_threads(int level) { #pragma omp parallel num_threads(2) { //2 sub threads for each omp level printf("level: %d, number of threads in the team - %d, thread: %d\n", level,omp_get_num_threads(), omp_get_thread_num()); some_mkl_function(); } } int main() { omp_set_dynamic(0); omp_set_num_threads(4); int N=4; printf("total threads: %d\n",omp_get_max_threads() ); omp_set_nested(1); #pragma omp parallel num_threads(2) { //omp region - 2 thread report_num_threads(omp_get_thread_num()); } return(0); }
It would be run like:
level 0, sub threads 0;
level 0, sub threads 1;
level 1, sub threads 0;
level 1, sub threads 1.
The mkl_set_num_threads(2) has same functionality with omp_set_num_threads, your program actually totally set 2 threads, not 4. And may I ask the value of N? If N equals to mkl_get_max_threads(), the N probably equals to 2 not 4. Thus the some_function() actually run 1 time for each omp level.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Fiona,
Thank you for your reply. If omp_set_num_threads and mkl_set_num_threads share the same functionality, how would one go about using threaded MKL from within a threaded OMP region?
As for the code; what I actually want is threaded MKL functions (BLAS and LAPACK to use 2 or more threads) but called from within an OMP parallelised for loop.
Regarding the value of N, it is a parameter, but in general it holds that N>=80.
Best regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
marko l. wrote:
what I actually want is threaded MKL functions (BLAS and LAPACK to use 2 or more threads) but called from within an OMP parallelised for loop.
I'm also looking for a way to do this, is this possible? and if yes, how would you go about setting up and binding the threads?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It is, an example would be:
#include <mkl.h> #include <omp.h> #include <iostream> #include <random> int main(void) { int ompth = 4; //Number of OMP threads for the for loop int mklth = 2; //Number of MKL threads for the mkl calls //These two parameters need not be constant (i.e. you can read them as arguments if you wish) mkl_set_dynamic(false); omp_set_nested(true); omp_set_max_active_levels(2); #pragma omp parallel for num_threads(ompth) //Set the number of threads for this loop manually for (int i = 0; i < 10; i++) { mkl_set_num_threads_local(mklth); //Set the number of threads for MKL to use within this region //Now we need to run some MKL routine std::mt19937_64 gen(i * 12345); std::uniform_real_distribution<double> dist(-1, 1); int matsize = 10000; MKL_Complex16* A = (MKL_Complex16*)mkl_calloc(matsize * matsize, sizeof(MKL_Complex16), 64); //Alignment for AVX512 calls (soon) MKL_Complex16* B = (MKL_Complex16*)mkl_calloc(matsize * matsize, sizeof(MKL_Complex16), 64); //Alignment for AVX512 calls (soon) MKL_Complex16* C = (MKL_Complex16*)mkl_calloc(matsize * matsize, sizeof(MKL_Complex16), 64); //Alignment for AVX512 calls (soon) for (int j = 0; j < matsize * matsize; j++) { //Give them some random numbers A.real = dist(gen); A .imag = dist(gen); B .real = dist(gen); B .imag = dist(gen); } MKL_Complex16 scale{1, 0}; MKL_Complex16 zero{0, 0}; cblas_zgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, matsize, matsize, matsize, &scale, A, matsize, B, matsize, &zero, C, matsize); std::cout << "Iteration " << i << " completed by OMP thread " << omp_get_thread_num() << ". " << std::endl; } return 0; }
On my machine (Linux) this uses 8 threads. Do make sure the calls to the MKL routines warrant additional threads, sometimes, if for instance your matrices are too small, MKL will only use 1 thread regardless of how many you assigned to it because it simply doesn't make sense to create additional threads for small jobs.
The compilation flags are:
icpc -static -std=c++14 -Wall -O3 -qopenmp -ip -xHOST -use-intel-optimized-headers -fma -qoverride-limits -c test.cpp -o test.o icpc test.o -lm -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -qopenmp -lpthread -o test
Hope this helps.
Best Regards
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page