1. It is not recommended to simultaneously parallelize your program and employ the Intel MKL internal threading because this will slow down performance. Note that in case 3 above, DFT computation is automatically initiated in a single threading mode.

Does this warning apply only when using the DFT, or does it apply to MKL in general? The context of the warning seems to be limited to how one uses the DFT. If one is using the BLAS and LAPACK routines, is there any similar restriction?
This is an interesting and important issue. The problem has nothing specific to do with the MKL FFT but is a general issue. It doesn't even have anything specific to do with MKL, but rather is an OpenMP issue. Here's the deal.
When you compile software with OpenMP directives or pragmas, the compiler adds quite a bit of threading code and inserts many calls to functions supplied by a runtime library (RTL). In the case of MKL, that RTL is libguide, the threading library used by the Intel compilers. Besides supplying those functions, the RTL creates buffers and keeps accounting information, such as how many threads it is managing, and it can report things like whether you are currently in a threaded region.
If you compile your program with threading enabled using, say, the PGI compiler, it will do everything described in the previous paragraph and will need its own RTL to support threading.
Where does the problem arise? If you call MKL from a threaded region that was created by a different RTL, MKL cannot know that it is in a threaded region and will thread according to the number of threads specified in the environment variable OMP_NUM_THREADS, leading to an oversubscription of threads. In addition, which RTL will supply the OpenMP function calls, both in MKL and in your program?
Oversubscribing resources can have an enormous impact on performance. We have seen a case where performance dropped by more than a factor of 10 when each of 4 threads created 4 threads of its own on a 4-processor system.
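To make the thread counts in that case concrete, here is a minimal Python sketch of the arithmetic. The function names are hypothetical and used only for illustration. It counts threads and does not model the slowdown itself, which depends on the workload and the system.

```python
def total_threads(app_threads: int, library_threads_per_call: int) -> int:
    """Threads alive when every application thread calls a threaded library.

    Hypothetical helper: assumes each application thread triggers its own
    team of library threads, as in the nested case described above.
    """
    return app_threads * library_threads_per_call


def oversubscription_factor(app_threads: int,
                            library_threads_per_call: int,
                            processors: int) -> float:
    """Ratio of runnable threads to available processors."""
    return total_threads(app_threads, library_threads_per_call) / processors


if __name__ == "__main__":
    # The case from the post: 4 application threads, 4 library threads each,
    # on a 4-processor system.
    print(total_threads(4, 4))               # 16
    print(oversubscription_factor(4, 4, 4))  # 4.0
    # Same program with the library running single-threaded per call.
    print(oversubscription_factor(4, 1, 4))  # 1.0
```

With the library restricted to one thread per call, the thread count matches the processor count again, which is why the documentation advises against combining both levels of threading.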
--Bruce
