I have an application that uses quite a few MKL functions. It runs VERY fast on most machines in most situations. However, I have run into a couple of situations where it runs 5-20 TIMES slower. This comes up when a newer machine (I. a single-CPU quad-core i5-2500K Sandy Bridge, or II. a two-CPU, twelve-core machine with Xeon X5680 CPUs) is running multiple jobs.
The original code is quite complicated, but I have reproduced the problem with only a couple of calls to dgemm in a loop. The data size is somewhat large (around 4 million elements per matrix).
On an older (just over 1 year old) i7 laptop, I see only very minor slowdowns with many jobs running -- maybe 1-2%.
I assume I need to look into pinning my processes via CPU/thread affinity.
1) Does this seem a reasonable working diagnosis?
2) If so, are there function calls for this that I can add to the program? Or does this need to be handled by system utilities? We run on Linux and Windows (and are seeing the problem on both OSes).
Thank you to the community for taking a look at my message!
4 Replies
Jason,
What MKL version are you using when running on a newer machine?
>>The data size is somewhat large ( around 4 million elements per matrix ).
Do I understand correctly that the matrix size would be ~2000x2000?
This is possible. If you wish to run multiple jobs together efficiently on a multi-CPU machine, you should look into running each in a separate environment, with KMP_AFFINITY and num-threads settings that keep each job on its own CPU, in order to avoid thrashing the cache.
You could handle it by scripting to set the environment variables for each job, or by set_num_threads and putenv function calls from the application.
Once you use affinity settings to pin a job to certain cores, you must avoid pinning another application to the same cores, as you have defeated the efforts the OS makes to allocate available cores. The job may be more complicated when you run with HyperThreading enabled.
Gennady: I am running MKL version 10.2.6. Yes, the data size is around 2000 x 2000 for my stripped-down example. Other jobs where I am seeing slowdowns are of varying sizes: 200 x 5, 200 x 200, 1000 x 10, etc.
On Linux we compile with GCC (for now; we are testing icc and hope to find improvements with it) and link to libgomp. Is KMP_AFFINITY still the correct environment variable? Or do I need to use the GCC versions of these?
Also, is this the standard way to handle this? Or should my application pin itself to cores internally? I ask because the people sending the jobs out are not savvy in low-level computer issues, and we do not want them to have to set various environment variables for each job they queue up.
Thanks again!
When running libmkl_thread with Intel's libiomp5, the GOMP_CPU_AFFINITY environment variable is recognized and translated to the equivalent KMP_AFFINITY setting, so you can take your choice. If you are running the gnu_thread library and libgomp, you would use GOMP_CPU_AFFINITY.
libiomp5 supports the gnu OpenMP function calls; in most cases the performance of libiomp5 and current libgomp is similar, although there are outliers where "your mileage may vary."
