I am using the trust region solver in MKL and having issues where dtrnlsp_solve sometimes takes significantly longer to complete. I have many threads that each need to run an optimization using the trust region solver; each optimization problem has about 200 residuals and 40-70 unknowns. When a large number of threads need to perform the optimization, I start to see (through concurrency profiling) that many of the threads are blocked in the solve for up to 20 times longer than a normal solve. I start to see this behaviour when about 40-60 threads could call the trust region solver. I have tried two versions of MKL. Initially I was using version 11.1.2 and saw the trust region threads spinning with a call stack ending in mkl_serv_lock <- mkl_serv_deallocate. I then tried version 11.3.0 and saw the threads spinning or sleeping in TBB under mkl_serv_allocate.
I'm using external threading, so I am running MKL in sequential mode. I'm also using the TBB allocator and the 64-bit versions.
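One experiment that might help isolate the contention is to cap how many of the external threads can be inside the solve at the same time. The following is only a minimal sketch under stated assumptions: `SolveGate`, `solve_one` and `run_capped` are hypothetical names, and `solve_one` is a placeholder loop standing in for the real trust region solve, so no MKL calls are made here.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Counting gate: at most `limit` threads may hold a slot at any time.
class SolveGate {
public:
    explicit SolveGate(int limit) : free_slots_(limit) { assert(limit > 0); }

    // Blocks until a slot is free, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return free_slots_ > 0; });
        --free_slots_;
    }

    // Returns a slot and wakes one waiting thread.
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++free_slots_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int free_slots_;
};

// Hypothetical placeholder for one optimization. In the real application
// this is where the trust region solve would be called.
double solve_one(int problem_id) {
    double acc = 0.0;
    for (int i = 1; i <= 20000; ++i) {
        acc += static_cast<double>(problem_id % 7 + 1) / i;
    }
    return acc;
}

// Starts `num_threads` external threads, each solving one problem, while
// allowing at most `max_in_solver` of them inside solve_one at once.
// Returns the highest number of threads observed inside the solver.
int run_capped(int num_threads, int max_in_solver) {
    SolveGate gate(max_in_solver);
    std::mutex stats_m;
    int in_solver = 0;
    int peak = 0;
    std::vector<double> results(num_threads, 0.0);
    std::vector<std::thread> threads;

    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            gate.acquire();
            {
                std::lock_guard<std::mutex> lock(stats_m);
                ++in_solver;
                peak = std::max(peak, in_solver);
            }
            results[t] = solve_one(t);
            {
                std::lock_guard<std::mutex> lock(stats_m);
                --in_solver;
            }
            gate.release();
        });
    }
    for (auto& th : threads) th.join();
    return peak;
}
```

Calling `run_capped(60, 8)`, for example, would start 60 threads but let no more than 8 into the placeholder solve at once. Whether a cap like this actually reduces the time spent under mkl_serv_allocate or mkl_serv_deallocate is something that would have to be measured with the real solver.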
Ideally I would like to find a solution that works for MKL version 11.1.2. There appears to be a small change in the solution produced by the optimization between 11.1.2 and 11.3.0 with the older version appearing to converge to a smaller overall error.
Thanks in advance
Steven
Steven, how many external threads do you create when calling the sequential version of the MKL routine? And how many threads are available on your system?
Hi Gennady
Thanks for getting back to me. I originally noticed the problem in an application with about 60 external threads calling MKL for part of their processing. This was running on an 8-core i7 with hyperthreading.
I have now run tests on three different computers: a quad-core i7 with 8 GB RAM and hyperthreading turned off; an 8-core i7 with 16 GB RAM and hyperthreading turned on; and a machine with two 6-core Xeons, 12 GB RAM and hyperthreading turned off. I ran the same test on all three with 4, 8, 16, 20, 40 and 80 threads. In each test the total computation required is the same. All three machines show very similar behaviour. In the following results, the processing times are approximate and relative to the processing time for 4 threads on that machine. These results were collected with MKL 11.1.2 using my test setup.
Quad core
Threads   Processing time (relative)   Locking observed
4         1.0                          No
8         0.9                          No
16        0.9                          No
20        0.9                          Infrequent
40        1.1                          Yes
80        1.3                          Yes
8 core
Threads   Processing time (relative)   Locking observed
4         1.0                          No
8         0.6                          No
16        0.4                          No
20        0.4                          Infrequent
40        0.5                          Yes
80        1.4                          Yes
12 core
Threads   Processing time (relative)   Locking observed
4         1.0                          No
8         0.6                          No
16        0.4                          Infrequent
20        0.4                          Infrequent
40        0.4                          Yes
80        0.8                          Yes
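For anyone who wants to reproduce this kind of measurement, a test along these lines can be sketched roughly as follows. This is not the actual test setup used for the numbers above; it is a minimal sketch under stated assumptions. `placeholder_solve`, `run_fixed_work` and `relative_time` are hypothetical names, the placeholder loop stands in for the real trust region solve, and handing out problems through a shared counter is just one assumed way of keeping the total computation the same for every thread count.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical placeholder for one optimization. The real test would run
// the trust region solve here instead.
double placeholder_solve(int problem_id) {
    double acc = 0.0;
    for (int i = 1; i <= 20000; ++i) {
        acc += static_cast<double>(problem_id % 7 + 1) / i;
    }
    return acc;
}

struct RunResult {
    double seconds;       // wall-clock time for the whole run
    int problems_solved;  // number of problems completed
};

// Solves `total_problems` problems using `num_threads` threads.
// The total amount of work is fixed; only the thread count changes.
RunResult run_fixed_work(int num_threads, int total_problems) {
    assert(num_threads > 0 && total_problems >= 0);
    std::atomic<int> next{0};    // index of the next unsolved problem
    std::atomic<int> solved{0};  // count of completed problems
    std::vector<double> sink(num_threads, 0.0);  // keeps results alive
    std::vector<std::thread> threads;

    auto start = std::chrono::steady_clock::now();
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            for (;;) {
                int id = next.fetch_add(1);
                if (id >= total_problems) break;
                sink[t] += placeholder_solve(id);
                solved.fetch_add(1);
            }
        });
    }
    for (auto& th : threads) th.join();
    auto stop = std::chrono::steady_clock::now();

    return {std::chrono::duration<double>(stop - start).count(),
            solved.load()};
}

// Processing time of `run` relative to a baseline run
// (for example the 4-thread run on the same machine).
double relative_time(const RunResult& run, const RunResult& baseline) {
    return run.seconds / baseline.seconds;
}
```

Running `run_fixed_work` once per thread count (4, 8, 16, 20, 40, 80) and dividing each time by the 4-thread time with `relative_time` gives figures in the same form as the tables above. The placeholder does no memory allocation, so it will not show the locking behaviour by itself.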