I can set the number of OpenMP threads, and it works for "omp parallel for".
ippGetNumThreads always returns 1, though.
ippSetNumThreads(8) reports "No operation has been executed".
I'm not sure how to set the thread count successfully, or whether a higher thread count will make ippsSortRadixIndexAscend_8u run on multiple threads.
omp_set_dynamic(0);
omp_set_num_threads(8);
int threads = 0;
ippGetNumThreads(&threads);
wprintf(L"ippGetNumThreads %d\n", threads);
IppStatus errorTh = ippSetNumThreads(8);
printf("-- warning %d, %s\n", errorTh, ippGetStatusString(errorTh));
ippGetNumThreads(&threads);
wprintf(L"ippGetNumThreads %d\n", threads);
Thanks,
Greg
Hi Greg,
Do you use the multithreaded IPP libraries? For the non-threaded libraries, this behavior of ippGetNumThreads/ippSetNumThreads is correct.
Regards, Igor
Hello, thanks for your reply, Igor.
Yes, I do have them. I wasn't sure which to select when setting up the project, though. Initially I chose Multi-threaded DLL. I just tried switching to Multi-threaded static library, and that allowed me to change the thread count.
However, ippsSortRadixIndexAscend still only runs on one thread.
Thanks
Hi,
ippsSortRadixIndexAscend is not internally threaded. The related ippsSortRadixAscend function is threaded. You can find the list of threaded functions at: documentation\en\ipp\common\ThreadedFunctionsList.txt
Thanks,
Chao
Thanks Chao,
I tested ippsSortRadixAscend_32s_I sorting 10 million elements. There was no performance change with 1 thread vs 4 or 8 threads.
Any suggestions for a parallel (key, value) sort library? I'm looking for a sort on a single CPU. The final version will use a 10-14 core Xeon, with the goal of sorting 10 million 32-bit (key, value) pairs in about 30 ms. While that may not be possible in a single sort, I have also thought about multi-step sorts: for example, a coarse-grained sort using 16- or 8-bit keys, with the result then sent to a co-processor for an exact sort and further parallel computing.
As a temporary solution, I decided to use OpenMP and Intel's sorts for a 2-stage sort.
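To make the coarse-grained first step concrete, here is a minimal sketch of one way it could look, assuming unsigned 32-bit keys and an 8-bit coarse key taken from the top byte. The `KeyValue` struct and the `coarsePartition` function are hypothetical names made up for this illustration; they are not part of IPP, and nothing in the thread says this is how the final version was built.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pair layout (assumption: unsigned 32-bit key and value).
struct KeyValue {
    std::uint32_t key;
    std::uint32_t value;
};

// Coarse pass: scatter pairs into 256 buckets by the top 8 bits of the key.
// Afterwards every key in bucket b is <= every key in bucket b + 1, so each
// bucket can be sorted exactly and independently (e.g. on another device).
// Returns start offsets: bucket b occupies out[offsets[b]] .. out[offsets[b + 1] - 1].
std::array<std::size_t, 257> coarsePartition(const std::vector<KeyValue>& in,
                                             std::vector<KeyValue>& out) {
    std::array<std::size_t, 257> offsets{};
    for (const KeyValue& kv : in) {
        ++offsets[(kv.key >> 24) + 1];   // histogram, shifted by one slot
    }
    for (std::size_t b = 0; b < 256; ++b) {
        offsets[b + 1] += offsets[b];    // prefix sum turns counts into offsets
    }
    assert(offsets[256] == in.size());

    out.resize(in.size());
    std::array<std::size_t, 256> cursor{};
    std::copy(offsets.begin(), offsets.begin() + 256, cursor.begin());
    for (const KeyValue& kv : in) {
        out[cursor[kv.key >> 24]++] = kv;  // scatter, preserving input order per bucket
    }
    return offsets;
}
```

Signed keys, as used by the `_32s` IPP variants, would need the sign bit flipped before taking the top byte so that negative keys land in the low buckets.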
Thanks
Hi Greg,
A parallel merge sort (a 2-stage parallel sort: (1) radix, (2) merge) will be introduced in the IPP version that follows 2017. SortRadix had a threaded implementation in some older IPP versions, but it was then commented out because the implementation was inefficient (only 2 threads were supported).
Regards, Igor
Hello Igor,
That's interesting; the 2-stage parallel sort is what I decided on. I'm using ippsSortRadixIndexAscend_32s in combination with OpenMP, then merging the data sets. (I realize this conversation has drifted way off the title's topic, but it was interesting.)
My merge/reduce is parallel but not optimized (it manipulates the key/value pairs too, not just the index), but the gains seem really good for a 4-core CPU with 8 threads. From what I was told about hyperthreading at my last HPC internship, it is like thread switching on cores for higher utilization of the CPU. So total gains greater than the core count seem really good.
ippGetNumThreads 1
Set Num THreads warning 0, ippStsNoErr: No errors.
ippGetNumThreads 8
ItemCount: 10000000
ItemCount Per Thread 1250000
BufferSize: 5020576
Thread # 5 - Sorting ippsSortRadixIndexAscend_32s...
Thread # 4 - Sorting ippsSortRadixIndexAscend_32s...
Thread # 6 - Sorting ippsSortRadixIndexAscend_32s...
Thread # 7 - Sorting ippsSortRadixIndexAscend_32s...
Thread # 3 - Sorting ippsSortRadixIndexAscend_32s...
Thread # 1 - Sorting ippsSortRadixIndexAscend_32s...
Thread # 2 - Sorting ippsSortRadixIndexAscend_32s...
Thread # 8 - Sorting ippsSortRadixIndexAscend_32s...
Time for partial sorts with OpenMP(8): 200.29 ms
Reduce (4): 138.03 ms
Full Sort: 338.32 ms
ItemCount: 10000000
BufferSize: 40020576
1. Sorting ippsSortRadixIndexAscend_32s...
Time for single sort of all elements: 1726.63 ms
Performance Gains with a Quad Core CPU: 5.10x
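For anyone wanting to try the same shape of solution, below is a rough sketch of the 2-stage approach described in this post: sort fixed-size chunks in parallel, then merge neighbouring sorted runs in rounds. It is not the code used for the timings above. To stay self-contained it swaps in `std::sort` for the per-chunk ippsSortRadixIndexAscend_32s call and `std::inplace_merge` for the custom reduce, and `KeyValue`, `keyLess` and `twoStageSort` are hypothetical names chosen for this illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pair layout (assumption: signed 32-bit key and value).
struct KeyValue {
    std::int32_t key;
    std::int32_t value;
};

inline bool keyLess(const KeyValue& a, const KeyValue& b) { return a.key < b.key; }

// Sorts data by key using `chunks` independently sorted runs plus merge rounds.
// The pragmas take effect only when compiled with OpenMP enabled; otherwise
// the loops simply run serially.
void twoStageSort(std::vector<KeyValue>& data, int chunks) {
    const std::size_t n = data.size();
    if (chunks < 1) chunks = 1;

    // Chunk c covers the half-open index range [bounds[c], bounds[c + 1]).
    std::vector<std::size_t> bounds(static_cast<std::size_t>(chunks) + 1);
    for (int c = 0; c <= chunks; ++c) {
        bounds[c] = n * static_cast<std::size_t>(c) / static_cast<std::size_t>(chunks);
    }

    // Stage 1: sort every chunk on its own (std::sort stands in for the IPP radix sort).
    #pragma omp parallel for schedule(static)
    for (int c = 0; c < chunks; ++c) {
        std::sort(data.begin() + bounds[c], data.begin() + bounds[c + 1], keyLess);
    }

    // Stage 2: merge neighbouring sorted runs, doubling the run width each round.
    for (int width = 1; width < chunks; width *= 2) {
        #pragma omp parallel for schedule(static)
        for (int c = 0; c < chunks - width; c += 2 * width) {
            const int mid = c + width;
            const int end = std::min(c + 2 * width, chunks);
            std::inplace_merge(data.begin() + bounds[c], data.begin() + bounds[mid],
                               data.begin() + bounds[end], keyLess);
        }
    }
}
```

Calling `twoStageSort(pairs, 8)` mirrors the 8-chunk setup in the log. Note that each merge round has half as many independent merges as the previous one, so the last round runs on a single thread; that is one reason the reduce step is a natural place to look for further optimization.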