My application will be deployed on an Atom Z530, which is a single-core processor with Hyper-Threading (HT). I have already ported my application to IPP and got a very desirable performance increase. Will IPP benefit from HT? How can I leverage or fully utilize HT with IPP? By setting the number of threads? What is the optimum number of threads for the Atom?
Thanks in advance!
4 Replies
Hello,
Thanks for sharing. The default number of OpenMP threads used by the threaded Intel IPP is equal to the number of hardware threads in the system. In general, you don't need to set it manually.
But HT may not benefit all IPP functions; for more, please see http://software.intel.com/en-us/articles/openmp-and-the-intel-ipp-library/
So you could try disabling HT and let us know the performance result, then decide whether to disable it, leave it enabled, or set the number of threads to 1, as described in the article "IPP Crypto Sample Performance for OpenSSL too Slow on Hyper-Threading Systems".
Best Regards,
Ying
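To make the "try it and compare" advice concrete, here is a minimal timing sketch in plain C. It is only an illustration under stated assumptions: `process_block` is a hypothetical stand-in for the application's IPP-heavy step, and `mean_seconds_per_call` and `report_config` are made-up helper names. It measures wall-clock time (C11 `timespec_get`), since CPU time would add up the work of all threads and hide any benefit from threading.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the application's IPP-heavy processing step. */
static int work_calls = 0;
static volatile double work_sink = 0.0;

static void process_block(void) {
    double acc = 0.0;
    for (int i = 0; i < 100000; ++i) {
        acc += (double)i * 0.5;
    }
    work_sink = acc;  /* keep the loop from being optimized away */
    ++work_calls;
}

/* Difference between two wall-clock timestamps, in seconds. */
static double elapsed_seconds(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

/* Runs one untimed warm-up call, then returns the mean wall time per call. */
static double mean_seconds_per_call(void (*work)(void), int iterations) {
    struct timespec start, end;
    work();  /* warm-up: excluded from the measurement */
    timespec_get(&start, TIME_UTC);
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    timespec_get(&end, TIME_UTC);
    return elapsed_seconds(start, end) / iterations;
}

/* Prints one result line for a named configuration. */
static void report_config(const char *label, void (*work)(void), int iterations) {
    printf("%-12s %.9f s/call\n", label, mean_seconds_per_call(work, iterations));
}
```

In a real comparison you would change the thread setting before each `report_config` call, for example once with one thread and once with two, and replace `process_block` with the actual processing routine. How the thread count is set depends on the IPP version in use, so check the documentation for your release.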
Thanks for the prompt reply. I'm only using the ippi, ippm and ipps libraries. I did a test run of my application with 1000 iterations and didn't observe any obvious performance difference with or without HT enabled.
Sorry, one more question. The IPP manual states that there is an initial time penalty when an API is invoked if IPP is dynamically linked, but not in the static linking case.
My application is exported as a DLL and uses the static emerged IPP libraries. I initialize IPP (ippInit()) when the DLL is attached to a process (i.e. in DllMain), but I still experience the time penalty on the initial call to IPP APIs. Am I missing something?
Thanks in advance!
Hello,
I would recommend checking the IPP functions you use in your application against the ThreadedFunctionList.txt file available in the IPP distribution. Not all IPP functions are threaded (for example, you probably would not expect threading benefits for a 3x3 matrix add operation, would you?).
It is not clear what you mean by a time penalty for the initial call to IPP. How do you measure that? Might be you just
Regards,
Vladimir
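One way to answer the "how do you measure that?" question is to time the very first call separately from the calls that follow. The sketch below is an assumption about how such a measurement could be set up, not a description of what the original poster did: `call_api` is a hypothetical placeholder for the IPP call under test, and `CallTiming` and `measure_first_vs_steady` are illustrative names.

```c
#include <time.h>

/* Hypothetical placeholder for the IPP call being measured. */
static int api_calls = 0;

static void call_api(void) {
    ++api_calls;
}

/* First-call time and mean time of the following calls, in seconds. */
typedef struct {
    double first_call_s;
    double steady_mean_s;
} CallTiming;

/* Difference between two wall-clock timestamps, in seconds. */
static double elapsed_seconds(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) * 1e-9;
}

/* Times the first call on its own, then averages `iterations` later calls. */
static CallTiming measure_first_vs_steady(void (*work)(void), int iterations) {
    CallTiming t;
    struct timespec start, end;

    timespec_get(&start, TIME_UTC);
    work();  /* first call: includes any one-time setup cost */
    timespec_get(&end, TIME_UTC);
    t.first_call_s = elapsed_seconds(start, end);

    timespec_get(&start, TIME_UTC);
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    timespec_get(&end, TIME_UTC);
    t.steady_mean_s = elapsed_seconds(start, end) / iterations;
    return t;
}
```

If `first_call_s` is much larger than `steady_mean_s`, the gap is a one-time cost. The measurement alone cannot tell which source it comes from (library dispatching, cold caches, page faults when code is first touched, and so on), so it is a starting point rather than a diagnosis.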
Hello,
Don't worry too much about time penalties due to dispatching. This penalty is overstated by the manual and for most applications can be safely ignored. The difference between using a dispatched library (the default for both dynamic and static) and building a special configuration of the library that is specific to your processor just to eliminate the dispatch overhead is usually not worth the effort. If you need to save space in your application, and you can guarantee you will only run on one processor architecture, then building a processor-specific version of the library may be worthwhile, but otherwise it will not be worth the time and effort.
Paul
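For readers unfamiliar with the term, the general idea behind a dispatched library can be sketched as a function pointer that is chosen once at initialization and then used for every call. This is a generic illustration only and makes no claim about how IPP implements dispatching internally; `sum_generic`, `sum_unrolled`, `init_dispatch` and the `cpu_has_fast_path` flag are all hypothetical.

```c
#include <stddef.h>

typedef int (*sum_fn)(const int *values, size_t count);

/* Baseline implementation that runs on any processor. */
static int sum_generic(const int *values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

/* Stand-in for a processor-specific variant (here just unrolled by two). */
static int sum_unrolled(const int *values, size_t count) {
    int total = 0;
    size_t i = 0;
    for (; i + 1 < count; i += 2) {
        total += values[i] + values[i + 1];
    }
    if (i < count) {
        total += values[i];
    }
    return total;
}

/* Selected once; defaults to the generic version. */
static sum_fn active_sum = sum_generic;

/* One-time selection, e.g. based on a CPU feature check done at startup. */
static void init_dispatch(int cpu_has_fast_path) {
    active_sum = cpu_has_fast_path ? sum_unrolled : sum_generic;
}

/* Public entry point: the per-call cost of dispatch is one indirect call. */
static int sum(const int *values, size_t count) {
    return active_sum(values, count);
}
```

In a scheme like this the per-call overhead is a single indirect call, which is small next to any non-trivial amount of work per call and is consistent with the advice above that the dispatch penalty can usually be ignored.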