Hi,
We use the IPP library to process images. Our original CPU is an Intel Q9650 (4 cores, 2.66 GHz, 12 MB L2 cache).
We have now bought a new computer with an Intel i7-870 (4 cores, 2.93 GHz, 1 MB L2 cache and 8 MB L3 cache).
The new computer is supposed to be faster than the original one, and normally it is, as expected. But when we run two threads in parallel, the new computer with the i7-870 shows a long delay.
Our image processing only calls basic IPP functions. We are wondering whether the problem is related to the CPU cache, since the i7-870 has only 1 MB of L2 cache, compared with 12 MB for the Q9650.
Has anyone met the same problem? Is there a solution?
Regards,
1 Solution
Hello Xie,
I suspect your problem is related to hyper-threading. See this page for a comparison of the two processors:
http://ark.intel.com/Compare.aspx?ids=35428,41315
Along with the size and type of cache and the memory bus, hyper-threading (HT) is a key difference between these two machines. Try adding a call to ippSetNumThreads(4) to the initialization part of your application to see if that makes a difference.
Paul
p.s. Please see this KB article for more information:
http://software.intel.com/en-us/articles/performance-of-crypto-sample-for-openssl-slowing-down-on-hyper-threading-systems/
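A minimal sketch of the suggestion above, assuming a threaded IPP build where ippSetNumThreads and ippGetNumThreads are available. The helper name cap_ipp_threads and the USE_REAL_IPP macro are hypothetical, introduced only for illustration; without that macro the sketch compiles against small stand-in stubs rather than the real library, so it shows the call pattern and not IPP's actual behavior.

```c
#include <assert.h>
#include <stdio.h>

#ifdef USE_REAL_IPP
#include <ipp.h>   /* real IPP headers; link against a threaded IPP build */
#else
/* Hypothetical stand-ins so the sketch compiles without IPP installed.
   They mimic only the call shape of ippSetNumThreads/ippGetNumThreads. */
typedef int IppStatus;
#define ippStsNoErr 0
static int stub_thread_count = 8;  /* assumed default: one thread per logical CPU */
static IppStatus ippSetNumThreads(int numThr) {
    if (numThr < 1) return -1;
    stub_thread_count = numThr;
    return ippStsNoErr;
}
static IppStatus ippGetNumThreads(int *pNumThr) {
    *pNumThr = stub_thread_count;
    return ippStsNoErr;
}
#endif

/* Cap IPP's internal threading at the number of physical cores.
   Returns the thread count reported afterwards, or -1 if either call
   does not return ippStsNoErr (a non-threaded IPP build may do so). */
int cap_ipp_threads(int physical_cores) {
    int active = 0;
    assert(physical_cores >= 1);
    if (ippSetNumThreads(physical_cores) != ippStsNoErr) return -1;
    if (ippGetNumThreads(&active) != ippStsNoErr) return -1;
    printf("IPP internal threads: %d\n", active);
    return active;
}
```

On the i7-870 discussed here, this would be called once as cap_ipp_threads(4) during application initialization, before any image-processing calls, so that IPP uses one thread per physical core instead of one per logical (hyper-threaded) processor.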
6 Replies
Hello,
If you only use basic IPP functions, you could try the Deferred Mode Image Processing (DMIP) layer, which is built on top of IPP and designed for building parallel processing pipelines. DMIP is available in the IPP sample package; see the folder image-processing\dmip.
Regards,
Vladimir
Hi,
I understand that DMIP would improve parallel processing performance, but we are still concerned about why the Q9650 works smoothly and the i7 does not; technically the i7 is the stronger CPU.
We don't want to spend time porting our application to DMIP. Please tell us whether the i7 is really unsuited to parallel image processing; if so, we will switch back to the Q9650.
Regards,
Xie
Hi Xie,
Performance depends on a number of factors. It is difficult to say what could be tuned in your application without examining the whole source code. That is why I suggested trying DMIP, which should tune your processing pipeline for best performance almost automatically.
Regards,
Vladimir
Hello,
I agree with Vladimir: with such a short description of your problem, it is difficult to make a well-educated guess.
Can you describe in more detail what you mean by "but if we want to run two thread parallel"? Do you have two instances of your program running? If hyper-threading is enabled on your Core i7, it may cause performance issues; see the MKL documentation. Hyper-threading really helps when the two threads running on the same physical core perform different tasks. Highly parallelized algorithms that run the same code in every thread do not benefit significantly when two threads share the resources of one core. There are several threads on this topic in this forum. In the end, it depends on what you mean by "but if we want to run two thread parallel".
Best regards,
TJ
Thanks Paul,
Yes, you are right: ippSetNumThreads(4) is the solution.
Thanks to all of you for your kind help.
Regards,
Xie