The flip function is ippiMirror_8u_C3IR. I timed its execution with code like the following:
int myGetTime(void)
{
LARGE_INTEGER ticks, freq;
QueryPerformanceCounter(&ticks);
QueryPerformanceFrequency(&freq);
return (int)(ticks.QuadPart * 1000000 / freq.QuadPart);
}
void myFunc()
{
...
IppiSize iSize;
iSize.width = 1024;
iSize.height = 768;
lineStep = ((1024 * 24 + 31) / 32) * 4;
int startTime, endTime;
for (int i = 0; i < 10; i++)
{
startTime = myGetTime();
ippiMirror_8u_C3IR(bmpData, lineStep, iSize, ippAxsHorizontal);
endTime = myGetTime();
printf("Time: %d ", endTime - startTime);
}
...
}
I built with the CPU-specific definitions to compare performance. To my surprise, on my PC the a6 (PIII SSE) code gives the best result: it is twice as fast as the other three (px, w7, t7), which all perform about the same. But I am using a Pentium D 830!
Can anybody tell me what I did wrong here? How can I get the best performance in this case?
Thanks.
Hello,
Yes, these are interesting results. Let's try to understand what happened. Could you please specify which linkage you used for that test: the DLLs or the static libraries? If you linked statically, did you call ippStaticInit (or ippStaticInitCpu)?
Regards,
Vladimir
If you just compile the project, it runs in PIII/a6 mode; if you define TEST_PRESCOTT, it runs in P4+Prescott/t7 mode.
In my D830 environment, a6 mode runs twice as fast as t7 mode.
Hello,
We analyzed the issue. The reason for the performance degradation in the T7-optimized code is unaligned data access. The IPP documentation notes that you should organize your data so that memory addresses are 16-byte aligned wherever possible. Intel architecture can then access the data more efficiently, and the IPP functions were optimized with this in mind. Please take a look at the attached modified sample, which eliminates this performance issue and gives you the best possible performance.
Regards,
Vladimir
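For readers without the attachment, here is a minimal sketch of one way to obtain a 16-byte-aligned pixel buffer in portable C. It is not the attached sample: the names align_up, AlignedImage, aligned_image_alloc and aligned_image_free are hypothetical helpers made up for illustration, and the sketch assumes 3 bytes per pixel as in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical holder for an aligned image buffer (not an IPP type). */
typedef struct {
    void          *raw;   /* pointer returned by malloc; pass this to free() */
    unsigned char *data;  /* first pixel, aligned to 16 bytes */
    int            step;  /* bytes per row, a multiple of 16 */
} AlignedImage;

/* Allocate a width x height image with 3 bytes per pixel whose base
 * address and row step are both multiples of 16.
 * Returns 0 on success, -1 if malloc fails. */
int aligned_image_alloc(AlignedImage *img, int width, int height)
{
    size_t step = align_up((size_t)width * 3, 16);
    /* Over-allocate by 15 bytes so the base can be rounded up. */
    void *raw = malloc(step * (size_t)height + 15);
    if (raw == NULL)
        return -1;

    uintptr_t addr = ((uintptr_t)raw + 15) & ~(uintptr_t)15;
    img->raw  = raw;
    img->data = (unsigned char *)addr;
    img->step = (int)step;
    return 0;
}

/* Release the buffer through the original malloc pointer. */
void aligned_image_free(AlignedImage *img)
{
    free(img->raw);
    img->raw  = NULL;
    img->data = NULL;
    img->step = 0;
}
```

After copying the bitmap rows into such a buffer, img.data and img.step would take the place of bmpData and lineStep in the ippiMirror_8u_C3IR call. Note that for the 1024-pixel rows in the question, the row step of 3072 bytes is already a multiple of 16, so there only the base address can be misaligned. IPP also ships its own image allocator (ippiMalloc_8u_C3, released with ippiFree), which may be the simpler choice; check the manual for the alignment it guarantees.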
By the way, you used the static libraries without static dispatching. There is nothing wrong with that, but for such purposes it seems more convenient to use static dispatching and force a particular CPU-specific code path at run time. You then need only one executable, and you can control at run time which code to use (PX, A6, W7, T7). To do this, link the *emerged.lib libraries, which contain the static dispatcher itself, and call ippStaticInitCpu() at the beginning of your program. It takes an IppCpuType enumerator as its parameter, giving you explicit control over which CPU-specific code is dispatched.
PS
If you do not need explicit control, you can call ippStaticInit() instead, which detects your CPU type at run time and dispatches the most appropriate code.
Vladimir
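To make the idea of static dispatching concrete, here is a toy model in plain C. It is only an illustration of the general technique (one executable, several CPU-specific variants, one selected at run time) and not IPP's actual implementation: CpuKind, static_init_cpu, mirror_variant and the mirror_* functions are all hypothetical names, and each "variant" merely reports its own name instead of processing pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU identifiers, standing in for a CPU-type enumerator. */
typedef enum { CPU_PX, CPU_A6, CPU_W7, CPU_T7, CPU_COUNT } CpuKind;

/* Every CPU-specific variant shares one signature. */
typedef const char *(*MirrorFn)(void);

/* Stand-ins for the CPU-specific implementations of one function. */
const char *mirror_px(void) { return "px"; }
const char *mirror_a6(void) { return "a6"; }
const char *mirror_w7(void) { return "w7"; }
const char *mirror_t7(void) { return "t7"; }

/* The dispatcher's table: all variants are linked into the executable. */
static const MirrorFn mirror_table[CPU_COUNT] = {
    mirror_px, mirror_a6, mirror_w7, mirror_t7
};

/* Until initialization, calls go to the generic (px) variant. */
static MirrorFn current_mirror = mirror_px;

/* Select which variant later calls will use.
 * Returns 0 on success, -1 for an unknown CPU kind (selection unchanged). */
int static_init_cpu(CpuKind cpu)
{
    if ((int)cpu < 0 || (int)cpu >= (int)CPU_COUNT)
        return -1;
    current_mirror = mirror_table[cpu];
    return 0;
}

/* The public entry point: forwards to whichever variant is selected. */
const char *mirror_variant(void)
{
    return current_mirror();
}

/* Convenience check: is the named variant currently selected? */
int dispatched_is(const char *name)
{
    return strcmp(mirror_variant(), name) == 0;
}
```

In this model, calling static_init_cpu(CPU_A6) and then static_init_cpu(CPU_T7) inside the same program would let the timing loop from the question compare both code paths without rebuilding.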
Thanks again.
