The flip function is ippiMirror_8u_C3IR. I timed its execution with code like the following:
int myGetTime(void)
{
    LARGE_INTEGER ticks, freq;
    QueryPerformanceCounter(&ticks);
    QueryPerformanceFrequency(&freq);
    /* Counter value converted to microseconds, truncated to int. */
    return (int)(ticks.QuadPart * 1000000 / freq.QuadPart);
}
void myFunc()
{
    ...
    IppiSize iSize;
    iSize.width = 1024;
    iSize.height = 768;
    /* BMP row stride: 24 bits per pixel, padded to a multiple of 4 bytes. */
    lineStep = ((1024 * 24 + 31) / 32) * 4;
    int startTime, endTime;
    for (int i = 0; i < 10; i++)
    {
        startTime = myGetTime();
        ippiMirror_8u_C3IR(bmpData, lineStep, iSize, ippAxsHorizontal);
        endTime = myGetTime();
        printf("Time: %d ", endTime - startTime);
    }
    ...
}
I used the CPU-specific definitions to compare the performance of the different code variants. To my surprise, on my PC a6 (PIII, SSE) gives the best result: it is about twice as fast as the other three (px, w7, t7), which all perform about the same. But I am using a Pentium D 830!
Can anybody tell me what I did wrong here? How can I get the best performance in this case?
Thanks.
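A side note on the timing helper above: it multiplies the raw counter by 1000000 before dividing and then truncates to int, which can overflow once the counter has been running for a while at a high frequency. A minimal sketch of a safer conversion follows; it assumes the two counter readings and the frequency have already been copied into 64-bit integers (for example from LARGE_INTEGER.QuadPart), and the name elapsed_us is hypothetical, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert the difference between two counter readings
 * to microseconds. Subtracting first and splitting into whole seconds plus
 * a remainder keeps every intermediate product well inside 64 bits. */
int64_t elapsed_us(int64_t startTicks, int64_t endTicks, int64_t ticksPerSecond)
{
    assert(ticksPerSecond > 0);
    int64_t delta = endTicks - startTicks;
    int64_t wholeSeconds = delta / ticksPerSecond;
    int64_t remainder = delta % ticksPerSecond;
    return wholeSeconds * 1000000 + remainder * 1000000 / ticksPerSecond;
}
```

In the loop above, this would mean reading the counter before and after the call and converting only the difference, rather than converting each absolute reading.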
Hello,
Yes, these are interesting results. Let's try to understand what happened. Could you please specify which linkage you used for the test: DLLs or static libraries? If you linked statically, did you call ippStaticInit (or ippStaticInitCpu)?
Regards,
Vladimir
If you just compile the project, it runs in PIII/a6 mode; if you define TEST_PRESCOTT, it runs in P4+Prescott/t7 mode.
In my D830 environment, a6 mode runs twice as fast as t7 mode.
Hello,
We analyzed the issue. The reason for the performance degradation of the T7-optimized code is unaligned data access. The IPP documentation notes that you should organize your data so that memory addresses are 16-byte aligned wherever possible. The Intel architecture can then access the data more efficiently, and the IPP functions are optimized to take advantage of this. Please take a look at the attached modified sample, which eliminates the performance issue and gives you the best possible performance.
Regards,
Vladimir
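The attached sample is not reproduced in this thread. As a rough illustration of the alignment advice only, here is a minimal sketch in plain C that allocates an image buffer whose start address and row step are both multiples of 16 bytes. The names round_up, AlignedImage, aligned_image_alloc, aligned_image_copy_from and aligned_image_free are hypothetical helpers invented for this example, not IPP functions, and this is not necessarily what the attached sample does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Hypothetical container for an image with 16-byte aligned rows. */
typedef struct {
    void          *raw;   /* pointer returned by malloc, used for free() */
    unsigned char *data;  /* 16-byte aligned start of the pixel data */
    int            step;  /* bytes per row, a multiple of 16 */
} AlignedImage;

/* Allocate the buffer; returns 1 on success, 0 if malloc fails. */
int aligned_image_alloc(AlignedImage *img, int width, int height, int channels)
{
    size_t step = round_up((size_t)width * (size_t)channels, 16);
    img->raw = malloc(step * (size_t)height + 15);  /* 15 spare bytes for alignment */
    if (img->raw == NULL)
        return 0;
    img->data = (unsigned char *)(((uintptr_t)img->raw + 15) & ~(uintptr_t)15);
    img->step = (int)step;
    return 1;
}

/* Copy rows from an existing buffer (e.g. BMP data with a 4-byte aligned step). */
void aligned_image_copy_from(AlignedImage *img, const unsigned char *src,
                             int srcStep, int rowBytes, int height)
{
    for (int y = 0; y < height; y++)
        memcpy(img->data + (size_t)y * (size_t)img->step,
               src + (size_t)y * (size_t)srcStep, (size_t)rowBytes);
}

void aligned_image_free(AlignedImage *img)
{
    free(img->raw);
    img->raw = NULL;
    img->data = NULL;
}
```

With such a buffer, the call from the first post would take img.data and img.step in place of bmpData and lineStep. When IPP is already linked, its own allocator ippiMalloc_8u_C3, which returns an aligned buffer together with its step, is probably the simpler route.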
By the way, you used the static libraries without static dispatching. There is nothing wrong with that, but for this kind of test it is more convenient to use static dispatching and force a particular CPU-specific code path at run time. You then need only one executable, and you can control at run time which code to use (PX, A6, W7, T7). To do this, link against the *emerged.lib libraries, which contain the static dispatcher itself, and call ippStaticInitCpu() at the beginning of your program. It takes an IppCpuType enumerator as its parameter, which gives you explicit control over which CPU-specific code is dispatched.
PS
If you do not need explicit control, you can call ippStaticInit() instead, which detects your CPU type at run time and dispatches the most appropriate code.
Vladimir
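To make the idea of run-time dispatching concrete, here is a small stand-alone sketch in plain C. It does not use IPP and does not show how IPP's dispatcher is implemented. Everything in it (the Target enum, the two reverse_pixels variants, select_target, and the mapping of targets to variants) is hypothetical and chosen only for illustration; select_target plays the role that ippStaticInitCpu() has in the description above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical target labels, named after the code variants in this thread. */
typedef enum { TARGET_PX, TARGET_A6, TARGET_W7, TARGET_T7, TARGET_COUNT } Target;

/* Every variant shares one signature: reverse the order of n 3-byte pixels. */
typedef void (*ReversePixelsFn)(unsigned char *row, int n);

/* Variant 1: swap one channel at a time. */
void reverse_pixels_bytewise(unsigned char *row, int n)
{
    for (int l = 0, r = n - 1; l < r; l++, r--)
        for (int c = 0; c < 3; c++) {
            unsigned char t = row[3 * l + c];
            row[3 * l + c] = row[3 * r + c];
            row[3 * r + c] = t;
        }
}

/* Variant 2: swap whole pixels with memcpy. */
void reverse_pixels_memcpy(unsigned char *row, int n)
{
    unsigned char t[3];
    for (int l = 0, r = n - 1; l < r; l++, r--) {
        memcpy(t, row + 3 * l, 3);
        memcpy(row + 3 * l, row + 3 * r, 3);
        memcpy(row + 3 * r, t, 3);
    }
}

/* Dispatch table; the target-to-variant mapping is arbitrary here. */
static const ReversePixelsFn kTable[TARGET_COUNT] = {
    reverse_pixels_bytewise, /* TARGET_PX */
    reverse_pixels_bytewise, /* TARGET_A6 */
    reverse_pixels_memcpy,   /* TARGET_W7 */
    reverse_pixels_memcpy    /* TARGET_T7 */
};

/* Currently selected variant; defaults to the first one. */
static ReversePixelsFn g_reverse = reverse_pixels_bytewise;

/* Force one target at run time. Returns 1 on success, 0 for an invalid target. */
int select_target(Target t)
{
    if ((int)t < 0 || (int)t >= TARGET_COUNT)
        return 0;
    g_reverse = kTable[t];
    return 1;
}

/* Public entry point: always calls the currently selected variant. */
void reverse_pixels(unsigned char *row, int n)
{
    assert(n >= 0);
    g_reverse(row, n);
}
```

A benchmark built this way calls select_target once per variant and times reverse_pixels in between, so one executable covers all code paths instead of one build per CPU-specific definition.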
Thanks again.
