Hi everybody,
I migrated my source code from VS2008 + PS2011 + Intel C++ Compiler (project [Before]) to VS2015 + PSXE2016 + Intel C++ Compiler (project [After]).
The problem: the ippiDFTInv_CToC_32fc_C1R function runs slower in project [After] than in project [Before].
Details are below (for the sample source code, please refer to the attached file).
Before migration (ms): 20.076
After migration (ms): 28.145
Deviation (ms): 8.069
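To put the reported deviation in relative terms, the small helper below computes the slowdown as a percentage of the original time. The function name `slowdown_percent` is hypothetical and only illustrates the arithmetic; it is not part of the attached project.

```c
#include <assert.h>

/* Relative slowdown in percent: how much longer after_ms is than before_ms.
   Hypothetical helper, shown only to illustrate the arithmetic. */
double slowdown_percent(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return (after_ms - before_ms) / before_ms * 100.0;
}
```

With the values reported here, `slowdown_percent(20.076, 28.145)` is about 40, so the 8.069 ms deviation corresponds to roughly a 40% longer run.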
Note: Configuration of PC
- OS: Win7 Enterprise SP1 64bit
- CPU: Intel Core i3-3220 (3.30 GHz)
- RAM: 8GB.
Currently, I don't know why ippiDFTInv_CToC_32fc_C1R is slower in project [After] than in project [Before].
Please help me understand the cause.
Best regards,
NhanPham.
I have redirected this issue to the IPP forum for a better response.
Thanks and Regards
Anoop
Hello,
Which IPP version (arch, dll/static, st/mt, etc.) do you use? Could you provide the output of ippiGetLibVersion()? Please insert this call just before the call to the DFT:
const IppLibraryVersion* lib;
lib = ippiGetLibVersion();
printf("%s %s %d.%d.%d.%d\n", lib->Name, lib->Version, lib->major, lib->minor, lib->majorBuild, lib->build);
regards, Igor.
Hi Igor Astakhov,
Thanks for your quick response.
Project [Before] uses IPP version:
ippsy8-7.0.dll+ 7.0 build 205.7 7.0.205.1008
Project [After] uses IPP version:
ippSP AVX (e9) 9.0.2 (r49912) 9.0.2.49912
Best regards,
NhanPham.
Hi Nhan Pham,
Each IPP release ships with the IPP PS (Performance System). I've checked both libraries, 7.0 (y8) and 9.0.2 (e9), and see that 9.0.2 is from 1.5x to 7x faster (depending on size). You didn't provide the output from ippiGetLibVersion: (a) your first post is about an ippIP function, but your last reply refers to ippSP functionality; (b) for 7.0 you use dynamic linking (dynamic libraries in 7.0 are multithreaded by default), but it is not clear which linking (dynamic or static) you use for 9.0.2.
regards, Igor
NhanPham, it seems to me you placed a printf inside the timed section. Was that intentional? Could you move this printf outside the timed section and check the performance again?
...................
InitializeTimer();
__int64 lstime = GetTimerCounter();
for(int i =0; i< 100; i++)
{
DFTFunction(input, output, sizeX, sizeY, sizeZ);
}
printf("output[1000] = (%f, %f)\n", output[100].re, output[100].im);
printf("Execution time: %f\n", GetExecutionTime(lstime));
..................
Besides, I see many 'malloc' calls within DFTFunction, as well as GetSize, Init...
Could you benchmark only the ippiDFTInv_CToC_32fc_C1R call itself?
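Both suggestions (keeping printf out of the timed section, and timing only the DFT call rather than allocation and initialization) amount to the same pattern, sketched below in plain C. The names `kernel_fn`, `time_kernel_ms` and `count_calls` are hypothetical and not taken from the attached project. The sketch uses the standard `clock()` function, which is an assumption for portability; it has coarse resolution, and a high-resolution timer would normally replace it.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one invocation of the code under test,
   e.g. a wrapper that only calls the DFT on already prepared buffers. */
typedef void (*kernel_fn)(void *ctx);

/* Runs fn once untimed as a warm-up, then `iterations` times between the
   two clock reads, and returns the average time per call in milliseconds.
   Setup, allocation and printf all stay outside the timed section. */
double time_kernel_ms(kernel_fn fn, void *ctx, int iterations)
{
    assert(fn != NULL && iterations > 0);

    fn(ctx);                               /* warm-up call, not measured */

    clock_t start = clock();
    for (int i = 0; i < iterations; i++) {
        fn(ctx);                           /* only the kernel is timed */
    }
    clock_t stop = clock();

    return (double)(stop - start) * 1000.0 / CLOCKS_PER_SEC / iterations;
}

/* Stand-in workload for demonstration: counts how often it was called. */
void count_calls(void *ctx)
{
    ++*(int *)ctx;
}
```

In a real measurement the callback would wrap just the DFT call, with the spec structure and work buffer created before `time_kernel_ms` is entered and any result printed after it returns.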
Hi Igor Astakhov,
Here is more information.
Project [Before] uses IPP version:
ippiy8-7.0.dll+ 7.0 build 205.7 7.0.205.1004 (dynamic library)
Project [After] uses IPP version:
ippIP AVX (e9) 9.0.0 (r47849) 9.0.0.47849 (dynamic library)
You said that in your tests the 9.0 libraries are 1.5x to 7x faster than the 7.0 libraries. But that is not the result I get from the project I attached.
Here is part of the code from that project:
roiSize.width = sizeX;
roiSize.height = sizeY;
if (ippStsNoErr == ippiDFTGetSize_C_32fc(roiSize, IPP_FFT_DIV_INV_BY_N, ippAlgHintAccurate, &nSpecSize, &nInitSize, &size))
{
    // Allocate memory
    dftSpec = (IppiDFTSpec_C_32fc*)ippMalloc(nSpecSize);
    if (nInitSize > 0)
    {
        pbyInit = (Ipp8u*)ippsMalloc_8u(nInitSize);
    }
    // Initializes the context structure for the image DFT functions
    if (ippStsNoErr == ippiDFTInit_C_32fc(roiSize, IPP_FFT_DIV_INV_BY_N, ippAlgHintAccurate, dftSpec, pbyInit))
    {
        pBuffer = ippsMalloc_8u(((size > FTBuffSizeMin) ? size : FTBuffSizeMin));
        if (NULL != pBuffer)
        {
            for (int zc = 0; zc < sizeZ; zc++)
            {
                ippiDFTInv_CToC_32fc_C1R(input, sizeX * sizeof(Ipp32fc), output, sizeX * sizeof(Ipp32fc), dftSpec, pBuffer);
            }
        }
    }
}
I tried many size patterns, such as:
sizeX = 478, sizeY = 454, sizeZ = 64
sizeX = 478, sizeY = 454, sizeZ = 1
sizeX = 4780, sizeY = 454, sizeZ = 64
sizeX = 47800, sizeY = 454, sizeZ = 64
......
With these patterns, the DFT functions in the 9.0 libraries are always slower than the DFT functions in the 7.0 libraries.
If you have not run this source code yet, please run it to check the results.
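When comparing timings across sizes, it may help to look at how the dimensions factor, since FFT-style algorithms are generally fastest for sizes built from small primes. The sketch below is a hypothetical helper (`largest_prime_factor` is not an IPP function) that finds the largest prime factor by trial division. This is a general property of DFT algorithms, not a statement about IPP internals, and it does not by itself explain a difference between two library versions.

```c
#include <assert.h>

/* Largest prime factor of n (n >= 2), found by trial division.
   Hypothetical helper for inspecting DFT dimensions. */
int largest_prime_factor(int n)
{
    assert(n >= 2);
    int largest = 1;
    for (int p = 2; (long long)p * p <= n; p++) {
        while (n % p == 0) {   /* strip every factor p from n */
            largest = p;
            n /= p;
        }
    }
    if (n > 1) {               /* whatever remains is itself prime */
        largest = n;
    }
    return largest;
}
```

For the sizes listed here, 478, 4780 and 47800 all contain the prime factor 239, and 454 contains 227, whereas a size such as 512 has only the factor 2.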
Thanks,
Pham Minh Nhan
Hi Pham Minh Nhan,
I measured DFT performance using a very simple program:
#include <stdio.h>
#include "ipp.h"

#define N_LOOP 1000
#define WIDTH 454
#define HEIGHT 478

int main()
{
    int sizeBuf, sizeSpec, sizeIni, i, j, srcStep, dstStep;
    IppiDFTSpec_C_32fc *ctxDFT;
    IppStatus status;
    Ipp32fc *src, *dst, *tmpDst;
    Ipp32f *tmpSrc;
    Ipp8u *buf;
    IppiSize roi = { WIDTH, HEIGHT };
    Ipp64u c1, c2;
    double cpe;
    const IppLibraryVersion* lib;

    ippInit();

    // print the library version actually dispatched
    lib = ippiGetLibVersion();
    printf("build = %d\n", lib->build);
    printf("targetCpu = %s\n", lib->targetCpu);
    printf("Name = %s\n", lib->Name);
    printf("Version = %s\n", lib->Version);
    printf("BuildDate = %s\n", lib->BuildDate);

    // setup: done once, outside the measured loop
    status = ippiDFTGetSize_C_32fc( roi, IPP_FFT_DIV_INV_BY_N, ippAlgHintAccurate, &sizeSpec, &sizeIni, &sizeBuf );
    ctxDFT = (IppiDFTSpec_C_32fc*)ippMalloc( sizeSpec );
    buf = (Ipp8u*)ippMalloc( IPP_MAX( sizeBuf, sizeIni ));
    status = ippiDFTInit_C_32fc( roi, IPP_FFT_DIV_INV_BY_N, ippAlgHintAccurate, ctxDFT, buf );
    src = ippiMalloc_32fc_C1( WIDTH, HEIGHT, &srcStep );
    dst = ippiMalloc_32fc_C1( WIDTH, HEIGHT, &dstStep );

    // init src: generate a real test image in dst, then copy it into src as complex values
    status = ippiImageJaehne_32f_C1R( (Ipp32f*)dst, dstStep, roi );
    for( j = 0; j < HEIGHT; j++ ){
        tmpSrc = (Ipp32f*)((Ipp8u*)dst + j * dstStep );
        tmpDst = (Ipp32fc*)((Ipp8u*)src + j * srcStep );
        for( i = 0; i < WIDTH; i++ ){
            tmpDst[i].re = tmpSrc[i];
            tmpDst[i].im = -tmpSrc[i];
        }
    }

    // warm cache
    status = ippiDFTInv_CToC_32fc_C1R( src, srcStep, dst, dstStep, ctxDFT, buf );

    // measure perf: only the DFT call is inside the timed loop
    c1 = ippGetCpuClocks();
    for( i = 0; i < N_LOOP; i++ ){
        status = ippiDFTInv_CToC_32fc_C1R( src, srcStep, dst, dstStep, ctxDFT, buf );
    }
    c2 = ippGetCpuClocks();

    // cpe = clocks per element
    cpe = ((double)c2 - (double)c1)/((double)WIDTH * (double)HEIGHT * (double)N_LOOP);
    printf ( "size = %d x %d, cpe = %f\n", WIDTH, HEIGHT, cpe );

    ippiFree( src );
    ippiFree( dst );
    ippFree( buf );
    ippFree( ctxDFT );
    return 0;
}
The output shows that performance for 8.2.3 and 9.0.3 is essentially the same:
build = 48108
targetCpu = h9
Name = ippIP AVX2 (h9)
Version = 8.2.3 (r48108)
BuildDate = Jul 23 2015
size = 454 x 478, cpe = 48.161625
Press any key to continue . . .
build = 51269
targetCpu = h9
Name = ippIP AVX2 (h9)
Version = 9.0.3 (r51269)
BuildDate = Apr 8 2016
size = 454 x 478, cpe = 49.619253
Press any key to continue . . .
build = 48108
targetCpu = h9
Name = ippIP AVX2 (h9)
Version = 8.2.3 (r48108)
BuildDate = Jul 23 2015
size = 512 x 512, cpe = 13.975082
Press any key to continue . . .
build = 51269
targetCpu = h9
Name = ippIP AVX2 (h9)
Version = 9.0.3 (r51269)
BuildDate = Apr 8 2016
size = 512 x 512, cpe = 14.034056
Press any key to continue . . .
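The cpe figure printed above is clocks per element, computed as (c2 - c1) / (WIDTH * HEIGHT * N_LOOP). To compare it with a timing in milliseconds it has to be scaled by the image size and the clock frequency. A minimal sketch of that conversion follows; the function names and the `cpu_hz` parameter are hypothetical, and the clock frequency of the machine that produced these numbers is not stated in this thread.

```c
#include <assert.h>

/* Clocks per element, the same formula as in the benchmark above. */
double cycles_per_element(unsigned long long c1, unsigned long long c2,
                          int width, int height, int loops)
{
    assert(c2 >= c1 && width > 0 && height > 0 && loops > 0);
    return (double)(c2 - c1) / ((double)width * (double)height * (double)loops);
}

/* Time of one call in milliseconds for a given cpe.
   cpu_hz is an assumed counter frequency in Hz, supplied by the caller. */
double ms_per_call(double cpe, int width, int height, double cpu_hz)
{
    assert(width > 0 && height > 0 && cpu_hz > 0.0);
    return cpe * (double)width * (double)height / cpu_hz * 1000.0;
}
```

For example, at an assumed 3.3 GHz, a cpe of 48.16 on a 454 x 478 image would correspond to roughly 3.2 ms per call; the real figure depends on the actual clock rate.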
regards, Igor
Hi Igor Astakhov,
Thanks for your answer.
Currently, my business logic code is as in the attached file, so I cannot change it to match your sample code.
When migrating the IPP library to a higher version (from IPP 7 to IPP 9), I expected the performance to improve, but in my project the opposite happened.
I guess the configuration of my project (release mode) isn't correct, but I don't know where the problem is.
Thanks,
NhanPham.