We are using many of the IPP image functions for our calculations (ippiAdd etc.). The library is loaded dynamically.
We could not measure any difference between a single-core Pentium system and a dual-core system. Even when I restrict the number of threads with ippSetNumThreads( 1 ), the calculation time is the same as with ippSetNumThreads( 2 ).
Best regards, Michael
Hello Michael,
First of all, not all IPP functions use threading internally. Second, whether internal threading is used depends on many factors, such as data size, target processor and OS. So most probably you are just looking at functions which do not use threading.
Regards,
Vladimir
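To illustrate the data-size point, here is a minimal sketch in plain C++ (std::thread, not IPP) of an element-wise add that only splits the work across threads when the buffer is large enough. The function name `addImages` and the threshold `kMinPixelsPerThread` are made up for this example; how IPP actually decides internally is not described in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical threshold: below this many pixels per thread, the cost of
// starting threads is assumed to outweigh the gain. Not an IPP value.
constexpr std::size_t kMinPixelsPerThread = 64 * 1024;

// Computes dst[i] += src[i], splitting the range across up to maxThreads
// threads when the buffer is large enough.
// Returns the number of threads actually used.
unsigned addImages(const std::vector<float>& src, std::vector<float>& dst,
                   unsigned maxThreads)
{
    const std::size_t n = std::min(src.size(), dst.size());

    // Cap the thread count so every thread gets at least the threshold.
    unsigned threads = static_cast<unsigned>(std::min<std::size_t>(
        std::max(1u, maxThreads), n / kMinPixelsPerThread));
    if (threads == 0) threads = 1;

    auto addRange = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) dst[i] += src[i];
    };

    if (threads == 1) {          // small buffer: stay on the calling thread
        addRange(0, n);
        return 1;
    }

    // Large buffer: give each thread one contiguous chunk.
    const std::size_t chunk = (n + threads - 1) / threads;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        pool.emplace_back(addRange, begin, end);
    }
    for (auto& th : pool) th.join();
    return threads;
}
```

With this threshold a 640 by 480 float image is split four ways when four threads are requested, while a 100-element buffer stays on one thread regardless of the requested count.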
I ran into a similar issue.
The text file ThreadedFunctionsList.txt that comes with IPP 5.3.2 lists the following functions:
==========================================
ippiAdd_8u/16s_C1RSfs/C3RSfs/C4RSfs/AC4RSfs
ippiAdd_8u/16s_C1IRSfs/C3IRSfs/C4IRSfs/AC4IRSfs
ippiAdd_32f_C1R/C3R/C4R/AC4R
ippiAdd_32f_C1IR/C3IR/C4IR/AC4IR
==========================================
which confirms that the ippiAdd... functions are threaded.
I also ran a test using ippiAddWeighted_8u32f_C1IR, which is also a threaded function according to ThreadedFunctionsList.txt.
However, I got almost identical performance results with either 4 threads or 1 thread, and with either the dynamic or the static library.
The function was fed a 640 by 480 image and run 10000 times to get the timing results.
Here is the code:
======================================================
int nt = -1;
int iw = 640;
int ih = 480;
int n = 10000;

// 32f accumulator image and 8u source image (Image32f/Image8u are not IPP types)
Image32f img0;
img0.InitAlloc( iw, ih );
img0.Fill( 1.0 );
Ipp32f alpha = 0.01f;
Image8u imgSrc( iw, ih, 1 );
imgSrc.Fill( 1 );

// Run 1: request 4 threads
ippGetNumThreads( &nt );
IppStatus s = ippSetNumThreads( 4 );
Ipp64u start = ippGetCpuClocks();
clock_t cstart, cend;
cstart = clock();
for( int i = 0; i < n; i++ )
{
    s = ippiAddWeighted_8u32f_C1IR( imgSrc.GetDataPtr(), imgSrc.GetStride(),
                                    img0.GetDataPtr(), img0.GetStride(),
                                    imgSrc.GetSize(), alpha );
}
cend = clock();
Ipp64u end = ippGetCpuClocks();
cout << "number of threads = 4: " << end - start << ";" << cend - cstart << endl;

// Run 2: request 1 thread
ippGetNumThreads( &nt );
s = ippSetNumThreads( 1 );
start = ippGetCpuClocks();
cstart = clock();
for( int i = 0; i < n; i++ )
{
    s = ippiAddWeighted_8u32f_C1IR( imgSrc.GetDataPtr(), imgSrc.GetStride(),
                                    img0.GetDataPtr(), img0.GetStride(),
                                    imgSrc.GetSize(), alpha );
}
cend = clock();
end = ippGetCpuClocks();
cout << "number of threads = 1: " << end - start << ";" << cend - cstart << endl;
return 0;
========================================================
Did I miss something simple?
Mike
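One possible confounder in a benchmark like this is the timer. On POSIX systems clock() reports the CPU time consumed by the whole process, summed over all of its threads, so a run that is faster in wall-clock terms can still show a similar clock() figure; the Microsoft C runtime's clock() tracks elapsed time instead. This may or may not be what happened here. A minimal sketch of a wall-clock timer that sidesteps the question, with `timeWallMs` being a hypothetical helper name:

```cpp
#include <chrono>

// Runs work() `iterations` times and returns the elapsed wall-clock time
// in milliseconds, measured with a monotonic clock.
template <typename F>
double timeWallMs(F&& work, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The loop body from the post could be passed in as a lambda, once after ippSetNumThreads( 4 ) and once after ippSetNumThreads( 1 ), and the two returned values compared.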
If all the relevant data fits in the L2 cache, you may see some improvement.
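For a rough idea of how much data the earlier test touches, here is a small sketch that adds up the 8-bit source and the 32-bit float accumulator. It assumes tightly packed rows (stride equal to width times element size); the actual image classes may pad rows, and `workingSetBytes` and `fitsInCache` are names made up for this example.

```cpp
#include <cstddef>

// Bytes touched per call: one 8u source image plus one 32f accumulator,
// assuming no row padding.
constexpr std::size_t workingSetBytes(std::size_t width, std::size_t height)
{
    return width * height * sizeof(unsigned char)   // 8u source
         + width * height * sizeof(float);          // 32f accumulator
}

// True if the working set is no larger than the given cache size in bytes.
constexpr bool fitsInCache(std::size_t bytes, std::size_t cacheBytes)
{
    return bytes <= cacheBytes;
}
```

For the 640 by 480 image this comes to 1,536,000 bytes, about 1.46 MiB, so whether it fits depends on the L2 size of the particular processor.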
NCC
Threads     1     2     3     4     5     6     7     8
Time (ms)   15.3  8.9   6.2   5.1   4.6   4.2   3.6   3.3
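To read these timings as scaling figures, a short sketch that turns them into speedup (single-thread time divided by N-thread time) and parallel efficiency (speedup divided by N). The times are copied from the table above; the function names are only for illustration.

```cpp
#include <cstdio>

// Speedup relative to the single-thread time.
double speedup(double t1Ms, double tNMs)
{
    return t1Ms / tNMs;
}

// Parallel efficiency: speedup divided by the number of threads.
double efficiency(double t1Ms, double tNMs, int threads)
{
    return speedup(t1Ms, tNMs) / threads;
}

// Prints one line per thread count using the timings from the table.
void printScaling()
{
    const double timesMs[] = {15.3, 8.9, 6.2, 5.1, 4.6, 4.2, 3.6, 3.3};
    const int count = static_cast<int>(sizeof(timesMs) / sizeof(timesMs[0]));
    for (int i = 0; i < count; ++i) {
        const int threads = i + 1;
        std::printf("%d threads: speedup %.2f, efficiency %.2f\n",
                    threads,
                    speedup(timesMs[0], timesMs[i]),
                    efficiency(timesMs[0], timesMs[i], threads));
    }
}
```

By these numbers the speedup is 3.0x at 4 threads and roughly 4.6x at 8 threads, so the gain per added thread falls off as the count grows.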