I see your problems. The main issue here is the IPP API constraints. First, IPP uses its own malloc, which is just a wrapper over the runtime malloc, and ippMalloc takes an int parameter for the allocation size, so you (and we internally) can't allocate more than 2 GB. Second, there are the ippiCrossCorr API constraints: there is a plan to remove all internal memory allocations in future IPP versions and to add a parameter, pBuffer, to every function that requires an additional memory buffer. It will then be the customer's responsibility to provide the required memory buffers according to ippiCrossCorrGetBufferSize() (also planned).
PS, just for Sergey K: actually there is no full-size buffer for transposition. After all the FFTs by rows are done, only 4 to 16 columns are transposed at once (depending on the architecture and cache size), so the cyclic buffer for this purpose is rather small.
Regards,
Igor
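The int-sized allocation limit described above can be checked before any buffer is requested. The following is only a sketch: the helper name fitsInIntSizedAlloc is hypothetical and is not part of IPP. It assumes nothing about IPP beyond the statement that the allocation size is passed as a signed int.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical helper (not an IPP function): returns true when a
// width x height buffer with the given bytes per pixel can be requested
// through an allocator whose size argument is a signed int.
// Divisions are used instead of multiplications so the check cannot overflow.
bool fitsInIntSizedAlloc(std::int64_t width, std::int64_t height,
                         std::int64_t bytesPerPixel)
{
    if (width <= 0 || height <= 0 || bytesPerPixel <= 0)
        return false;
    if (width > INT_MAX / bytesPerPixel)
        return false;                       // a single row is already too large
    const std::int64_t rowBytes = width * bytesPerPixel;
    return height <= INT_MAX / rowBytes;    // total = rowBytes * height <= INT_MAX
}
```

For the sample image in this thread, fitsInIntSizedAlloc(16443, 3284, 4) is true (about 216 million bytes), so the user-visible buffers are not the ones that hit the limit.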
[SergeyK] It is possibly related to small values assigned for Heap ( Reserve / Commit ) and Stack ( Reserve / Commit ). Try to check the Linker Settings.
- IppStatus ippsta;
- IppiSize ippszSam = {16443, 3284};
[SergeyK] This is ~205MB, which is actually not a big size.
- int iStepSam = ippszSam.width;
- IppiSize ippszTmpl = {4899, 3280};
- int iStepTmpl = ippszTmpl.width;
- int iXncc = ippszSam.width - ippszTmpl.width + 1; // result width
- int iYncc = ippszSam.height - ippszTmpl.height + 1; // result height
- int iStepncc = iXncc * sizeof(float); // result step size
- IppiSize ippszncc = {iXncc, iYncc};
- std::vector<float> vfNCC(iXncc * iYncc); // result container
[SergeyK] I would recommend allocating memory for the array from the heap instead; later it could be
copied to the STL 'vector' for processing.
- float *pfNCC = &(vfNCC[0]);
- // Calling the correlation routine
- ippsta = ippiCrossCorrValid_NormLevel_8u32f_C1R(pucSam, iStepSam, ippszSam,
- pucTemplate, iStepTmpl, ippszTmpl,
- pfNCC, iStepncc);
[SergeyK] I did some testing some time ago on a 32-bit Windows platform and the limit is ~2^30 bytes, or 1.09GB,
and if your image is based on the '8u' type this is an image of size 34207x34207.
However, in the case of a 64-bit Windows platform the numbers could be higher.
[SergeyK] It is not clear whether 'ippiCrossCorrValid...' uses some additional memory inside.
[SergeyK] As I mentioned already, it is ~2^30 for a 32-bit platform.
[SergeyK] In the case of a 32-bit Windows platform and the '8u' type:
34207x34207 if memory is allocated with 'ippiMalloc...' IPP functions
34208x34208 if memory is allocated with the 'malloc' CRT function
I am using Intel IPP 6.1, and I am running on Windows 7 x64 (16GB of memory). I am trying to use the "ippiCrossCorrValid_NormLevel_8u32f_C1R" function; however, it returns the following error: ippStsMemAllocErr (Not enough memory allocated for the operation).
[SergeyK] It is possibly related to small values assigned for Heap ( Reserve / Commit ) and Stack ( Reserve / Commit ). Try to check the Linker Settings.
2. Is the maximum allowed size actually ~2^29 bytes? If so, why is the call to the IPP functions returning an error? If not, then which is the maximum allowed size when dealing with images?
[SergeyK] As I mentioned already, it is ~2^30 for a 32-bit platform.
[SergeyK] In the case of a 32-bit Windows platform and the '8u' type:
34207x34207 if memory is allocated with 'ippiMalloc...' IPP functions
34208x34208 if memory is allocated with the 'malloc' CRT function
1. For the above function, and the above image sizes, what is the number of bytes allocated in memory? Is my assumption correct?
[SergeyK] It is not clear whether 'ippiCrossCorrValid...' uses some additional memory inside.
- However, the same "ippStsMemAllocErr" happens.
[SergeyK] It has to be enough. Please take a look at a Note below.
[SergeyK] Thank you for confirming my results!
[SergeyK] I'll try to spend some time today with your initial test case.
I have two more questions:
Are you making a call to 'ippiCrossCorrValid...' from a DLL or from an EXE?
Did you set new Heap ( Reserve / Commit ) and Stack ( Reserve / Commit ) values for a DLL or for EXE?
Note:
Unfortunately, similar problems are happening almost every day with software developers using different
software products. Please take a look at two threads ( my comments are in Posts #2 ):
http://software.intel.com/en-us/forums/showthread.php?t=104191&o=a&s=lr
http://software.intel.com/en-us/forums/showthread.php?t=104137&o=a&s=lr
Also, for one of my software subsystems the following Heap ( Reserve / Commit ) and Stack ( Reserve / Commit )
values are used. They are defined in a cpp source file for the EXE module and they work:
[SergeyK] Yes, I would try it. Some time ago I had some issues and setting these values for a DLL
didn't help.
[SergeyK] What about 512MB?
[SergeyK] If an application has already allocated the maximum allowed amount, then a new request for more Stack
memory will fail. That is why some users see an error message like 'Stack Overflow'.
Please review the part of your code where you allocate memory for the destination buffer 'pDst'.
// Purpose: ippiCrossCorr_Norm() function allows you to compute the
// cross-correlation of an image and a template (another image).
// The cross-correlation values are image similarity measures: the
// higher cross-correlation at a particular pixel, the more
// similarity between the template and the image in the neighborhood
// of the pixel. If IppiSize's of image and template are Wa * Ha and
// Wb * Hb correspondingly, then the IppiSize of the resulting
// matrice with normalized cross-correlation coefficients will be
//
// a) in case of 'Full' suffix:
// ( Wa + Wb - 1 )*( Ha + Hb - 1 ).
// b) in case of 'Same' suffix:
// ( Wa )*( Ha ).
// c) in case of 'Valid' suffix:
// ( Wa - Wb + 1 )*( Ha - Hb + 1 ).
...
// suffix 'R' (ROI) means only scanline alignment (srcStep); in the
// 'Same' and 'Full' cases there are no requirements for data outside
// the ROI - it is assumed that template and src are zero padded.
//
...
// Arguments:
// pSrc - pointer to the source image ROI;
// srcStep - step in bytes through the source image buffer;
// srcRoiSize- size of the source ROI in pixels;
// pTpl - pointer to the template ( feature ) image ROI;
// tplStep - step in bytes through the template image buffer;
// tplRoiSize- size of the template ROI in pixels;
// pDst - pointer to the destination buffer;
// dstStep - step in bytes through the destination image buffer;
...
// Return:
// ippStsNoErr - Ok
// ippStsNullPtrErr - at least one of the pointers to pSrc, pDst or pTpl is NULL;
// ippStsSizeErr - at least one of the sizes of srcRoiSize or tplRoiSize
// is less than or equal to zero, or at least one of the sizes
// of srcRoiSize is smaller than the corresponding size
// of the tplRoiSize;
// ippStsStepErr - at least one of srcStep, tplStep or dstStep is less than or equal to zero;
//
// ippStsMemAllocErr - an error occurred during memory allocation for internal buffers.
I can't confirm it in practice, but it looks like your problem is related to a wrong size of some internal buffer(s).
Best regards,
Sergey
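The destination sizes listed in the header comment for the 'Full', 'Same' and 'Valid' suffixes can be computed up front, which makes it easy to verify the size of the 'pDst' buffer. This is only a sketch: Size2D, CorrShape, crossCorrDstSize and dstStepBytes32f are hypothetical names introduced here for illustration and are not IPP types or functions.

```cpp
#include <cstddef>

// Hypothetical stand-in for a width/height pair (not the IPP IppiSize type).
struct Size2D {
    int width;
    int height;
};

enum class CorrShape { Full, Same, Valid };

// Destination size for a source of Wa x Ha and a template of Wb x Hb,
// following the three formulas quoted in the header comment.
Size2D crossCorrDstSize(Size2D src, Size2D tpl, CorrShape shape)
{
    switch (shape) {
    case CorrShape::Full:
        return { src.width + tpl.width - 1, src.height + tpl.height - 1 };
    case CorrShape::Same:
        return { src.width, src.height };
    case CorrShape::Valid:
    default:
        return { src.width - tpl.width + 1, src.height - tpl.height + 1 };
    }
}

// Step in bytes for a tightly packed 32-bit float destination row.
std::size_t dstStepBytes32f(Size2D dst)
{
    return static_cast<std::size_t>(dst.width) * sizeof(float);
}
```

With the sizes from the original question (source 16443x3284, template 4899x3280), the 'Valid' result is 11545x5, so the destination buffer itself is small.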
[SergeyK] You could easily verify it under the debugger and monitor 'PF Usage' in the Task Manager
on the 'Performance' property page.
[SergeyK] I hope that IPP developers will respond.
Best regards,
Sergey
I'd like to provide some technical details.
Here is a summary, because lots of questions were asked by LuisS and there are some responses of mine:
[LuisS] ...However, in this case, even if the Sample image is converted to float, it will require less than 2^28 bytes (16443 x 3284 x 4 bytes)...
Significantly less than 2^30 bytes.
[LuisS] ..Therefore, I don't have it clear why the IPP function is returning the "ippStsMemAllocErr" status...
It is because the 'ippiCrossCorr_...' functions allocate a significant amount of memory internally.
On 32-bit Windows platforms ( without AWE ) an application cannot allocate more than 2GB of memory.
[LuisS] ...1. For the above function, and the above image sizes, what is the number of bytes allocated in memory? Is my assumption correct?..
Could Intel Software Engineers answer the question?
[LuisS] ...2. Is the maximum allowed size actually ~2^29 bytes?..
It depends on an IPP function and it is already confirmed by two Software Developers that this is about 2^30 bytes for a 32-bit Windows platform.
[LuisS] ...If so, why is the call to the IPP functions returning an error?..
Because in your case the function tries to allocate lots of memory and it exceeds the 2GB limit for a 32-bit platform.
[LuisS] ...If not, then which is the maximum allowed size when dealing with images?..
It depends on the platform, that is, 32-bit or 64-bit. I wonder if Intel Software Engineers could follow up?
[SergeyK] ...It is not clear if 'ippiCrossCorrValid...' uses some additional memory inside...
Confirmed after a series of tests and it uses a lot.
[SergeyK] ...It is possibly related to small values assigned for Heap ( Reserve / Commit ) and Stack ( Reserve / Commit )...
No. I don't confirm this. Please do your own tests with a modified Test-Case in the next Post.
[LuisS] ...Is there any way for me to know, before making the call to "ippiCrossCorrValid", whether there will be a memory allocation error?..
I think you could make a series of tests and as a result you will have some numbers.
[LuisS] ...In general, is it any way to know why this allocation error happens?..
Already answered.
[LuisS] ...Unfortunately, my problem is not related to that. I have checked several times the allocation of those pointers...
[LuisS] ...Also, if I use a smaller image, the "ippiCrossCorr_Norm" works without any problem...
[LuisS] ...The problem, I think, is related to what the "ippiCrossCorr_Norm" does inside once it is called...
All three statements are confirmed after a series of tests.
[LuisS] ...I kindly invite you to try (whenever you have time) to use the "ippiCrossCorr_Norm" for
an image size similar to the one of my example, to confirm that the function will not work...
Please take a look at a modified Test-Case in the next Post.
[cpp]
// Case 1
// #pragma comment( linker, "/HEAP:1069547520,1069547520" )
// #pragma comment( linker, "/STACK:4194304,4194304" )

// Case 2
// #pragma comment( linker, "/HEAP:134217728,134217728" )
// #pragma comment( linker, "/STACK:134217728,134217728" )

// Case 3
// #pragma comment( linker, "/HEAP:268435456,268435456" )
// #pragma comment( linker, "/STACK:268435456,268435456" )

// Case 4
// #pragma comment( linker, "/HEAP:536870912,536870912" )
// #pragma comment( linker, "/STACK:536870912,536870912" )

// Version 3
{
	// Downsample Factor
	#define DS_FACTOR 1
//	#define DS_FACTOR 2
//	#define DS_FACTOR 4
//	#define DS_FACTOR 8
//	#define DS_FACTOR 16
//	#define DS_FACTOR 32
//	#define DS_FACTOR 64

	// Test 1 - It works when the DS_FACTOR is greater than 1
	//          It doesn't work when the DS_FACTOR is equal to 1
	#define IMAGE_W ( RTint )( 16433 / DS_FACTOR )
	#define IMAGE_H ( RTint )( 3284 / DS_FACTOR )
	#define TEMPL_W ( RTint )( 4899 / DS_FACTOR )
	#define TEMPL_H ( RTint )( 3280 / DS_FACTOR )

	// Test 2 - It works
//	#define IMAGE_W ( RTint )( 16384 / DS_FACTOR )
//	#define IMAGE_H ( RTint )( 4096 / DS_FACTOR )
//	#define TEMPL_W ( RTint )( 8192 / DS_FACTOR )
//	#define TEMPL_H ( RTint )( 2048 / DS_FACTOR )

	static Ipp8u pucSam[ IMAGE_W * IMAGE_H ] = { 0x0 };
	static Ipp8u pucTemplate[ TEMPL_W * TEMPL_H ] = { 0x0 };

	RTint iStepSam = IMAGE_W * sizeof( Ipp8u );
	RTint iStepTmpl = TEMPL_W * sizeof( Ipp8u );

	IppiSize ippRoiSam = { IMAGE_W, IMAGE_H };
	IppiSize ippRoiTmpl = { TEMPL_W, TEMPL_H };

	RTint iXncc = ippRoiSam.width - ippRoiTmpl.width + 1;		// Result width
	RTint iYncc = ippRoiSam.height - ippRoiTmpl.height + 1;		// Result height
	RTint iStepncc = iXncc * sizeof( RTfloat );					// Result step size

	// It works
//	std::vector< RTfloat > vfNCC( iXncc * iYncc );				// Result container
//	RTfloat *pfNCC = &( vfNCC[0] );

	// It works
	static Ipp32f pfNCC[ ( IMAGE_W - TEMPL_W + 1 ) * ( IMAGE_H - TEMPL_H + 1 ) ] = { 0.0f };

	// It doesn't work ( Win32 exception 0xC00000FD - Stack Overflow )
//	Ipp32f pfNCC[ ( IMAGE_W - TEMPL_W + 1 ) * ( IMAGE_H - TEMPL_H + 1 ) ] = { 0.0f };

	CrtPrintf( RTU("Image Size: %5ld x %5ld\nTemplate Size: %5ld x %5ld\nDownsample Factor: %ld\n"),
			   ( RTint )IMAGE_W, ( RTint )IMAGE_H,
			   ( RTint )TEMPL_W, ( RTint )TEMPL_H,
			   ( RTint )DS_FACTOR );

	st = ::ippiCrossCorrValid_NormLevel_8u32f_C1R( &pucSam[0], iStepSam, ippRoiSam,
												   &pucTemplate[0], iStepTmpl, ippRoiTmpl,
												   &pfNCC[0], iStepncc );

	CrtPrintfA( "%s\n", ::ippGetStatusString( st ) );
}
//*/
}
[/cpp]
Outputs for different Downsample Factors are as follows:
...
Image Size : 16433 x 3284
Template Size : 4899 x 3280
Downsample Factor: 1
Not enough memory allocated for the operation
...
...
Image Size : 8216 x 1642
Template Size : 2449 x 1640
Downsample Factor: 2
No error, it's OK
...
...
Image Size : 4108 x 821
Template Size : 1224 x 820
Downsample Factor: 4
No error, it's OK
...
a source image 16384x4096 and a template image 8192x2048:

source image 4096x1024 and a template image 2048x512
source image 8192x2048 and a template image 4096x1024
source image 16384x4096 and a template image 8192x2048

I wonder if a downsampling workaround could work for you? That is, downsample your big source image, or both images ( source and template ),
by some factor.
Best regards,
Sergey
For such huge images/templates, ippiCrossCorr is based on a 2D 32f FFT and uses several internal buffers proportional to the template size (extended to the nearest power of two). If you use the threaded version of the IPP library, this number is multiplied by the number of available cores. The best approach is the one provided above: perform downsampling or use pyramids, and run CrossCorr on the reduced image/template. With the approximate coordinates of the maximum coefficient you'll be able to reduce the search area in the full-resolution image (CrossCorr is very expensive, up to ~300-600 CPU clocks/pixel, while downsampling is ~100x faster).
Regards,
Igor
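The coarse-to-fine idea above can be sketched as two small steps: shrink the 8-bit image by a factor of 2, then map the best coarse position back to a limited search window at full resolution. This is a plain C++ illustration under stated assumptions, not the IPP implementation: Image8u, Window, downsample2x and refineWindow are hypothetical names, the 2x2 box average is just one possible downsampling filter, and the margin is a value you would have to choose for your data.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tightly packed 8-bit image (row stride == width).
struct Image8u {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Halve both dimensions by averaging each 2x2 block (rounded to nearest).
// An odd trailing row or column is dropped.
Image8u downsample2x(const Image8u& src)
{
    Image8u dst{ src.width / 2, src.height / 2, {} };
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height);
    for (int y = 0; y < dst.height; ++y) {
        const std::uint8_t* row0 = &src.pixels[static_cast<std::size_t>(2 * y) * src.width];
        const std::uint8_t* row1 = row0 + src.width;
        for (int x = 0; x < dst.width; ++x) {
            const int sum = row0[2 * x] + row0[2 * x + 1] + row1[2 * x] + row1[2 * x + 1];
            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}

// Inclusive range of candidate top-left positions at full resolution.
struct Window {
    int x0, y0, x1, y1;
};

// Map a coarse peak (found at 1/factor resolution) back to full resolution,
// widened by 'margin' pixels and clamped to the valid positions [0..maxX] x [0..maxY].
Window refineWindow(int coarseX, int coarseY, int factor, int margin, int maxX, int maxY)
{
    Window w;
    w.x0 = std::max(0, coarseX * factor - margin);
    w.y0 = std::max(0, coarseY * factor - margin);
    w.x1 = std::min(maxX, coarseX * factor + factor - 1 + margin);
    w.y1 = std::min(maxY, coarseY * factor + factor - 1 + margin);
    return w;
}
```

The full-resolution correlation would then be run only on the source region covered by the returned window plus the template size, which keeps the FFT size (and the internal buffers) much smaller.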
proportional to template size (extended to the nearest power of two)...
The size of the source image that Luis uses cannot be considered huge.
Regarding the 2D FFT: that is very interesting technical information, but I really don't understand why almost
1.5GB of memory is used to calculate it.
Best regards,
Sergey
Take a look at the formula and try to analyse how many buffers you need for the separate calculation of the numerator (via FFT) and the denominator. Take into account that the FFT itself requires internal twiddle tables and a buffer for the FFT itself, plus a buffer for the optimal transposition of columns to rows for the 2D operation. For template sizes that are small in comparison with the image size (~8x smaller), the so-called "frame" algorithm is used and the FFT size is proportional to the template size; in your case the FFT size is proportional to the src size:
16443*3284 -> 32768*4096*4 = ~500 MB for 1 buffer
The algorithm requires 3 buffers, so 1.5 GB
Regards,
Igor
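The arithmetic in this post can be reproduced as a rough pre-check before calling the function. This is only an estimate that follows the numbers given above (each dimension rounded up to a power of two, 4 bytes per 32f sample, 3 buffers). The function names are hypothetical, the buffer count of 3 is taken from this post rather than from any IPP documentation, and the real internal allocations (twiddle tables, threading) may differ.

```cpp
#include <cstdint>

// Smallest power of two that is >= n (for n >= 1).
std::uint64_t nextPow2(std::uint64_t n)
{
    std::uint64_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

// Estimated size in bytes of one 32f FFT work buffer for a width x height image,
// with both dimensions extended to the next power of two.
std::uint64_t estimateFftBufferBytes(std::uint64_t width, std::uint64_t height)
{
    const std::uint64_t bytesPerSample = 4;   // 32-bit float
    return nextPow2(width) * nextPow2(height) * bytesPerSample;
}

// Estimated total for 'bufferCount' such buffers (3 in the post above; an assumption).
std::uint64_t estimateTotalBytes(std::uint64_t width, std::uint64_t height,
                                 std::uint64_t bufferCount = 3)
{
    return bufferCount * estimateFftBufferBytes(width, height);
}
```

For the 16443x3284 source this gives 536870912 bytes per buffer and 1610612736 bytes in total, which is the ~1.5 GB quoted above. For the 8216x1642 image (downsample factor 2) the same estimate drops to 402653184 bytes, which is consistent with the test case succeeding at that size.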
Why is a diagonal-based ( in-place ) algorithm for the matrix transpose not used?
Quoting igorastakhov
Take a look at the formula and try to analyse how many buffers you need for the separate calculation of the numerator (via FFT) and the denominator. Take into account that the FFT itself requires internal twiddle tables and a buffer for the FFT itself, plus a buffer for the optimal transposition of columns to rows for the 2D operation. For template sizes that are small in comparison with the image size (~8x smaller), the so-called "frame" algorithm is used and the FFT size is proportional to the template size; in your case the FFT size is proportional to the src size:
16443*3284 -> 32768*4096*4 = ~500 MB for 1 buffer
The algorithm requires 3 buffers, so 1.5 GB
Regards,
Igor
The memory problem with the 'ippiCrossCorr...' IPP functions is almost 10 years old, because it is easily
reproducible with IPP v3.x.
Best regards,
Sergey
[SergeyK] In that case the situation gets even more interesting because there is a lot of memory.
What is the size of the Virtual Memory file?
I cannot afford to further subsample it because I will lose some details.
[SergeyK] In that case a solution based on partitioning of the source image ( suggested by Igor ) looks good.
Did you consider other solutions, such as SAD or Neural Network based ones?
SAD - Sum of Absolute Differences
Best regards,
Sergey
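For reference, a brute-force SAD search needs no FFT buffers at all; its only memory cost is the two images. The sketch below is plain C++ and does not use IPP. Match and findBestSad are hypothetical names, both images are assumed to be tightly packed 8-bit buffers, and note that SAD is not normalized, so it is not a drop-in replacement for the NormLevel correlation coefficient.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Best template position (top-left corner) and its SAD score.
struct Match {
    int x;
    int y;
    std::uint64_t sad;
};

// Exhaustive Sum of Absolute Differences search over all 'Valid' positions.
// Returns {-1, -1, max} when the template does not fit inside the image.
Match findBestSad(const std::vector<std::uint8_t>& img, int imgW, int imgH,
                  const std::vector<std::uint8_t>& tpl, int tplW, int tplH)
{
    Match best{ -1, -1, std::numeric_limits<std::uint64_t>::max() };
    if (tplW <= 0 || tplH <= 0 || tplW > imgW || tplH > imgH)
        return best;

    for (int y = 0; y + tplH <= imgH; ++y) {
        for (int x = 0; x + tplW <= imgW; ++x) {
            std::uint64_t sad = 0;
            // Stop accumulating rows as soon as this position cannot beat the best one.
            for (int ty = 0; ty < tplH && sad < best.sad; ++ty) {
                const std::size_t imgRow = static_cast<std::size_t>(y + ty) * imgW + x;
                const std::size_t tplRow = static_cast<std::size_t>(ty) * tplW;
                for (int tx = 0; tx < tplW; ++tx) {
                    sad += static_cast<std::uint64_t>(
                        std::abs(static_cast<int>(img[imgRow + tx]) -
                                 static_cast<int>(tpl[tplRow + tx])));
                }
            }
            if (sad < best.sad)
                best = Match{ x, y, sad };
        }
    }
    return best;
}
```

The cost is proportional to the number of positions times the template area, so for templates as large as the one in this thread it would normally be combined with a reduced search window.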
