Crash in ippsDiv_32f_A11 or _A21, IPP v6

gol · 03-21-2012

I'm seeing a crashing bug in IPP v6.1, but strangely I can find nothing about it, no report of any kind. I would like confirmation that this was a bug in v6.1, since it doesn't crash with IPP v7.
There's nothing about it in the list of fixes for either 6.1 or 7.
http://software.intel.com/en-us/articles/intel-ipp-70-library-bug-fixes/

Basically, it seems to crash with any length that's not a multiple of 8. Perhaps there was a bug in the processing of the remainder?
I have no problem using ippsDiv_32f_A24 (which works), but a confirmation would make sure the bug isn't somewhere else on my side. Thanks.

Ying_H_Intel · 03-21-2012

Hi Gol,

This seems to be an unknown problem. See the IPP 6.1 bug fix list: http://software.intel.com/en-us/articles/intel-ipp-library-61-fixes-list/

Could you please provide a small test case for the problem, along with your IPP version, the libraries you link against, and your processor type?

Thanks
Ying H.

gol · 03-22-2012

Intel Core i7, using the 32-bit IPP 6.1.
A call to ippsDiv_32f_A11 or ippsDiv_32f_A21 should crash for a length under 8 (whatever the buffers are filled with, even all 1's). I strongly suspect it's a bug in the processing of the remainder left over after the blocks of 8, which reads/writes outside the buffers.

But again, there's no crash with v7 (which I still avoid because of the drastic changes), so it's odd that it appears nowhere in the list of fixes. Maybe because the code changed for other reasons?
Anyway, I can easily live without it.
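To make the suspected failure mode concrete, here is a minimal sketch of how a vectorized routine typically splits its work into a main loop over blocks of 8 and a scalar tail loop for the remaining len % 8 elements. This is a hypothetical illustration written for this thread, not IPP's actual source; `div_blocked` is an invented name, and the inner loop over `k` merely stands in for a SIMD body.

```c
/* Hypothetical illustration only: NOT Intel's IPP implementation.
 * Shows the "main loop in blocks of 8 + scalar tail" structure. */
void div_blocked(const float *a, const float *b, float *y, int len)
{
    int i = 0;
    int main_len = len & ~7;   /* largest multiple of 8 that is <= len */

    /* Main loop: 8 elements per iteration (stands in for a SIMD body). */
    for (; i < main_len; i += 8)
        for (int k = 0; k < 8; k++)
            y[i + k] = a[i + k] / b[i + k];

    /* Tail: handle the last len % 8 elements one at a time.
     * A buggy tail that processed a full block of 8 here instead would
     * read and write past the end of the buffers whenever len % 8 != 0. */
    for (; i < len; i++)
        y[i] = a[i] / b[i];
}
```

With a correct tail, every length from 1 upward touches only `y[0] .. y[len-1]`; a bug of the kind gol describes would show up exactly for lengths that are not multiples of 8.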

Ying_H_Intel · 03-25-2012

Hi Gol,

I did a quick test and can't reproduce the problem with the code below.

But our developer confirmed that something likely changed in the div function between V6 and V7 that wasn't captured in the list of bug fixes. That can easily happen if we didn't detect the bug in internal testing (so no CQ was filed) but modified div in V7 for optimization reasons, which eventually fixed the bug. There are no more V6 updates planned, so we can't do anything here. The only recommendation is to migrate to V7.

Best Regards,
Ying

#include <stdio.h>
#include "ipp.h"

int main(int argc, char* argv[])
{
    ippEnableCpu(ippCpuAVX);   /* allow dispatch to AVX-optimized code */
    ippInit();                 /* select the optimized code for the current CPU */

    /* Report which IPP VM library was actually linked */
    const IppLibraryVersion* lib = ippvmGetLibVersion();
    printf("%s %s %d.%d.%d.%d\n", lib->Name, lib->Version, lib->major, lib->minor, lib->majorBuild, lib->build);

    /* Only the first 8 elements are set explicitly; the remaining 120 are zero-initialized */
    const Ipp32f x1[128] = {599.088, 735.034, 572.448, 151.640, 1.0, 609.005, 361.403, 225.182};
    const Ipp32f x2[128] = {385.297, 609.005, 361.403, 225.182, 735.034, 572.448, 151.640, 1.0};
    Ipp32f y[128];
    IppStatus st = ippsDiv_32f_A11(x1, x2, y, 128);
    printf(" ippsDiv_32f_A21:\n");   /* label reads A21, but the call above is to ippsDiv_32f_A11 */
    printf(" x1 = %.3f %.3f %.3f %.3f \n", x1[0], x1[1], x1[2], x1[3]);
    printf(" x2 = %.3f %.3f %.3f %.3f \n", x2[0], x2[1], x2[2], x2[3]);
    printf(" y = %.3f %.3f %.3f %.3f \n", y[0], y[1], y[2], y[3]);
    return st;
}

ippvmp8t.lib+ 6.1 build 137.56 6.1.137.781
ippsDiv_32f_A21:
x1 = 599.088 735.034 572.448 151.640
x2 = 385.297 609.005 361.403 225.182
y = 1.555 1.207 1.584 0.673
Press any key to continue . . .

gol · 03-26-2012

Thanks.
Well, as I wrote, I don't really need that function; it was just to report it and to be sure.
As for updating to v7, I'm still among those reluctant to do so, since v7 reportedly requires SSE2 for its lowest optimization level, while our app is mainstream and should also work well on CPUs with only SSE1.

One thing that still bugs me (I once started a thread about it), however, is the nasty first-time CPU hit, most likely caused by threads being created or something similar. That is really bad in the audio world, especially for live use: imagine a synthesizer glitching badly the first time a feature is used.
I thought I could fix this by "running in" IPP with a call to some costly function once at the beginning of each thread (thinking that maybe it had to be done for every thread in which IPP is used). That reduces the problem somewhat, but not entirely. I still wish there were a "pre-allocate everything" function that would avoid such glitches and give smooth CPU usage, but I don't remember seeing anything like that in v7.

Right now, at the beginning of each of my threads, I call ippsConv_32f with buffers of around 1k values, since I know this function ends up threaded and it was the most obvious cause of the glitch. However, it still glitches sometimes. I would expect the hit once per process lifetime, but it seems to happen just once after a reboot, as if something were initialized system-wide rather than process-wide..?

SergeyKostrov · 03-27-2012

Hi Ying,

There are a couple of issues in your Test-Case.

Quoting Ying H (Intel)

...
ippEnableCpu(ippCpuAVX);

[SergeyK] Why do you think that a CPU with AVX has to be used?

...
IppStatus st = ippsDiv_32f_A11( x1, x2, y, 128 );

[SergeyK] 128 is a multiple of 8, and it is significantly greater than 8. Did you try tests with
vector lengths from 1 to 16, for example?

...

Here is a modified Test-Case:
...
IppStatus st = ippStsNoErr;

const IppLibraryVersion *pIppLibVer = ::ippvmGetLibVersion();

printf( "IPP VM Library Information:\n" );
printf( "Major      : %d\nMinor      : %d\nMajor Build: %d\nBuild      : %d\n",
        pIppLibVer->major, pIppLibVer->minor, pIppLibVer->majorBuild, pIppLibVer->build );
printf( "Core DLL   : %s\n", pIppLibVer->Name );
printf( "Version    : %s\n", pIppLibVer->Version );
printf( "Build Date : %s\n", pIppLibVer->BuildDate );

const int iSize = 16;

Ipp32f x1[iSize] = { 0.0f };
Ipp32f x2[iSize] = { 0.0f };
Ipp32f y[iSize] = { 0.0f };

int i, t;

// Every element computes 4 / 2, so every result should be 2
for( i = 0; i < iSize; i++ )
{
    x1[i] = 4.0f;
    x2[i] = 2.0f;
}

// Test every length from 16 down to 1, including all lengths that are not multiples of 8
for( t = iSize; t > 0; t-- )
{
    st = ::ippsDiv_32f_A11( x1, x2, y, t );
    // st = ::ippsDiv_32f_A21( x1, x2, y, t );
    // st = ::ippsDiv_32f_A24( x1, x2, y, t );
    if( st != ippStsNoErr )
    {
        printf( "Processing Error: [ ippsDiv_32f_A11 ] Failed - Array Size: %d\n", t );
        break;
    }

    for( i = 0; i < t; i++ )
    {
        printf( "%.2f ", y[i] );      // for ippsDiv_32f_A11
        // printf( "%.4f ", y[i] );   // for ippsDiv_32f_A21
        // printf( "%.6f ", y[i] );   // for ippsDiv_32f_A24
    }

    printf( "\n" );
}
...

Output for 'ippsDiv_32f_A11' should look like this:

2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00 2.00
2.00 2.00 2.00 2.00
2.00 2.00 2.00
2.00 2.00
2.00
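The length sweep above checks the computed values, but an out-of-bounds write in a remainder loop can corrupt memory silently without an immediate crash. A minimal sketch of a stricter harness, under stated assumptions, fills the space after the destination with a sentinel and verifies it is untouched for every length from 1 to 16. `div_fn`, `ref_div` and `check_lengths` are hypothetical names; `ref_div` is a plain C stand-in for the IPP function. To test `ippsDiv_32f_A11` itself, one would wrap it in a function of the `div_fn` shape (check the exact prototype in the IPP headers).

```c
#include <stdio.h>

#define MAX_LEN  16
#define GUARD    16            /* extra elements after the valid range */
#define SENTINEL (-12345.0f)

/* Hypothetical function type; returns 0 on success. An IPP call would
 * need a small wrapper to fit this shape. */
typedef int (*div_fn)(const float *a, const float *b, float *y, int len);

/* Plain C reference used here in place of the IPP function. */
static int ref_div(const float *a, const float *b, float *y, int len)
{
    for (int i = 0; i < len; i++)
        y[i] = a[i] / b[i];
    return 0;
}

/* Calls fn for every length 1..MAX_LEN, checks that y[0..len-1] == 4/2
 * and that every element after y[len-1] still holds the sentinel.
 * Returns the number of failing lengths.
 * Note: only out-of-bounds WRITES are caught; over-reads of a and b are not. */
static int check_lengths(div_fn fn)
{
    float a[MAX_LEN + GUARD], b[MAX_LEN + GUARD], y[MAX_LEN + GUARD];
    int failures = 0;

    for (int i = 0; i < MAX_LEN + GUARD; i++) { a[i] = 4.0f; b[i] = 2.0f; }

    for (int len = 1; len <= MAX_LEN; len++) {
        for (int i = 0; i < MAX_LEN + GUARD; i++) y[i] = SENTINEL;

        if (fn(a, b, y, len) != 0) {
            printf("len %2d: error status\n", len);
            failures++;
            continue;
        }

        int ok = 1;
        for (int i = 0; i < len; i++)
            if (y[i] != 2.0f) ok = 0;
        for (int i = len; i < MAX_LEN + GUARD; i++)
            if (y[i] != SENTINEL) ok = 0;

        if (!ok) {
            printf("len %2d: wrong result or write past the end\n", len);
            failures++;
        }
    }
    return failures;
}
```

With the reference implementation, `check_lengths(ref_div)` should report no failures; a wrapped IPP 6.1 routine with a faulty remainder loop would be expected to fail at the lengths that are not multiples of 8.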

Best regards,
Sergey