Problems with ippsMul_32fc on Mac OS

meldaproduction · ‎10-11-2010

Hi, I'm using complex multiplication for convolution. On Windows it works just fine, but on Mac it does not. The weird thing is that when I tested it on some random data it seemed to work well, but when I use it inside the algorithm itself, preceded by an FFT and followed by an IFFT, it generates nonsense. When I replace it with my own implementation of complex multiplication (either plain C++ or asm using SSE), it works fine.
Any ideas? Are there any limitations for the input buffers?

meldaproduction · ‎10-11-2010

Ok, I processed a bunch of logs and this is the result:

The real part is always correct, but the imaginary part sometimes is not; it often copies the real part with inverted sign.

Nothing more. I tried placing my own implementation before or after the ippsMul_32fc call, redirecting the first one to a dummy buffer just to check for correctness. My own implementation always works correctly, but ippsMul_32fc does not. It seems to happen only on Mac OS X.

My specs: OS X 10.6.3, using GCC, because the Intel compiler has generated some weird things so far.
What should I do?

Ying_H_Intel · ‎10-11-2010

Hello,

Could you attach the piece of code around the ippsMul_32fc call (including the input and output)? And which IPP install package are you using?

Regards,
Ying

meldaproduction · ‎10-12-2010

Ok, here it is, first the results:
index, difference (should be 0), my correct implementation, ippsMul_32fc

0 = diff 0.0000, 2.6480, 2.6480
1 = diff -0.4121, -1.9209, -1.5089
2 = diff 0.0000, -0.4410, -0.4410
3 = diff 4.6421, 5.0832, 0.4410
4 = diff 0.0000, -2.0317, -2.0317
5 = diff -3.5517, -2.1986, 1.3531
6 = diff 0.0000, 6.0182, 6.0182
7 = diff 0.0000, 0.4256, 0.4256
8 = diff 0.0000, -2.9735, -2.9735
9 = diff 4.3780, 1.5776, -2.8005
10 = diff 0.0000, 0.4701, 0.4701
11 = diff -6.8187, -6.3486, 0.4701
12 = diff 0.0000, 1.8333, 1.8333
13 = diff 1.5770, 3.9133, 2.3363
14 = diff 0.0000, -7.5734, -7.5734
15 = diff 0.0000, -1.6515, -1.6515
16 = diff 0.0000, 4.1548, 4.1548

Source values:
index, difference (has no meaning here), value 1, value 2 (note that re and im are on consecutive rows: even index = re, odd index = im).

0 = diff 2.5236, 3.2307, 0.7071
1 = diff 1.2212, 0.5141, -0.7071
2 = diff -5.0832, -5.0832, -0.0000
3 = diff 0.5589, -0.4410, -1.0000
4 = diff 3.6984, 2.9912, -0.7071
5 = diff 0.8251, 0.1180, -0.7071
6 = diff -5.0183, -6.0183, -1.0000
7 = diff -0.4256, -0.4256, 0.0000
8 = diff 3.9252, 3.2181, -0.7071
9 = diff 0.2800, 0.9871, 0.7071
10 = diff -6.3487, -6.3487, -0.0000
11 = diff -1.4701, -0.4701, 1.0000
12 = diff 3.3564, 4.0635, 0.7071
13 = diff 0.7637, 1.4708, 0.7071
14 = diff -8.5735, -7.5735, 1.0000
15 = diff -1.6515, -1.6515, 0.0000
16 = diff 2.4957, 3.2028, 0.7071

2 conclusions:
1) Re is always ok, Im is not.
2) It seems it is always wrong. I was confused because it sometimes seemed to work, but that was only because the source was 0. It seems it somehow mixes up the Re and Im of the data it is multiplying with.

Also I should note that src1/src2 may be equal to dst, but that should not be a problem. It works fine on Windows. Also, this problem occurs on an Intel Core 2 Duo; unfortunately I cannot test other processors.
Installer is named m_cproc_p_11.1.089.dmg

Unfortunately now I'm kind of out of options.

meldaproduction · ‎10-12-2010

And the part of the code, but there is nothing interesting there...

float temp[1 << 14];
{
    cnt *= 2; // cnt now counts floats (two per complex value)
    for (int i = 0; i < cnt; i += 2) {
        const float re = src1[i];
        const float im = src1[i+1];
        const float re2 = src2[i];
        const float im2 = src2[i+1];

        temp[i] = re * re2 - im * im2;
        temp[i+1] = re * im2 + im * re2;
    }
}
ippsMul_32fc((Ipp32fc*)src1, (Ipp32fc*)src2, (Ipp32fc*)dst, cnt/2);

My algorithm generates temp, and ippsMul_32fc generates dst, which comes out incorrect. It seems to work as a separate application, but not this way. Should ippStaticInit (or whatever it is called) be run in all threads?
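For anyone who wants to repeat this kind of check, here is a minimal sketch of a reference multiply plus a comparison helper. It is not the poster's actual code: the names mul_interleaved and max_abs_diff are made up for illustration, and the buffers are assumed to hold interleaved (re, im) floats as in the snippet above. The reference reads all four inputs before writing, so it stays correct when out aliases one of the sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Reference complex multiply on interleaved (re, im) float buffers.
// Hypothetical helper, not part of IPP. Each element's inputs are read
// before its outputs are written, so out may be the same buffer as a or b.
void mul_interleaved(const float* a, const float* b, float* out,
                     std::size_t n_complex) {
    assert(a && b && out);
    for (std::size_t k = 0; k < n_complex; ++k) {
        const float re1 = a[2 * k], im1 = a[2 * k + 1];
        const float re2 = b[2 * k], im2 = b[2 * k + 1];
        out[2 * k]     = re1 * re2 - im1 * im2;
        out[2 * k + 1] = re1 * im2 + im1 * re2;
    }
}

// Largest absolute difference between two float buffers of n_floats entries.
float max_abs_diff(const float* x, const float* y, std::size_t n_floats) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < n_floats; ++i) {
        worst = std::max(worst, std::fabs(x[i] - y[i]));
    }
    return worst;
}
```

Running the library call into one buffer and mul_interleaved into another, then passing both to max_abs_diff, gives a single number to log per block instead of a full table.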

meldaproduction · ‎10-12-2010

Ok, and one more thing. I'm running it on Snow Leopard, which is 64-bit, though the application is 32-bit, I believe. Anyway, if I run a test app, it works fine. Then I run this plugin, which doesn't work, and I checked the pointers to the buffers: src1 = dst (which shouldn't matter, I hope), but this pointer is really ugly. It is actually negative, FFblablasomething, so obviously the OS used addresses above 2 GB. Maybe there's some bug in addressing these in IPP.
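As a side note on the "negative" pointer: in a 32-bit process, any address at or above 0x80000000 shows up as negative when its bits are viewed as a signed integer, so the sign by itself does not indicate a corrupt pointer. A small sketch follows, with made-up helper names and a made-up example address; whether IPP mishandles such addresses is not something this shows.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the 32 bits of an address as a signed integer, which is
// effectively what a signed log format or debugger view displays.
std::int32_t as_signed32(std::uint32_t address) {
    std::int32_t s;
    static_assert(sizeof s == sizeof address, "same width required");
    std::memcpy(&s, &address, sizeof s);
    return s;
}

// True when the address lies in the upper half of a 32-bit address space.
bool above_2gb(std::uint32_t address) {
    return address >= 0x80000000u;
}
```

For the illustrative address 0xFF000010, above_2gb returns true and as_signed32 returns a negative number, while the pointer itself is perfectly usable.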

Gennady_F_Intel · ‎10-12-2010

What exactly is your link line?

Can you check whether the problem also occurs with dynamic linking?

--Gennady

meldaproduction · ‎10-12-2010

Ok, it seems that the forum didn't record the time correctly. :) The problem is still there and I have no idea what to do except use my own implementation. That's no problem in itself, but then I don't know whether it is caused by IPP on Mac or IPP on Core 2 Duo, and whether other routines are fine and this is just a small glitch...

Ying_H_Intel · ‎10-13-2010

Hello Meldaproduction,

See the article "Which version of Intel IPP, Intel MKL and Intel TBB is installed by the Intel Compiler Professional Edition?". You should be using the latest IPP version, the IPP 6.1 update 6 release.

Actually, IPP provides some internal error checks and version information checks. No issue like the one you describe has been reported. It would be nice if you posted more details about how you supply the input. Ideally, a self-contained example would help a lot.

For example, what is the exact value of cnt after cnt *= 2, and how do you define (Ipp32fc*)src1?

Many problems are caused by tiny errors, for example declaring the pointer src as float * and then using it as Ipp32fc *.

Here is one sample we discussed in
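To illustrate the kind of pointer mismatch meant here, a minimal sketch: Complex32 is a hypothetical stand-in for a two-float complex struct such as Ipp32fc, and load_complex is a made-up helper. The static_asserts check that the struct has the same layout as two consecutive floats, and the element index k counts complex values, not floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mirror of a two-float complex struct such as Ipp32fc.
struct Complex32 {
    float re;
    float im;
};

// The interleaved-float view is only valid if the struct has no padding.
static_assert(sizeof(Complex32) == 2 * sizeof(float), "no padding expected");
static_assert(offsetof(Complex32, im) == sizeof(float), "im follows re");

// Copy complex element k out of an interleaved float buffer.
// k indexes complex values, so the floats read are [2k] and [2k + 1].
Complex32 load_complex(const float* interleaved, std::size_t k) {
    assert(interleaved);
    Complex32 c;
    std::memcpy(&c, interleaved + 2 * k, sizeof c);
    return c;
}
```

A length passed alongside such a buffer has to be the number of complex elements, which is why the call in the earlier post passes cnt/2 after cnt was doubled.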
http://software.intel.com/en-us/forums/showthread.php?t=77921 for your reference.

//ippiDFT_test.cpp

#include <stdio.h>
#include "ipp.h"

int main()
{

// Print the version of ipp being used
const IppLibraryVersion* lib = ippiGetLibVersion();
printf("%s %s %d.%d.%d.%d\n", lib->Name, lib->Version,lib->major, lib->minor, lib->majorBuild, lib->build);

int lenx=5, leny=10; //5x10 matrix
Ipp32fc* pSrc;
Ipp32fc* pDst;
pSrc = ippsMalloc_32fc(lenx*leny);
pDst = ippsMalloc_32fc(lenx*leny);

for (int i = 0; i < lenx*leny; i++) {
pSrc[i].re = (float)i;
pSrc[i].im = 0.0f;
}
IppiDFTSpec_C_32fc *pDFTSpec;
IppStatus status;
IppiSize slen = {leny, lenx};
ippiDFTInitAlloc_C_32fc( &pDFTSpec, slen, IPP_FFT_NODIV_BY_ANY, ippAlgHintAccurate );
status = ippiDFTFwd_CToC_32fc_C1R(pSrc, leny*sizeof(Ipp32fc), pDst, leny*sizeof(Ipp32fc), pDFTSpec, 0 );

printf("%d : %s\n", status, ippGetStatusString(status));
printf("%f, %f, %f, %f\n", pDst[0].re, pDst[0].im, pDst[1].re, pDst[1].im);

status = ippiDFTFree_C_32fc(pDFTSpec); // expect ippStsNoErr
printf("%d : %s\n", status, ippGetStatusString(status));
return 0;
}

Regards,
Ying

meldaproduction · ‎10-14-2010

Thank you Ying. Anyway, I think I found it: to my surprise, I wasn't calling ippStaticInit. I didn't suspect that, because I thought that without it the functions would still work, just slower. Apparently this one does not. Well, at least it seems to be fixed; I need to check it more.
Btw, I think you should look into this, because if someone has global data whose initialization uses this function, it will be called before ippStaticInit. I know we shouldn't do that, but if someone does, it would be really hard to track down.
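The initialization-order trap described above can be sketched without IPP at all. Everything in this example is hypothetical: g_mul stands in for whatever dispatch state ippStaticInit sets up, and init_dispatch plays the role of the init call. It only shows the general C++ behavior that a global's constructor runs before main() and therefore before any init call made from main().

```cpp
#include <cassert>

// Hypothetical stand-ins; this is not IPP's actual dispatch mechanism.
enum class Impl { Generic, Optimized };

static Impl mul_generic()   { return Impl::Generic; }
static Impl mul_optimized() { return Impl::Optimized; }

// Constant-initialized, so it is valid before any dynamic initializer runs.
static Impl (*g_mul)() = mul_generic;

// Plays the role of the init call: selects the CPU-specific code path.
void init_dispatch() { g_mul = mul_optimized; }

// A type whose constructor calls into the "library".
struct PrecomputedTable {
    Impl used;
    PrecomputedTable() : used(g_mul()) {}
};

// Global instance: constructed before main(), hence before init_dispatch().
static PrecomputedTable g_table;

// Safer pattern: construct on first use, after the program has initialized.
PrecomputedTable& lazy_table() {
    static PrecomputedTable t;
    return t;
}
```

Here g_table.used is always Impl::Generic, whereas lazy_table().used is Impl::Optimized as long as init_dispatch() has been called before the first access. Deferring the work to first use is one way to keep library calls out of static initialization.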

Cheers!