I am interested in a fast Cartesian to polar conversion.
MKL's documentation suggests the following method
...
vdHypot(nelements, re, im, magnitude);
vdAtan2(nelements, re, im, phase);
....
However, my tests indicate that the performance of the above method is about two times worse than that of an IPP function:
....
ippsCartToPolar_64f(re, im, magnitude, phase, nelements);
....
Such a difference seems a little strange. Is there anything that I am missing here?
Thank you.
5 Replies
Which versions of MKL and IPP? Are we, by some chance, comparing speeds of 80x87 instruction sequences and SSE2 instruction sequences?
Did you check how this performance result depends on the input size? Please note that all VML functions are highly optimized for large vector sizes, say 1K and above.
--Gennady
I am using MKL v10.3.1 and IPP v7.0.1.
Both routines are called from MATLAB as MEX files (just another name for a shared library), and the performance difference is a function of how large the input is. As is evident from the table below, MKL lags behind IPP up to a certain input size.
n (input size) | MKL (time) | IPP (time)
---------------+------------+------------
         10000 |   0.0064   | 5.6987e-04
         20000 |   0.0237   | 0.0013
         40000 |   0.0722   | 0.0022      <---- STRANGE! (MKL's result is too bad)
         80000 |   0.0264   | 0.0040
        160000 |   0.0411   | 0.0081
        320000 |   0.0618   | 0.0155
        640000 |   0.1297   | 0.0310
       1280000 |   0.2620   | 0.0613
       2560000 |   0.3692   | 0.1221
       5120000 |   0.4053   | 0.2740
      10240000 |   0.6073   | 0.5592
      20480000 |   1.0382   | 1.1212
P.S.
There is an error in Intel's documentation that comes with MKL. In the following code (taken from "FFT: Auxiliary Data Transformations"), one has to swap re and im in the call to vdAtan2():
[cpp]
// Cartesian->polar conversion of complex data
// Cartesian representation: z = re + I*im
// Polar representation:     z = r * exp( I*phi )
#include "mkl.h"

void variant1_Cartesian2Polar(int n, const double *re, const double *im,
                              double *r, double *phi)
{
    vdHypot(n, re, im, r);   // compute radii r[]
    vdAtan2(n, re, im, phi); // compute phases phi[] -- re and im should be swapped here
}
[/cpp]
Hi eliosh,
First of all, thank you for pointing out the mistake.
That IPP performs faster than MKL for large arrays is understood: with the provided MKL implementation of Cartesian2Polar, the whole data set travels a couple of times to and from cache. Blocking the data set for better cache utilization would improve the function's performance, of course, but that was out of scope for this example.
On smaller arrays you are likely seeing performance of threaded IPP (the MKL example is not threaded).
And you are welcome to submit a feature request for the functionality and performance you need at http://premier.intel.com
Thanks
Dima
Hi, eliosh!
MKL transcendental math functions have three accuracy levels: VML_HA (most accurate), VML_LA (in the middle), and VML_EP (fastest). The default level is VML_HA, which is the most precise but also the slowest.
The IPP function ippsCartToPolar_64f does not have such strict accuracy requirements.
In order to make a fair comparison, you can set a lower accuracy requirement:
vmlSetMode(VML_LA)
or
vmlSetMode(VML_EP)
In both cases MKL will likely give better results.
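A minimal sketch of that setup, assuming MKL is installed and linked (vmlSetMode changes a process-global setting and returns the previous one, so it can be restored afterwards; the function name cart2polar_la is illustrative):

```c
#include "mkl.h"  /* vmlSetMode, VML_LA, vdHypot, vdAtan2 */

/* Cartesian->polar at reduced VML accuracy, for a fairer timing
 * comparison against ippsCartToPolar_64f. */
void cart2polar_la(int n, const double *re, const double *im,
                   double *r, double *phi)
{
    unsigned int old_mode = vmlSetMode(VML_LA); /* or VML_EP for the fastest mode */
    vdHypot(n, re, im, r);    /* radii:  r[i]   = hypot(re[i], im[i]) */
    vdAtan2(n, im, re, phi);  /* phases: phi[i] = atan2(im[i], re[i]) */
    vmlSetMode(old_mode);     /* restore the previous accuracy mode */
}
```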
Another effect might be found in your timing system: while I qualitatively reproduced your results for larger n, for smaller n I do not see that much difference between MKL and IPP. One possible explanation is warm vs. cold cache. To check this, you may try to swap your MKL and IPP measurements (measure IPP first, and then MKL). If the results for smaller n change significantly, this is probably the case. It is hard to say more without the code.
Thanks,
Ilya