Conversion : 64f to 32f & 64fc to 32fc and vice versa

rohitspandey · 01-16-2012

Hi,

I need to use the 64-bit implementations of certain functions in my application for higher precision, but the rest of the application uses 32f/32fc data types. Are there any conversion functions like

ippsRealToCplx_64f32Fc or ippsCplxToReal_64fc32f

that is, functions that let me convert my 32f/32fc vector into a 64f/64fc vector or vice versa?

Is there any way to achieve this conversion? Kindly let me know a suitable way to implement it.

Regards

Rohit

igorastakhov · 01-16-2012

Hi Rohit,

Use a two-stage conversion for this purpose:

IPPAPI ( IppStatus, ippsConvert_32f64f, ( const Ipp32f* pSrc, Ipp64f* pDst, int len ))
IPPAPI ( IppStatus, ippsConvert_64f32f, ( const Ipp64f* pSrc, Ipp32f* pDst, int len ))

and

IPPAPI(IppStatus, ippsCplxToReal_64fc,( const Ipp64fc* pSrc, Ipp64f* pDstRe, Ipp64f* pDstIm, int len ))
IPPAPI(IppStatus, ippsRealToCplx_32f,( const Ipp32f* pSrcRe, const Ipp32f* pSrcIm, Ipp32fc* pDst, int len ))
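As a rough illustration of how these calls chain together for a 64fc to 32fc conversion, here is a minimal sketch. It does not call IPP: `Cplx64`, `Cplx32`, `cplxToReal64`, `convert64to32`, `realToCplx32` and `narrowComplex` are hypothetical stand-ins that only mirror the signatures quoted above, so that the sketch compiles on its own. In a real application the corresponding IPP functions would presumably take their place.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-ins, NOT the IPP library: they only mirror the quoted signatures.
struct Cplx64 { double re, im; };  // plays the role of Ipp64fc
struct Cplx32 { float  re, im; };  // plays the role of Ipp32fc

// Role of ippsCplxToReal_64fc: split interleaved complex data into two planes.
void cplxToReal64(const Cplx64* src, double* dstRe, double* dstIm, int len) {
    for (int i = 0; i < len; ++i) { dstRe[i] = src[i].re; dstIm[i] = src[i].im; }
}

// Role of ippsConvert_64f32f: narrow each double to a float.
void convert64to32(const double* src, float* dst, int len) {
    for (int i = 0; i < len; ++i) dst[i] = static_cast<float>(src[i]);
}

// Role of ippsRealToCplx_32f: interleave two real planes into complex data.
void realToCplx32(const float* srcRe, const float* srcIm, Cplx32* dst, int len) {
    for (int i = 0; i < len; ++i) { dst[i].re = srcRe[i]; dst[i].im = srcIm[i]; }
}

// 64fc -> 32fc: split, narrow both planes, then merge again.
std::vector<Cplx32> narrowComplex(const std::vector<Cplx64>& src) {
    // The quoted signatures take an int length, so guard the cast.
    assert(src.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
    const int len = static_cast<int>(src.size());

    std::vector<double> re64(len), im64(len);  // temporary 64f planes
    std::vector<float>  re32(len), im32(len);  // temporary 32f planes
    std::vector<Cplx32> dst(len);

    cplxToReal64(src.data(), re64.data(), im64.data(), len);
    convert64to32(re64.data(), re32.data(), len);
    convert64to32(im64.data(), im32.data(), len);
    realToCplx32(re32.data(), im32.data(), dst.data(), len);
    return dst;
}
```

The temporary planes are what make this approach cost extra passes over memory; the opposite direction (32fc to 64fc) would follow the same pattern with the widening conversion.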

Regards,
Igor

rohitspandey · 01-16-2012

Hi,

Thanks! But this implementation is slower. Is there a one-stage conversion?

Regards
Rohit

igorastakhov · 01-16-2012

Rohit,

There are three possible solutions:

1) What vector length do you use? I guess (from your previous posts) 10e4-10e6. For 64f (and for 32f too) that is significantly larger than the L1 cache. If you perform all the required math in a loop over "frames" that are smaller than the L1 cache, you'll see a significant speedup. On modern CPUs the L1 cache is usually 32 KB, which is only 4 K doubles, so I think your frame size should be 1 K (you have src, buf and dst, ~3 K in total). This means you need two nested loops: an inner one that performs all operations on 1 K doubles, and an outer one that extends it over the full 10e4-10e6 elements.
2) You can use intrinsics. If you are not experienced with them, I can provide a draft of the code.
3) You can use asm. Same as above: I can provide a draft.

I think the first approach is the most appropriate for you. Pipelined execution that takes the cache size into account usually performs as well as a dedicated function, because a load/store from/to the L1 cache is almost free (~2 CPU clocks), while going to memory costs ~200 CPU clocks.
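The frame-based approach can be sketched roughly as follows. This is only an illustration under stated assumptions: `kFrame`, `processFrame64` and `processByFrames` are hypothetical names, the plain loops stand in for the IPP conversion calls, and the double-precision step is a placeholder for whatever 64f functions the application actually needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Frame size of 1 K doubles (8 KB per buffer), as suggested above.
constexpr std::size_t kFrame = 1024;

// Placeholder for the real double-precision work (hypothetical).
void processFrame64(const double* src, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] * src[i] + 1.0;
}

// The data stays 32f in memory; each frame is widened, processed in 64f,
// and narrowed again while it is still small enough to sit in the cache.
void processByFrames(const std::vector<float>& in, std::vector<float>& out) {
    assert(out.size() == in.size());
    double src64[kFrame];  // widened input frame
    double dst64[kFrame];  // double-precision result frame

    // Outer loop: walk the long vector one frame at a time.
    for (std::size_t base = 0; base < in.size(); base += kFrame) {
        const std::size_t n = std::min(kFrame, in.size() - base);

        // Inner work, step 1: widen (the role of ippsConvert_32f64f).
        for (std::size_t i = 0; i < n; ++i) src64[i] = in[base + i];

        // Step 2: all double-precision math on this frame.
        processFrame64(src64, dst64, n);

        // Step 3: narrow (the role of ippsConvert_64f32f).
        for (std::size_t i = 0; i < n; ++i)
            out[base + i] = static_cast<float>(dst64[i]);
    }
}
```

The last frame is shorter whenever the vector length is not a multiple of the frame size, which is why the inner loops run over `n` rather than `kFrame`.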

Regards,
Igor

SergeyKostrov · 01-17-2012

When a big data set of 'float' type, already loaded into memory, needs to be converted
and there are memory constraints, a union-based data type conversion can be used. I call it
a ValueSet-based data type conversion. It is not as fast as the SSE-based 'ippsConvertxxfyyf'
IPP functions, but it doesn't need a second memory block for the data set of 'double' type.

Even though a ValueSet-based data type conversion has some limitations, I use it in a couple of cases.

For example, a data set of 'float' type with 134,217,728 elements uses 0.5GB
of memory. A data set of 'double' type with the same 134,217,728 elements uses
1.0GB of memory. The total amount of memory needed to complete the conversion is 1.5GB.

With a ValueSet-based conversion the total amount of memory needed to complete the conversion is 1.0GB, that
is, 0.5GB less!

Please take a look at the enclosed example:

...
typedef union tagFPVALUESET
{
    Ipp32f fValue;
    Ipp64f dValue;

    inline operator Ipp32f() const { return fValue; }
    inline operator Ipp64f() const { return dValue; }
} FPVALUESET;
...

...
FPVALUESET FpVs1[_RTDATA_SIZE] = { 0.0 };
...

...
// Test-Case 2 - Declared as C Array - 4-in-1 - Unrolled
// RTint -> Ipp32f
g_uiTicksStart = SysGetTickCount();
for( i = 0; i < _RTDATA_SIZE; i += 4 )
{
    // Cast the whole index expression, not just 'i'
    FpVs1[i  ].fValue = ( Ipp32f )( i   );
    FpVs1[i+1].fValue = ( Ipp32f )( i+1 );
    FpVs1[i+2].fValue = ( Ipp32f )( i+2 );
    FpVs1[i+3].fValue = ( Ipp32f )( i+3 );
}
CrtPrintf( RTU("[ RTint to Ipp32f ] [ 4-in-1 ] Converted in : %ld ticks\n"), ( RTint )( SysGetTickCount() - g_uiTicksStart ) );

// Ipp32f -> Ipp64f
g_uiTicksStart = SysGetTickCount();
for( i = 0; i < _RTDATA_SIZE; i += 4 )
{
    FpVs1[i  ].dValue = ( Ipp64f )FpVs1[i  ].fValue;
    FpVs1[i+1].dValue = ( Ipp64f )FpVs1[i+1].fValue;
    FpVs1[i+2].dValue = ( Ipp64f )FpVs1[i+2].fValue;
    FpVs1[i+3].dValue = ( Ipp64f )FpVs1[i+3].fValue;
}
CrtPrintf( RTU("[ Ipp32f to Ipp64f ] [ 4-in-1 ] Converted in : %ld ticks\n"), ( RTint )( SysGetTickCount() - g_uiTicksStart ) );
...

Some Performance results:

...
Data Size : 67108864 elements
Memory Size: 536870912 bytes

Declared as C Array
[ RTint to Ipp32f ] [ 1-in-1 ] Converted in : 1250 ticks
[ Ipp32f to Ipp64f ] [ 1-in-1 ] Converted in : 704 ticks
[ RTint to Ipp32f ] [ 4-in-1 ] Converted in : 703 ticks
[ Ipp32f to Ipp64f ] [ 4-in-1 ] Converted in : 687 ticks
...
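For readers who want to try the idea without the surrounding test framework, here is a minimal self-contained sketch of the same union-based in-place conversion. It uses standard `float`/`double` instead of the IPP types, and `FpValueSet`, `fillAsFloat`, `widenInPlace` and `bytesUsed` are hypothetical names chosen for this illustration. Note that each element occupies `sizeof(double)` bytes even while it holds a float, which matches the 536870912 bytes reported above for 67108864 elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same idea as FPVALUESET above, written with standard types.
union FpValueSet {
    float  fValue;
    double dValue;
};

// Every element is as large as its widest member.
static_assert(sizeof(FpValueSet) == sizeof(double),
              "the union is expected to be exactly one double wide");

// Store the index of each element as a float.
void fillAsFloat(std::vector<FpValueSet>& v) {
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i].fValue = static_cast<float>(i);
}

// Convert every element from float to double in the same storage.
void widenInPlace(std::vector<FpValueSet>& v) {
    for (FpValueSet& e : v) {
        const float f = e.fValue;           // read the float first...
        e.dValue = static_cast<double>(f);  // ...then overwrite it with the double
    }
}

// Memory occupied by the data set, before and after the conversion alike.
std::size_t bytesUsed(const std::vector<FpValueSet>& v) {
    return v.size() * sizeof(FpValueSet);
}
```

Copying the float into a local before writing `dValue` keeps the read and the write of the shared storage clearly separated.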

SergeyKostrov · 01-17-2012

This is a follow-up with a screenshot that demonstrates what ValueSets look like in the VS Debugger:

The left 'Watch 1' shows a ValueSet initialized for the 'float' data type, and the right 'Watch 1' shows it
initialized for the 'double' data type.