
Hi,

I need to use a 64-bit implementation of certain functions in my application for higher precision, but the rest of the application uses the 32f/32fc data types. Are there any conversion functions such as

ippsRealToCplx_64f32Fc or ippsCplxToReal_64fc32f

i.e. functions that let me convert my 32f/32fc vector into a 64f/64fc vector or vice versa?

Is there any way to achieve this conversion? Kindly let me know a suitable way to implement it.

Regards

Rohit


Use a 2-stage conversion for this purpose:

IPPAPI ( IppStatus, ippsConvert_32f64f, ( const Ipp32f* pSrc, Ipp64f* pDst, int len ))

IPPAPI ( IppStatus, ippsConvert_64f32f, ( const Ipp64f* pSrc, Ipp32f* pDst, int len ))

and

IPPAPI(IppStatus, ippsCplxToReal_64fc,( const Ipp64fc* pSrc, Ipp64f* pDstRe, Ipp64f* pDstIm, int len ))

IPPAPI(IppStatus, ippsRealToCplx_32f,( const Ipp32f* pSrcRe, const Ipp32f* pSrcIm, Ipp32fc* pDst, int len ))
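For the 64fc to 32fc direction, the declarations above suggest the following chain: split the complex vector into real and imaginary planes, narrow each plane, then interleave the planes again. The sketch below only mirrors that data flow with plain C++ loops so that it runs without IPP. `Cplx32`, `Cplx64` and `narrow_two_stage` are hypothetical names, not IPP identifiers, and each loop merely stands in for the IPP call named in its comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Ipp32fc / Ipp64fc (interleaved re, im pairs).
struct Cplx32 { float re, im; };
struct Cplx64 { double re, im; };

// Narrow a 64-bit complex vector to 32-bit complex via separate re/im planes.
void narrow_two_stage(const Cplx64* src, Cplx32* dst, std::size_t len) {
    assert(src != nullptr && dst != nullptr);
    std::vector<double> re64(len), im64(len);   // intermediate 64f planes
    std::vector<float>  re32(len), im32(len);   // intermediate 32f planes

    // Split into planes (the role of ippsCplxToReal_64fc).
    for (std::size_t i = 0; i < len; ++i) {
        re64[i] = src[i].re;
        im64[i] = src[i].im;
    }
    // Narrow each plane (the role of ippsConvert_64f32f, once per plane).
    for (std::size_t i = 0; i < len; ++i) {
        re32[i] = static_cast<float>(re64[i]);
        im32[i] = static_cast<float>(im64[i]);
    }
    // Interleave the planes again (the role of ippsRealToCplx_32f).
    for (std::size_t i = 0; i < len; ++i) {
        dst[i].re = re32[i];
        dst[i].im = im32[i];
    }
}
```

The intermediate planes are what makes this path cost extra memory traffic compared with a single pass over the data.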

Regards,

Igor


Thanks! But this implementation is slower. Is there any 1-stage conversion?

Regards

Rohit


There are 3 possible solutions:

1) What vector length do you use? Judging by your previous posts, I guess 10e4-10e6 elements. For 64f (and for 32f too) that is significantly larger than the L1 cache. If you perform all the required math in a loop over "frames" smaller than the L1 cache, you'll see a significant speedup. On modern CPUs the L1 cache is usually 32 KB, which holds only 4 K doubles, so I think your frame size should be 1 K elements (you have src, buf and dst, ~3 K in total). This means you need 2 nested loops: an inner one that performs all operations on 1 K doubles and an outer one that extends it to the full 10e4-10e6 elements.

2) You can use intrinsics. If you are not experienced with them, I can provide you with a draft of the code.

3) You can use asm. As above, I can provide you with a draft.
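As an illustration of option 2 (not the draft offered above), here is a minimal sketch of a single-pass 32f to 64f widening with SSE2 intrinsics. It assumes the 32fc data is stored as interleaved re/im pairs of floats, so a complex vector of `len` elements can be passed as `2 * len` floats. `widen_f32_to_f64` and `HAVE_SSE2` are hypothetical names, and the code falls back to a scalar loop when SSE2 is not available.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   // SSE2 intrinsics (also pulls in the SSE ones)
#define HAVE_SSE2 1
#endif

// Widen n floats to n doubles in one pass.
// For interleaved complex data pass n = 2 * number_of_complex_elements.
void widen_f32_to_f64(const float* src, double* dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 4 <= n; i += 4) {
        __m128  v  = _mm_loadu_ps(src + i);               // load 4 floats
        __m128d lo = _mm_cvtps_pd(v);                     // floats 0,1 -> 2 doubles
        __m128d hi = _mm_cvtps_pd(_mm_movehl_ps(v, v));   // floats 2,3 -> 2 doubles
        _mm_storeu_pd(dst + i, lo);
        _mm_storeu_pd(dst + i + 2, hi);
    }
#endif
    // Scalar tail, or the whole vector when SSE2 is not available.
    for (; i < n; ++i) dst[i] = static_cast<double>(src[i]);
}
```

Unaligned loads and stores are used so that the sketch works for any buffer; whether this beats the library functions depends on the data size and the CPU and would need to be measured.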

I think the 1st approach is the most appropriate for you: pipelined execution that takes the cache size into account usually performs as well as a dedicated function, because a load/store from/to the L1 cache is almost free (~2 CPU clocks), while going to memory costs ~200 CPU clocks.
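The frame-based approach can be sketched as follows. This is a portable stand-in, assuming a 1 K frame and using squaring as a placeholder for the real 64-bit math; in an IPP application each inner loop would be replaced by the corresponding library call on the current frame. `process_in_frames` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Process a long float vector in small frames: widen one frame to double,
// do the precision-sensitive math in double, then narrow the result back.
void process_in_frames(const float* src, float* dst, std::size_t len,
                       std::size_t frame = 1024) {
    assert(frame > 0);
    std::vector<double> buf(frame);   // scratch buffer reused for every frame

    // Outer loop: walk the whole vector one frame at a time.
    for (std::size_t start = 0; start < len; start += frame) {
        const std::size_t n = std::min(frame, len - start);

        for (std::size_t i = 0; i < n; ++i)     // 32f -> 64f
            buf[i] = static_cast<double>(src[start + i]);
        for (std::size_t i = 0; i < n; ++i)     // 64-bit math (placeholder)
            buf[i] = buf[i] * buf[i];
        for (std::size_t i = 0; i < n; ++i)     // 64f -> 32f
            dst[start + i] = static_cast<float>(buf[i]);
    }
}
```

Because the same small `buf` is reused for every frame, the intermediate 64f data never has to travel to main memory, which is the effect the argument above relies on.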

Regards,

Igor


When a big data set of '**float**' type, already loaded into memory, needs to be converted and there are memory constraints, a union-based data type conversion can be used. I call it a **ValueSet**-based data type conversion. It is not as fast as an **SSE**-based conversion with the '**ippsConvertxxfyyf**' **IPP** functions, but it doesn't need a second memory block for a data set of '**double**' type.

Even though a **ValueSet**-based data type conversion has some limitations, I use it in a couple of cases.

For example, a data set of '**float**' type with **134,217,728** elements uses **0.5GB** of memory. A data set of '**double**' type with the same **134,217,728** elements uses **1.0GB** of memory. The total amount of memory needed to complete the conversion is **1.5GB**.

With a **ValueSet**-based conversion the total amount of memory needed to complete the conversion is **1.0GB**, that is, **0.5GB** less!

Please take a look at the enclosed example:

...

typedef union **tagFPVALUESET**

{

Ipp32f fValue;

Ipp64f dValue;

inline operator Ipp32f() const { return fValue; };

inline operator Ipp64f() const { return dValue; };

} **FPVALUESET**;

...

...

**FPVALUESET** FpVs1[_RTDATA_SIZE] = { 0.0 };

...

...

// **Test-Case 2** - Declared as C Array - 4-in-1 - Unrolled

**// RTint -> Ipp32f**

g_uiTicksStart = SysGetTickCount();

for( i = 0; i < _RTDATA_SIZE; i += 4 )

{

FpVs1[i ].fValue = ( Ipp32f )i;

FpVs1[i+1].fValue = ( Ipp32f )i+1;

FpVs1[i+2].fValue = ( Ipp32f )i+2;

FpVs1[i+3].fValue = ( Ipp32f )i+3;

}

CrtPrintf( RTU("[ RTint to Ipp32f ] [ 4-in-1 ] Converted in : %ld ticks\n"), ( RTint )( SysGetTickCount() - g_uiTicksStart ) );

**// Ipp32f -> Ipp64f**

g_uiTicksStart = SysGetTickCount();

for( i = 0; i < _RTDATA_SIZE; i += 4 )

{

FpVs1[i ].dValue = ( Ipp64f )FpVs1[i ].fValue;

FpVs1[i+1].dValue = ( Ipp64f )FpVs1[i+1].fValue;

FpVs1[i+2].dValue = ( Ipp64f )FpVs1[i+2].fValue;

FpVs1[i+3].dValue = ( Ipp64f )FpVs1[i+3].fValue;

}

CrtPrintf( RTU("[ Ipp32f to Ipp64f ] [ 4-in-1 ] Converted in : %ld ticks\n"), ( RTint )( SysGetTickCount() - g_uiTicksStart ) );

...

**Some Performance results:**

...

Data Size : 67108864 elements

Memory Size: 536870912 bytes

Declared as C Array

[ RTint to Ipp32f ] [ 1-in-1 ] Converted in : 1250 ticks

[ Ipp32f to Ipp64f ] [ 1-in-1 ] Converted in : 704 ticks

[ RTint to Ipp32f ] [ 4-in-1 ] Converted in : 703 ticks

[ Ipp32f to Ipp64f ] [ 4-in-1 ] Converted in : 687 ticks

...
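A self-contained variant of the fragments above, for readers who want to try the idea without the surrounding framework. It is a minimal sketch, not the original code: `FpValueSet` and `widen_in_place` are hypothetical names, and plain `float`/`double` replace `Ipp32f`/`Ipp64f`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One 8-byte slot that holds either a float or a double,
// so an array of slots can be widened in place.
union FpValueSet {
    float  fValue;
    double dValue;
};
static_assert(sizeof(FpValueSet) == sizeof(double),
              "each slot is as large as a double");

// Replace the float stored in every slot with the same value as a double.
void widen_in_place(FpValueSet* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i) {
        const float f = data[i].fValue;            // read the float first...
        data[i].dValue = static_cast<double>(f);   // ...then reuse the slot
    }
}
```

Note that every slot occupies 8 bytes even while it holds a float, which matches the results above (67108864 elements in 536870912 bytes). The saving is therefore relative to keeping a separate float array and double array in memory at the same time.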


This is how **ValueSets** look in the **VS Debugger**: the left '**Watch 1**' window shows a **ValueSet** initialized for the '**float**' data type, and the right '**Watch 1**' window shows it initialized for the '**double**' data type.
