Hi,
I need to use a 64-bit implementation for certain functions in my application for higher precision, but the rest of the application uses 32f/32fc data types. Are there any conversion functions such as
ippsRealToCplx_64f32fc or ippsCplxToReal_64fc32f,
i.e. functions that convert a 32f/32fc vector into a 64f/64fc vector or vice versa?
Is there any way to achieve this conversion? Kindly let me know a suitable way to implement it.
Regards
Rohit
Use a 2-stage conversion for this purpose:
IPPAPI ( IppStatus, ippsConvert_32f64f, ( const Ipp32f* pSrc, Ipp64f* pDst, int len ))
IPPAPI ( IppStatus, ippsConvert_64f32f, ( const Ipp64f* pSrc, Ipp32f* pDst, int len ))
and
IPPAPI(IppStatus, ippsCplxToReal_64fc,( const Ipp64fc* pSrc, Ipp64f* pDstRe, Ipp64f* pDstIm, int len ))
IPPAPI(IppStatus, ippsRealToCplx_32f,( const Ipp32f* pSrcRe, const Ipp32f* pSrcIm, Ipp32fc* pDst, int len ))
Regards,
Igor
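A minimal sketch of the 2-stage route for going from a 32fc vector to a 64fc vector: split into real and imaginary planes, widen each plane, then re-interleave. To keep it compilable without IPP installed, the types `Cplx32`/`Cplx64` and the three helper functions are hypothetical stand-ins written as plain loops. In real code they would presumably be replaced by `ippsCplxToReal_32fc`, `ippsConvert_32f64f` and `ippsRealToCplx_64f`, assuming those variants are present in your IPP version.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for Ipp32fc / Ipp64fc (interleaved re/im pairs).
struct Cplx32 { float re; float im; };
struct Cplx64 { double re; double im; };

// Stand-in for ippsCplxToReal_32fc: split interleaved data into two planes.
void cplxToReal32(const Cplx32* src, float* dstRe, float* dstIm, int len) {
    for (int i = 0; i < len; ++i) {
        dstRe[i] = src[i].re;
        dstIm[i] = src[i].im;
    }
}

// Stand-in for ippsConvert_32f64f: widen every element from float to double.
void convert32to64(const float* src, double* dst, int len) {
    for (int i = 0; i < len; ++i) {
        dst[i] = static_cast<double>(src[i]);
    }
}

// Stand-in for ippsRealToCplx_64f: interleave two planes into complex data.
void realToCplx64(const double* srcRe, const double* srcIm, Cplx64* dst, int len) {
    for (int i = 0; i < len; ++i) {
        dst[i].re = srcRe[i];
        dst[i].im = srcIm[i];
    }
}

// 32fc -> 64fc: split, widen both planes, re-interleave.
std::vector<Cplx64> widenComplex(const std::vector<Cplx32>& src) {
    const int len = static_cast<int>(src.size());
    std::vector<float> re32(len), im32(len);    // temporary 32-bit planes
    std::vector<double> re64(len), im64(len);   // temporary 64-bit planes
    std::vector<Cplx64> dst(len);
    assert(dst.size() == src.size());

    cplxToReal32(src.data(), re32.data(), im32.data(), len);
    convert32to64(re32.data(), re64.data(), len);
    convert32to64(im32.data(), im64.data(), len);
    realToCplx64(re64.data(), im64.data(), dst.data(), len);
    return dst;
}
```

The four temporary planes are what makes this route cost extra memory traffic. Since the complex types here are just interleaved re/im pairs, it may also be possible to view a complex buffer of `len` elements as `2 * len` reals and widen it with a single real conversion, but whether that holds for `Ipp32fc` depends on its layout in your headers.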
Thanks! But this implementation is slower. Is there any 1-stage conversion?
Regards
Rohit
There are 3 possible solutions:
1) What vector length do you use? I guess (according to your previous posts) 10e4-10e6. For 64f (and 32f too) that is significantly larger than the L1 cache. If you perform all the required math in a loop over "frames" smaller than the L1 cache, you'll see a significant speedup. On modern CPUs the L1 cache is usually 32 KB, which is only 4 K doubles, so I think your "frame" size should be 1 K (you have src, buf and dst, ~3 K in total). This means you need 2 nested loops: an inner one that performs all operations on 1 K doubles, and an outer one that extends it to 10e4-10e6.
2) You can use intrinsics. If you are not experienced with this stuff, I can provide you with a draft of the code.
3) You can use asm. Same as above, I can provide you with a draft.
I think the 1st approach is the most appropriate for you. Pipelined execution that takes the cache size into account usually performs as well as a dedicated function, because a load/store from/to the L1 cache is almost free (~2 CPU clocks), while for main memory it is ~200 CPU clocks...
Regards,
Igor
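The framed approach from option 1 can be sketched roughly as follows. This is only an illustration under assumptions: `kFrame`, `processFrame64` and `processFramed` are hypothetical names, the math inside `processFrame64` is a placeholder for whatever needs 64-bit precision, and the two plain conversion loops stand in for `ippsConvert_32f64f` and `ippsConvert_64f32f` so the sketch compiles without IPP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Frame size of 1 K elements, as suggested in the post above.
constexpr std::size_t kFrame = 1024;

// Hypothetical placeholder for "all the required math" done in 64-bit.
void processFrame64(const double* src, double* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * src[i] + 1.0;
    }
}

// Outer loop walks the long 32f vector frame by frame; the inner work
// touches only two 8 KB double buffers that are reused for every frame.
void processFramed(const float* src, float* dst, std::size_t len) {
    assert(len == 0 || (src != nullptr && dst != nullptr));
    double buf[kFrame];   // widened input frame
    double out[kFrame];   // 64-bit result frame

    for (std::size_t start = 0; start < len; start += kFrame) {
        const std::size_t n = std::min(kFrame, len - start);

        // Stand-in for ippsConvert_32f64f on one frame.
        for (std::size_t i = 0; i < n; ++i) {
            buf[i] = static_cast<double>(src[start + i]);
        }

        processFrame64(buf, out, n);

        // Stand-in for ippsConvert_64f32f on one frame.
        for (std::size_t i = 0; i < n; ++i) {
            dst[start + i] = static_cast<float>(out[i]);
        }
    }
}
```

The 64-bit data never exists as a full-length vector here, only as one frame at a time, which is why the extra conversion steps are expected to stay close to the cache. The actual gain depends on the CPU and on how much work is done per frame, so it is worth measuring.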
When a big data set of 'float' type that is already loaded into memory needs to be converted
and there are memory constraints, a union-based data type conversion could be used. I call it
a ValueSet-based data type conversion. It is not as fast as the SSE-based 'ippsConvertxxfyyf'
IPP functions, but it doesn't need a second memory block for the data set of 'double' type.
Even though a ValueSet-based data type conversion has some limitations, I use it in a couple of cases.
For example, a data set of 'float' type with 134,217,728 elements uses 0.5GB
of memory. A data set of 'double' type with the same 134,217,728 elements uses
1.0GB of memory. The total amount of memory needed to complete the conversion is 1.5GB.
With a ValueSet-based conversion the total amount of memory needed to complete the conversion is 1.0GB, that
is, 0.5GB less!
Please take a look at the enclosed example:
...
typedef union tagFPVALUESET
{
    Ipp32f fValue;   // 32-bit view of the element
    Ipp64f dValue;   // 64-bit view of the same storage
    inline operator Ipp32f() const { return fValue; }
    inline operator Ipp64f() const { return dValue; }
} FPVALUESET;
...
...
FPVALUESET FpVs1[_RTDATA_SIZE] = { 0.0 };
...
...
// Test-Case 2 - Declared as C Array - 4-in-1 - Unrolled
// Note: the unrolled loops assume _RTDATA_SIZE is a multiple of 4
// RTint -> Ipp32f
g_uiTicksStart = SysGetTickCount();
for( i = 0; i < _RTDATA_SIZE; i += 4 )
{
    FpVs1[i  ].fValue = ( Ipp32f )( i   );
    FpVs1[i+1].fValue = ( Ipp32f )( i+1 );
    FpVs1[i+2].fValue = ( Ipp32f )( i+2 );
    FpVs1[i+3].fValue = ( Ipp32f )( i+3 );
}
CrtPrintf( RTU("[ RTint to Ipp32f ] [ 4-in-1 ] Converted in : %ld ticks\n"), ( RTint )( SysGetTickCount() - g_uiTicksStart ) );
// Ipp32f -> Ipp64f, in place: each element is read as float and written back as double
g_uiTicksStart = SysGetTickCount();
for( i = 0; i < _RTDATA_SIZE; i += 4 )
{
    FpVs1[i  ].dValue = ( Ipp64f )FpVs1[i  ].fValue;
    FpVs1[i+1].dValue = ( Ipp64f )FpVs1[i+1].fValue;
    FpVs1[i+2].dValue = ( Ipp64f )FpVs1[i+2].fValue;
    FpVs1[i+3].dValue = ( Ipp64f )FpVs1[i+3].fValue;
}
CrtPrintf( RTU("[ Ipp32f to Ipp64f ] [ 4-in-1 ] Converted in : %ld ticks\n"), ( RTint )( SysGetTickCount() - g_uiTicksStart ) );
...
Some performance results:
...
Data Size : 67108864 elements
Memory Size: 536870912 bytes
Declared as C Array
[ RTint to Ipp32f ] [ 1-in-1 ] Converted in : 1250 ticks
[ Ipp32f to Ipp64f ] [ 1-in-1 ] Converted in : 704 ticks
[ RTint to Ipp32f ] [ 4-in-1 ] Converted in : 703 ticks
[ Ipp32f to Ipp64f ] [ 4-in-1 ] Converted in : 687 ticks
...
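For readers who want to try the idea in isolation, here is a small self-contained sketch of the same union-based in-place widening. The names `FpValue` and `widenInPlace` are hypothetical simplifications of the FPVALUESET example above, and the size check is an assumption that holds on common platforms rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simplified version of FPVALUESET: one slot, two views.
union FpValue {
    float  f;   // 32-bit view
    double d;   // 64-bit view sharing the same storage
};

// Every element occupies the size of the larger member, so an array of
// FpValue already reserves 8 bytes per element while it still holds floats.
static_assert(sizeof(FpValue) == sizeof(double),
              "assumes the union is exactly as large as a double");

// Widen every element from its float view to its double view in place.
void widenInPlace(FpValue* data, std::size_t len) {
    assert(len == 0 || data != nullptr);
    for (std::size_t i = 0; i < len; ++i) {
        const float tmp = data[i].f;            // read the active float member
        data[i].d = static_cast<double>(tmp);   // the double member becomes active
    }
}
```

This matches the figures in the results above, where 67108864 elements take 536870912 bytes, i.e. 8 bytes per element. The saving is therefore relative to keeping a separate float array and a separate double array; the union array itself is as large as the double data from the start.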
The left 'Watch 1' window shows a ValueSet initialized for the 'float' data type, and the right 'Watch 1' window shows it
initialized for the 'double' data type.