Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Intel Community
- Software
- Software Development SDKs and Libraries
- Intel® Integrated Performance Primitives
- What algorithm was used for function ippiDFTFwd_RToPack_32f_C1R and what is the algorithm complexity?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Zhifei_X_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-26-2013
09:45 PM

96 Views

What algorithm was used for function ippiDFTFwd_RToPack_32f_C1R?

And what is the algorithm complexity? like the method in FFTW?

I tried to increase the image size for the ippiDFTFwd_RToPack_32f_C1R, the execution time is not monotone increasing (of couse, 2^n is of the minimum execution time). Is there any theory analysis of the algorithm complexity for ippiDFTFwd_RToPack_32f_C1R? Many thanks!

Link Copied

6 Replies

SergeyKostrov

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-27-2013
04:34 PM

96 Views

Zhifei_X_

Beginner

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-09-2013
11:36 PM

96 Views

Hi, Sergey, could you send me the articles? I tried to search but can not find the pdf files.

Split-Radix FFT.17649.pdf

SSE in a Fast DCT Algorithm.40981.pdf

Many thanks!

Igor_A_Intel

Employee

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-10-2013
03:48 AM

96 Views

Hi,

IPP DFT is based on vector/image length/size decomposition on primes. Prime factors that have special code branches are 2,3,5,7,11,13. For powers of 2 (FFT) supported radixes are 2,4,8,16 (depends on architecture - radix-8 is supported on Intel64, radix-16 - on Xeon-Phi only). For lengths/sizes that can't be decomposed on these primes (or for reminder parts) - convolution based algorithm is used. In general case you may consider algorithm complexity as ~5*n*log2(n) arithmetic operations per 1 call (1D case, n==vector length).

Regards, Igor

SergeyKostrov

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-10-2013
05:40 AM

96 Views

SergeyKostrov

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-10-2013
06:29 PM

96 Views

SergeyKostrov

Valued Contributor II

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

07-11-2013
06:21 AM

96 Views

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

For more complete information about compiler optimizations, see our Optimization Notice.