MGRAV

Novice

10-23-2017
10:12 AM

325 Views

Complex multiply–accumulate

Hi everyone,

I need to perform a multiply-multiply-accumulate operation over complex arrays.

More precisely, I need a <-- a + alpha * c * d, where a, c and d are complex values and alpha is real.
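For reference, here is a minimal scalar sketch of the operation described above. It assumes interleaved storage (re, im, re, im, ...) in single precision, which is what the intrinsic code in this post operates on. The function name `cmla_ref` and its parameter list are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference for a[k] <- a[k] + alpha * c[k] * d[k].
 * Assumption: complex data is interleaved (re, im, re, im, ...),
 * so each array holds 2*n floats for n complex elements.
 * cmla_ref is a hypothetical name, not an IPP or MKL function. */
void cmla_ref(float *a, const float *c, const float *d,
              float alpha, size_t n)
{
    assert(n == 0 || (a && c && d));
    for (size_t k = 0; k < n; ++k) {
        float cr = c[2 * k], ci = c[2 * k + 1];
        float dr = d[2 * k], di = d[2 * k + 1];
        /* (cr + i*ci) * (dr + i*di) = (cr*dr - ci*di) + i*(cr*di + ci*dr) */
        a[2 * k]     += alpha * (cr * dr - ci * di);
        a[2 * k + 1] += alpha * (cr * di + ci * dr);
    }
}
```

A loop like this can serve as a correctness baseline when checking a vectorized version, and as a timing baseline for whatever the compiler auto-vectorizes on its own.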

It's a key part of my algorithm, and I have run many tests trying to find the best solution. However, it still seems very slow to me (computing the FFT is faster!).

Currently, I use code based on intrinsics. Here is the AVX version. I suppose any solution can easily be extended to SSE and/or AVX-512.

    __m256 alphaVect = _mm256_set1_ps(Alpha);
    for (int i = 0; i < 2*size/8; i++) {
        __m256 C_vec = _mm256_loadu_ps(C + i*8);
        __m256 D_vec = _mm256_loadu_ps(D + i*8);
        __m256 CD  = _mm256_mul_ps(_mm256_moveldup_ps(D_vec), C_vec);
        __m256 CCD = _mm256_mul_ps(_mm256_movehdup_ps(D_vec), C_vec);
        CCD = _mm256_shuffle_ps(CCD, CCD, _MM_SHUFFLE(2,3,0,1));
        __m256 valCD = _mm256_addsub_ps(CD, CCD);
    #if __FMA__
        __m256 val = _mm256_fmadd_ps(valCD, alphaVect, _mm256_loadu_ps(dst + i*8)); // a*b+c
    #else
        __m256 val = _mm256_add_ps(_mm256_mul_ps(valCD, alphaVect), _mm256_loadu_ps(dst + i*8));
    #endif
        _mm256_storeu_ps(dst + i*8, val);
    }

Does anyone have a better idea or solution?

If this operation can be done with IPP (like the multiplication ippsMulPack_32f, ...), MKL, etc., I have not found how.

4 Replies

McCalpinJohn

Black Belt

10-30-2017
05:42 AM

TimP

Black Belt

10-30-2017
06:47 AM

TimP

Black Belt

10-30-2017
06:48 AM

MGRAV

Novice

10-30-2017
08:17 AM

John, I can certainly ask the MKL FFT to split its output into real and imaginary parts. However, it would seem logical to me that the FFT would be slower with this approach. I will run more tests and switch to this solution if it is faster.
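To make the split-format idea concrete, here is a rough sketch of the same a <-- a + alpha * c * d kernel when real and imaginary parts live in separate arrays. The function name `cmla_split` and the six-array signature are hypothetical. The sketch only shows that no shuffle or add/sub step is needed in this layout; whether it is faster overall, including the FFT, still has to be measured.

```c
#include <assert.h>
#include <stddef.h>

/* a <- a + alpha * c * d with split (planar) complex storage:
 * real parts in *_re, imaginary parts in *_im, n elements each.
 * Assumption: the arrays do not overlap. cmla_split is a
 * hypothetical name used only for this illustration. */
void cmla_split(float *a_re, float *a_im,
                const float *c_re, const float *c_im,
                const float *d_re, const float *d_im,
                float alpha, size_t n)
{
    assert(n == 0 || (a_re && a_im && c_re && c_im && d_re && d_im));
    for (size_t k = 0; k < n; ++k) {
        /* Every operation is a plain element-wise multiply or add,
         * so consecutive k map directly onto SIMD lanes. */
        float re = c_re[k] * d_re[k] - c_im[k] * d_im[k];
        float im = c_re[k] * d_im[k] + c_im[k] * d_re[k];
        a_re[k] += alpha * re;
        a_im[k] += alpha * im;
    }
}
```

In C99, marking the pointers `restrict` may help the compiler vectorize this loop without runtime overlap checks.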

Tim, are you saying that there are special SSE* instructions that are not available through intrinsics?

I did some more research and found that the "Intel® 64 and IA-32 Architectures Optimization Reference Manual", July 2017 (order number 248966-037), section 12.11.3, uses the same approach. So I assume it is the best, or one of the best, approaches.