brandon

Beginner


06-29-2020 12:33 PM


Intel MKL blas_cgemv implementation?

Hi all, I'm new to the community and have searched around, but I'm still hoping someone has some insight into MKL's single-precision complex matrix-vector multiply (cgemv) implementation. Essentially, I want to replicate its algorithm using the int16_t data type instead of float.
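For reference, the operation any replacement has to reproduce is the standard BLAS cgemv contract, y := alpha*A*x + beta*y on single-precision complex data. Below is a minimal naive sketch of the non-transposed, column-major, unit-stride case in C99. `ref_cgemv_n` is a hypothetical name, and this only shows what the routine computes, not how MKL's optimized kernel is structured. A scalar reference like this is mainly useful for checking an int16 port against float results.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical reference for the BLAS cgemv contract, no-transpose case:
 *   y := alpha * A * x + beta * y
 * A is m x n, column-major, leading dimension lda >= m; x and y have unit
 * stride. Shows WHAT cgemv computes, not how MKL's kernel does it. */
static void ref_cgemv_n(size_t m, size_t n, float complex alpha,
                        const float complex *A, size_t lda,
                        const float complex *x, float complex beta,
                        float complex *y)
{
    /* Scale y by beta first; per BLAS convention, beta == 0 means y is
     * overwritten rather than multiplied (so NaNs in y do not propagate). */
    for (size_t i = 0; i < m; ++i)
        y[i] = (beta == 0.0f) ? 0.0f : beta * y[i];

    /* Column-oriented ("axpy") loop: y += (alpha * x[j]) * A[:, j].
     * Walking down contiguous columns is the natural order for
     * column-major storage. */
    for (size_t j = 0; j < n; ++j) {
        const float complex t = alpha * x[j];
        const float complex *col = A + j * lda;
        for (size_t i = 0; i < m; ++i)
            y[i] += col[i] * t;
    }
}
```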

I'm hoping to speed up cgemv by implementing my own version that uses a fixed-point int16 data type with AVX-512 SIMD intrinsics. The idea is that a 512-bit register holds 32 16-bit values (int16_t) versus 16 32-bit values (float), so I'd get 2x the data-level parallelism and it should execute faster, while still keeping enough precision for my use case (signal processing for 5G MU-MIMO).
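To make the fixed-point idea concrete, here is a minimal scalar sketch that assumes a Q15 format (each int16_t component represents value * 32768), interleaved re/im storage, and a row-major matrix; `cq15`, `sat16`, and `q15_cmatvec` are hypothetical names. One thing it makes visible: a Q15 * Q15 product is Q30 and needs a wider accumulator, which in AVX-512BW code would typically come from `_mm512_madd_epi16` (vpmaddwd, int16 pairs multiplied and summed into 32-bit lanes). The extra widening, rounding shift, and saturation steps may mean the 2x lane count doesn't translate one-to-one into 2x throughput. For simplicity this sketch accumulates in int64 instead of emulating 32-bit lanes.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical Q15 complex type: each component is value * 32768,
 * stored interleaved (re, im) like MKL's complex float. */
typedef struct { int16_t re, im; } cq15;

/* Clamp a wide intermediate to the int16_t range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* y[i] = sum_j A[i][j] * x[j] with A row-major (m x n), all in Q15.
 * Q15 * Q15 products are Q30, so they are accumulated in a wide type and
 * rounded back to Q15 once per output element. SIMD code would typically
 * form 32-bit pair sums with _mm512_madd_epi16 (vpmaddwd); int64 is used
 * here only to keep the scalar sketch overflow-free. */
static void q15_cmatvec(size_t m, size_t n, const cq15 *A,
                        const cq15 *x, cq15 *y)
{
    for (size_t i = 0; i < m; ++i) {
        int64_t acc_re = 0, acc_im = 0;
        for (size_t j = 0; j < n; ++j) {
            const cq15 a = A[i * n + j], b = x[j];
            /* (ar + i*ai)(br + i*bi) = (ar*br - ai*bi) + i(ar*bi + ai*br) */
            acc_re += (int64_t)a.re * b.re - (int64_t)a.im * b.im;
            acc_im += (int64_t)a.re * b.im + (int64_t)a.im * b.re;
        }
        /* Round to nearest (add half an LSB), then shift Q30 -> Q15.
         * Assumes arithmetic right shift of negative values, which is what
         * mainstream compilers do (implementation-defined in C). */
        y[i].re = sat16((acc_re + (1 << 14)) >> 15);
        y[i].im = sat16((acc_im + (1 << 14)) >> 15);
    }
}
```

Rounding once after the full dot product, rather than rounding each product back to Q15, keeps more precision at the cost of wider intermediates, which is the main design trade-off to weigh in the SIMD version.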

Currently, MKL's JIT-compiled cgemm is the fastest implementation I've benchmarked for matrix-vector multiplication. When I looked at the assembly of a call to the normal (non-JIT) cblas_cgemv, I found what looks like the AVX-512 implementation, <mkl_blas_avx512_xcgemv>, which is ~2919 lines long and full of vaddps, vmulps, vshufps, and vfmaddsub instructions. The last one, fused multiply with alternating add/subtract, seems to be useful for complex multiplication when the complex numbers are stored interleaved in memory, i.e. real, imag, real, imag... (http://cacs.usc.edu/education/cs653/Popovici-ComplexSIMD-HPEC17.pdf#page=2)
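The vfmaddsub trick for interleaved data can be sketched as follows. This is a portable C emulation of one 512-bit register (16 floats = 8 complex values) whose helpers mirror, lane for lane, the AVX-512 intrinsics named in the comments (`_mm512_moveldup_ps`, `_mm512_movehdup_ps`, `_mm512_permute_ps` with 0xB1, `_mm512_mul_ps`, `_mm512_fmaddsub_ps`). It is a common pattern for interleaved complex multiply, not necessarily the exact instruction sequence inside mkl_blas_avx512_xcgemv; the helper names are hypothetical.

```c
#include <stddef.h>

/* One 512-bit register holds 16 floats = 8 interleaved complex values. */
#define LANES 16

/* _mm512_moveldup_ps: copy each even (real) lane into the odd lane above it. */
static void moveldup(const float *a, float *dst)
{
    for (size_t i = 0; i < LANES; ++i) dst[i] = a[i & ~(size_t)1];
}

/* _mm512_movehdup_ps: copy each odd (imaginary) lane into the even lane below it. */
static void movehdup(const float *a, float *dst)
{
    for (size_t i = 0; i < LANES; ++i) dst[i] = a[i | 1];
}

/* _mm512_permute_ps(a, 0xB1), or _mm512_shuffle_ps(a, a, 0xB1) (vshufps):
 * swap the two floats inside every (re, im) pair. */
static void swap_pairs(const float *a, float *dst)
{
    for (size_t i = 0; i < LANES; ++i) dst[i] = a[i ^ 1];
}

/* _mm512_fmaddsub_ps(a, b, c): even lanes a*b - c, odd lanes a*b + c.
 * Written as a separate multiply and add, so unless the compiler contracts
 * it, it rounds twice where the hardware instruction rounds once. */
static void fmaddsub(const float *a, const float *b, const float *c, float *dst)
{
    for (size_t i = 0; i < LANES; ++i)
        dst[i] = (i & 1) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
}

/* Element-wise product of 8 interleaved complex pairs: out[k] = x[k] * y[k]. */
static void cmul8(const float *x, const float *y, float *out)
{
    float yre[LANES], yim[LANES], xsw[LANES], t[LANES];
    moveldup(y, yre);    /* [yr0, yr0, yr1, yr1, ...] */
    movehdup(y, yim);    /* [yi0, yi0, yi1, yi1, ...] */
    swap_pairs(x, xsw);  /* [xi0, xr0, xi1, xr1, ...] */
    for (size_t i = 0; i < LANES; ++i)
        t[i] = xsw[i] * yim[i];  /* _mm512_mul_ps: [xi*yi, xr*yi, ...] */
    /* even lanes: xr*yr - xi*yi (real); odd lanes: xi*yr + xr*yi (imag) */
    fmaddsub(x, yre, t, out);
}
```

In a column-major matvec, one plausible arrangement is to treat y as the broadcast x[j]: its real/imaginary duplicates are the same for a whole column, so they can be computed once per column, leaving only the swap, multiply, and fmaddsub in the inner loop.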

Is anyone familiar with MKL's cgemv implementation, and does this seem like a good idea? Thanks so much in advance!

Assembly instructions breakdown for <mkl_blas_avx512_xcgemv>: https://docs.google.com/spreadsheets/d/17VSrOo5CGGkcxz_wn_xkYJC43rYAaKuOvDdw0RzGFbA/edit?usp=sharing
