Hi Intel compiler engineers
Is there any known risk in using an AVX2-based dynamic library together with IPP 7.0?
Here's my problem.
A dynamic library built with /QaxCORE-AVX2 and /O2 works fine with its unit test program. The DLL uses intrinsics such as _mm256_mul_ps, _mm256_add_ps, and _mm256_exp_ps. However, when I integrated this AVX2-based library into a project that already used some functions from IPP 7.0, the intrinsics started returning wrong results. I printed the results of _mm256_add_ps and found that the values in the upper 128 bits were all zero.
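For anyone trying to catch the same symptom automatically rather than by reading printed values, here is a minimal sketch in plain C. The helper name upper_lanes_all_zero is hypothetical (it does not come from the original project), and it assumes the __m256 result has first been stored into a float[8] buffer, for example with _mm256_storeu_ps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the original project): returns true when
   lanes 4..7, i.e. the upper 128 bits of a __m256 that was stored to memory,
   are all exactly zero. */
bool upper_lanes_all_zero(const float lanes[8])
{
    assert(lanes != NULL);
    for (size_t i = 4; i < 8; ++i) {
        if (lanes[i] != 0.0f) {
            return false;   /* at least one upper lane carries data */
        }
    }
    return true;
}

/* Intended use next to the intrinsics (requires AVX, not compiled here):
       float tmp[8];
       _mm256_storeu_ps(tmp, sum);
       if (upper_lanes_all_zero(tmp)) { ...log the call site... }
*/
```

A legitimately all-zero upper half would also trigger the check, so it is only useful on inputs whose upper lanes are known to be non-zero.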
Is this behavior expected? If so, how can I fix it?
Any suggestions would be appreciated; I am new to this.
Zhongqi Zhang
Hi Zhongqi,
Which IPP functions are you using? As I recall, IPP 7.0 mainly contains AVX code, with no AVX2 optimization. You may want to look at the IPP 7.0 bug list at https://software.intel.com/en-us/articles/intel-ipp-70-library-release-notes/
Regarding your question: first, if possible, please upgrade to IPP 2017. You can get the install package from https://software.intel.com/en-us/intel-ipp/ => Get This Library for Free
Second, in theory, IPP is a ready-made, self-contained binary. It should neither influence nor be influenced by the compiler options /QaxCORE-AVX2 and /O2, or by the results of any external intrinsics. So if there is an issue, it may come from some change in the input or output. Have you located what causes the problem?
If possible, please provide a test case so we can see what goes wrong.
Best Regards,
Ying
Hi Zhongqi.
I'm not sure, but you have probably found an unexpected side effect of zeroupper in the IPP AVX code. As mentioned above, though, we need a reproducer to confirm this issue.
Thanks for your feedback and for using IPP.
Hi Ying and Andrey,
Thank you for replying to my questions. Here is the updated information.
We have removed every dependency on IPP 7.0. However, the problem (the upper 128 bits of each __m256 value being all zero) still exists. So, for now, it seems the problem has nothing to do with IPP. Unfortunately, we cannot reproduce it in a single, simple test case, which is the hardest part. The only way to trigger it is to integrate the intrinsics into our main project, and company policy does not allow us to share that source code on the internet. The following code block illustrates what the failing source code looks like.
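#include <immintrin.h>  // AVX intrinsics; _mm256_exp_ps is provided by the Intel compiler (SVML)

void failureTest()
{
    // 80 floats (10 vectors) and 24 floats (3 vectors), 32-byte aligned
    float* pRaw = (float*)_mm_malloc(4 * 8 * 10, 32);
    float* pMat = (float*)_mm_malloc(4 * 8 * 3, 32);
    __m256 mmRaw;
    __m256 mmMat;
    float fNum = 1.0f;
    // Loop bounds are element counts, not byte counts
    for (int i = 0; i < 8 * 10; ++i) {
        pRaw[i] = fNum;
    }
    for (int i = 0; i < 8 * 3; ++i) {
        pMat[i] = fNum++;
    }
    for (int i = 0; i < 10; i++) {
        mmRaw = _mm256_load_ps(pRaw + i * 8);
        for (int j = 0; j < 10; ++j) {
            mmMat = _mm256_loadu_ps(pMat + j);  // unaligned load: pMat + j is not 32-byte aligned
            __m256 fv1 = _mm256_add_ps(mmMat, mmRaw);
            __m256 fv2 = _mm256_mul_ps(fv1, fv1);
            mmMat = _mm256_add_ps(mmRaw, fv2);
            mmMat = _mm256_exp_ps(mmMat);
        }
        _mm256_store_ps(pRaw + i * 8, mmMat);
    }
    _mm_free(pRaw);
    _mm_free(pMat);
}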
We can work around this problem in two ways.
First, declare all the __m256 variables 'volatile'. With this method, however, the performance is not acceptable.
Second, replace _mm256_exp_ps with a substitute found at http://software-lisc.fbk.eu/avx_mathfun/avx_mathfun.h. That header provides a function called 'exp256_ps'. This method gives convincing performance and accuracy. But we still have no clue what went wrong in the original code.
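To compare the two exponential implementations lane by lane inside the main project, a small checker along the following lines might help. It is only a sketch: count_lane_mismatches is a hypothetical name, the reference values are assumed to come from scalar expf() applied to the same eight inputs, and the vector result is assumed to have been stored with _mm256_storeu_ps. It also assumes all reference values are finite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the lanes in which out[i] differs from ref[i]
   by more than rel_tol relative to |ref[i]|. A NaN in out[i] counts as a
   mismatch because the comparison below is then false. */
int count_lane_mismatches(const float ref[8], const float out[8], float rel_tol)
{
    assert(ref != NULL && out != NULL);
    int mismatches = 0;
    for (size_t i = 0; i < 8; ++i) {
        float diff = out[i] - ref[i];
        float err = diff < 0.0f ? -diff : diff;        /* |out - ref| */
        float mag = ref[i] < 0.0f ? -ref[i] : ref[i];  /* |ref| */
        if (!(err <= rel_tol * mag)) {
            ++mismatches;
        }
    }
    return mismatches;
}

/* Intended use next to the intrinsics (requires AVX, not compiled here):
       float in[8], ref[8], out[8];
       _mm256_storeu_ps(in, x);
       for (int i = 0; i < 8; ++i) ref[i] = expf(in[i]);
       _mm256_storeu_ps(out, _mm256_exp_ps(x));
       int bad = count_lane_mismatches(ref, out, 1e-5f);
*/
```

If the upper four lanes were zeroed, exactly four mismatches would be reported for non-trivial inputs, which separates this failure from an ordinary accuracy difference between the two exp implementations. The tolerance 1e-5f is an arbitrary illustration, not a documented accuracy bound for either function.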
Best Regards
Zhongqi Zhang
