I am working on a color space converter application. I have tried AVX-512 and AVX2 and got no significant performance improvement over SSE2.
I have uploaded my test tool, CPU information, compiler settings, and test results here.
Could you please share any comments?
Thanks
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Hi Yale,
I'd like to update the status of this issue, as we discussed by email. The lack of speedup was caused by the memory bandwidth limit and latency for AVX-512 when running in a single thread.
If you optimize the example with OpenMP and run it in parallel, it will make use of the large L2 and L3 caches on SKL, and you will see a performance improvement from AVX-512 over AVX2 and SSE.
$ ./VPTestBench
pbSrcY 0x7fd6cd06f040, pbOut1 0x7fd6c7182080 pbOut2 0x7fd6c9126100 pbOut3 0x7fd6cb0ca180
time cpu 965 sse2 95 avx2 89 avx512 82
Thanks
Yolanda
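
For readers without access to the attached test tool, here is a minimal sketch of the kind of OpenMP change described above. It is not the code from VPTestBench: the function names `convert_row` and `convert_image`, the planar layout, and the full-range BT.601 YCbCr-to-RGB coefficients are all assumptions made for illustration. The idea is only that rows are split across threads, so each thread works on a smaller slice of the image; the scalar inner loop stands in for whichever SSE2, AVX2, or AVX-512 kernel is being measured.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Clamp an intermediate result to the 0..255 range of an 8-bit channel.
static inline std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Hypothetical per-row kernel: full-range BT.601 YCbCr -> RGB in 8.8 fixed point.
// In a real benchmark this loop would be the SIMD kernel under test.
void convert_row(const std::uint8_t* y, const std::uint8_t* cb, const std::uint8_t* cr,
                 std::uint8_t* r, std::uint8_t* g, std::uint8_t* b, std::size_t width) {
    for (std::size_t x = 0; x < width; ++x) {
        const int c = 256 * y[x] + 128;   // luma scaled by 256, plus rounding term
        const int d = cb[x] - 128;        // centered blue-difference chroma
        const int e = cr[x] - 128;        // centered red-difference chroma
        // Negative sums truncate toward zero and are then clamped to 0.
        r[x] = clamp_u8((c + 359 * e) / 256);
        g[x] = clamp_u8((c - 88 * d - 183 * e) / 256);
        b[x] = clamp_u8((c + 454 * d) / 256);
    }
}

// Rows are independent, so they can be distributed across threads.
// Without OpenMP enabled, the pragma is ignored and the loop runs serially.
void convert_image(const std::uint8_t* y, const std::uint8_t* cb, const std::uint8_t* cr,
                   std::uint8_t* r, std::uint8_t* g, std::uint8_t* b,
                   std::size_t width, std::size_t height) {
    const std::ptrdiff_t rows = static_cast<std::ptrdiff_t>(height);
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t row = 0; row < rows; ++row) {
        const std::size_t off = static_cast<std::size_t>(row) * width;
        convert_row(y + off, cb + off, cr + off, r + off, g + off, b + off, width);
    }
}
```

To enable the pragma, build with `-qopenmp` on the Intel compiler or `-fopenmp` on GCC and Clang. The static schedule gives each thread a contiguous block of rows, which keeps each thread's memory accesses sequential.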
