Hello,
I have some questions on cblas_gemm_s8u8s32.
1. What is the reasoning behind requiring one side to be signed and the other unsigned?
2. When I do matrix multiplication with cblas_gemm_s8u8s32, I find that with column-major layout, the result is wrong whenever values in the second operand (the unsigned int8 matrix) exceed 128. What is the reason? And how can I compute the product of two signed int8 matrices?
3. I tried to use DNNL's (formerly MKL-DNN) dnnl_gemm_s8s8s32, but unfortunately it was much slower than MKL's cblas_sgemm for some problem sizes.
4. I tested the efficiency of int8 GEMM (using cblas_gemm_s8u8s32) against float GEMM on my machine and found that int8 GEMM is only about as fast as float GEMM. Why? Do you have efficiency test results for the two interfaces?
Thanks,
Jingjing Wang
Hello Jingjing,
The signed/unsigned requirement comes from the AVX-512 VNNI hardware instruction set underneath the software interface: for example, a single vpdpbusd [1] instruction, which multiplies unsigned bytes by signed bytes, replaces the vpmaddubsw, vpmaddwd, and vpaddd sequence.
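To make the hardware constraint concrete, here is a plain-C sketch (a scalar model of the arithmetic, not the actual intrinsic) of what one vpdpbusd lane computes: a four-element dot product of unsigned bytes against signed bytes, accumulated into a signed 32-bit lane. The u8 x s8 pairing is baked into the instruction, which is why the CBLAS interface mirrors it.

```c
#include <stdint.h>

/* Scalar model of one vpdpbusd lane: multiply four unsigned 8-bit
 * values by four signed 8-bit values, sum the products, and add the
 * sum into a signed 32-bit accumulator. */
static int32_t dpbusd_lane(const uint8_t a[4], const int8_t b[4], int32_t acc)
{
    for (int i = 0; i < 4; ++i)
        acc += (int32_t)a[i] * (int32_t)b[i];
    return acc;
}
```

Note that the worst-case partial sum, 4 * 255 * (-128), fits comfortably in 32 bits, so this non-saturating form needs no widening beyond int32.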
Could you provide more information about particular matrix sizes you are interested in testing?
Even better, it would help expedite things if you could provide a concise reproducer (application source code with minimal dependencies) for each of issues 2, 3, and 4.
Thank you for your good questions about cblas_gemm_s8u8s32!
Aaron
[1] https://www.intel.ai/vnni-enables-inference/
Here are two discussions that may shed light on your questions.
Incorrect result of s8s8s32 gemm? https://github.com/intel/mkl-dnn/issues/476
Best instruction set for s8s8s32 gemm? https://github.com/intel/mkl-dnn/issues/532
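For question 2's "two signed matrices" case, a common workaround is to shift the signed B operand into unsigned range by adding 128 and then subtract the resulting bias (128 times the row sums of A) from C. The sketch below shows the arithmetic in plain C with small fixed sizes; the function names are illustrative, not MKL API. If my reading of the interface is right, the bo and oc/offsetc parameters of cblas_gemm_s8u8s32 let the library apply the same compensation for you.

```c
#include <stdint.h>

enum { M = 2, N = 2, K = 3 };

/* Reference s8 x s8 GEMM, row-major, C = A*B. */
static void gemm_s8s8_ref(const int8_t *A, const int8_t *B, int32_t *C)
{
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < K; ++k)
                acc += (int32_t)A[i*K + k] * (int32_t)B[k*N + j];
            C[i*N + j] = acc;
        }
}

/* Same product computed through a u8 x s8 core plus compensation:
 * A * (B + 128) = A*B + 128 * rowsum(A), so subtract the bias. */
static void gemm_s8s8_via_u8(const int8_t *A, const int8_t *B, int32_t *C)
{
    uint8_t Bu[K*N];
    for (int i = 0; i < K*N; ++i)
        Bu[i] = (uint8_t)((int32_t)B[i] + 128);   /* shift s8 -> u8 */

    int32_t asum[M];                              /* row sums of A */
    for (int i = 0; i < M; ++i) {
        asum[i] = 0;
        for (int k = 0; k < K; ++k)
            asum[i] += A[i*K + k];
    }

    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            int32_t acc = 0;
            for (int k = 0; k < K; ++k)           /* u8 x s8 kernel */
                acc += (int32_t)A[i*K + k] * (int32_t)Bu[k*N + j];
            C[i*N + j] = acc - 128 * asum[i];     /* undo the shift */
        }
}
```

Both paths should agree exactly, since the shift and the compensation cancel in integer arithmetic.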
Let me know if you have further questions or a reproducer,
Aaron
Hi Jingjing,
For #3 and #4, can you also provide information on the CPU you used when checking performance? If you're running on an AVX2 machine, then the performance behavior you're seeing is expected.
Best,
Peter
Caday, Peter (Intel) wrote:
Hi Jingjing,
For #3 and #4, can you also provide information on the CPU you used when checking performance? If you're running on an AVX2 machine, then the performance behavior you're seeing is expected.
Best,
Peter

Thank you for your reply. I checked the performance on an Intel Xeon CPU E5-2667 v3 @ 3.2 GHz; it may only support AVX2. That is to say, DNNL int8 GEMM will only perform better when the CPU supports AVX-512 or higher instruction sets?
Hi Jingjing,
We recently added AVX2 support for int8 GEMM in DNNL (around the end of November; see commit 35b39a8d). In any case, int8 performance shouldn't be much better than single precision on an AVX2 platform.