I have a quick question on cblas_gemm_s8u8s32.
What is the reasoning behind requiring one side to be signed and the other unsigned?
The cuBLAS equivalent of this function, cublasGemmEx, expects both A and B to be signed, which seems simpler to work with in my opinion.
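For reference, here is a minimal sketch of how I am calling it today (based on the signature in mkl_cblas.h; the sizes and data are just placeholders):

#include <stdio.h>
#include <mkl.h>

int main(void) {
    /* Computes C := alpha*(op(A) + ao)*(op(B) + bo) + beta*C + C_offset,
       with A signed int8, B unsigned int8, C int32. */
    const MKL_INT m = 2, n = 2, k = 3;
    const MKL_INT8  a[] = { 1, -2, 3,  -4, 5, -6 };  /* m x k, row major */
    const MKL_UINT8 b[] = { 1, 2,  3, 4,  5, 6 };    /* k x n, row major */
    MKL_INT32 c[4] = { 0 };
    const MKL_INT32 co = 0;  /* single offset added to every element of C */

    cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                       CblasFixOffset, m, n, k,
                       1.0f, a, k, 0,   /* ao = 0 */
                       b, n, 0,         /* bo = 0 */
                       0.0f, c, n, &co);

    for (int i = 0; i < m * n; ++i)  /* expected: 10 12 / -19 -24 */
        printf("%d%c", c[i], (i % n == n - 1) ? '\n' : ' ');
    return 0;
}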
Thank you for the reply.
That's interesting. I'm working on a text application and all values are usually signed. Could we expect a fully signed interface in future releases?
We are already using gemm_s16s16s32 with success but are interested in going further in terms of model compression and speed (the application is neural machine translation, to be more precise).
If gemm_s8u8s32 is the only planned interface for 8-bit GEMM, that's acceptable; we will try to adapt and implement device-specific quantization schemes (see the sketch below).
(I also found out that google/gemmlowp requires both operands to be unsigned, so there does not seem to be a standard way to provide 8-bit quantization: that's 3 libraries mentioned in this thread and 3 different interfaces!)
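Concretely, the adaptation we have in mind: since the interface computes C := alpha*(op(A) + ao)*(op(B) + bo) + beta*C + C_offset, signed data for B can be rebiased into the unsigned operand by adding 128 to every element and passing bo = -128, which cancels the shift exactly. A sketch (the helper names are ours, not MKL's):

#include <stdint.h>
#include <stddef.h>
#include <mkl.h>

/* Rebias signed int8 data into the unsigned B operand expected by
   cblas_gemm_s8u8s32. Adding 128 maps [-128, 127] onto [0, 255];
   passing bo = -128 to the GEMM undoes the shift, so the product is
   computed on the original signed values. */
static void rebias_s8_to_u8(const int8_t *src, uint8_t *dst, size_t len) {
    for (size_t i = 0; i < len; ++i)
        dst[i] = (uint8_t)(src[i] + 128);
}

/* Hypothetical wrapper exposing a fully signed row-major GEMM. */
static void gemm_s8s8s32_via_s8u8s32(MKL_INT m, MKL_INT n, MKL_INT k,
                                     const int8_t *a, const int8_t *b_signed,
                                     uint8_t *b_scratch, /* k*n bytes */
                                     int32_t *c) {
    const MKL_INT32 co = 0;
    rebias_s8_to_u8(b_signed, b_scratch, (size_t)(k * n));
    cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                       CblasFixOffset, m, n, k,
                       1.0f, a, k, 0,          /* A is already signed   */
                       b_scratch, n, -128,     /* bo cancels the rebias */
                       0.0f, c, n, &co);
}

In our case B holds the weights, so the one-time +128 pass can be folded into weight preparation and costs nothing extra at inference time.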
For reference, a fully signed INT8 GEMM interface is available in MKL-DNN:
But it looks like it does the computation in... double precision?
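For what it's worth, here is a sketch of what the call would look like (this assumes the MKL-DNN 0.x C API, which as far as I can tell uses a Fortran-style, pass-by-pointer, column-major convention for its gemm entry points; please double-check the exact signature in mkldnn.h):

#include <stdint.h>
#include <mkldnn.h>

/* Fully signed int8 GEMM: C := alpha*(A + ao)*(B + bo) + beta*C + co. */
void gemm_example(int m, int n, int k,
                  const int8_t *a, const int8_t *b, int32_t *c) {
    const char transa = 'N', transb = 'N', offsetc = 'F';
    const float alpha = 1.0f, beta = 0.0f;
    const int8_t ao = 0, bo = 0;
    const int32_t co = 0;
    mkldnn_gemm_s8s8s32(&transa, &transb, &offsetc, &m, &n, &k,
                        &alpha, a, &m, &ao, b, &k, &bo,
                        &beta, c, &m, &co);
}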
When I use the QuantizedMatMulWithBias quantized matmul, the mkldnn_verbose output is:
mkldnn_verbose,exec,inner_product,igemm_s8u8s32:blas,forward_inference,fsrc:nc fwei:io fbia:x fdst:nc,,mb768ic1024oc512,1.146
but the MKL-DNN dump bin is:
Why is the dump bin named x8s8s32x and not s8u8s32? What is the difference between the two?
Hello, when I use cblas_gemm_s8u8s32, I found that the result is wrong when OP_B (column major, unsigned int8) contains values over 128. Also, I tested the efficiency of int8 GEMM (using cblas_gemm_s8u8s32) against float GEMM (using cblas_sgemm) on my machine and found that the speed of int8 GEMM is close to that of float GEMM. Why? Do you have efficiency test results for the two interfaces?
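Here is the kind of check I am running, comparing the library call against a plain int32 reference loop (a minimal sketch using row-major storage; sizes and values are placeholders):

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

int main(void) {
    const MKL_INT m = 4, n = 4, k = 64;
    MKL_INT8  *a = malloc((size_t)(m * k));
    MKL_UINT8 *b = malloc((size_t)(k * n));
    MKL_INT32 *c = calloc((size_t)(m * n), sizeof(MKL_INT32));
    const MKL_INT32 co = 0;

    srand(0);
    for (int i = 0; i < m * k; ++i) a[i] = (MKL_INT8)(rand() % 256 - 128);
    for (int i = 0; i < k * n; ++i) b[i] = (MKL_UINT8)(rand() % 256); /* includes values >= 128 */

    /* No offsets, so this should equal a plain signed-by-unsigned product. */
    cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                       CblasFixOffset, m, n, k, 1.0f,
                       a, k, 0, b, n, 0, 0.0f, c, n, &co);

    int mismatches = 0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            int32_t ref = 0;
            for (int p = 0; p < k; ++p)
                ref += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];
            if (ref != c[i * n + j]) ++mismatches;
        }
    printf("mismatches: %d\n", mismatches);
    free(a); free(b); free(c);
    return 0;
}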