Clarification on signed/unsigned input for cblas_gemm_s8u8s32

guillaumekln · ‎07-18-2018

Hello,

I have a quick question on cblas_gemm_s8u8s32.

What is the reasoning behind requiring one side to be signed and the other unsigned?

The cuBLAS equivalent of this function, cublasGemmEx, expects both a and b to be signed which seems simpler to work with according to me.

Thanks,

Guillaume

Jing_Xu · ‎07-22-2018

It is because for most cases in image processing, weights are usually signed values, and elements of image are usually unsigned values.

guillaumekln · ‎07-23-2018

Thank you for the reply.

That's interesting. I'm working on text application and all values are usually signed. Could we expect a fully signed interface in future releases?

Jing_Xu · ‎07-26-2018

I'll escalate this request to engineer team. They will make the decision.

Jing_Xu · ‎07-29-2018

Hi,

Could you try to use gemm_s16s16s32?

guillaumekln · ‎07-29-2018

We are already using gemm_s16s16s32 with success but are interested in going further in terms of model compression and speed (the application is neural machine translation to be more precise).

If gemm_s8u8s32 is the only planned interface for 8 bits GEMM that's acceptable, we will try to adapt and implement device-specific quantization schemes.

(I also found out that google/gemmlowp requires both operands to be unsigned so there does not seem to be a standard way to provide 8 bits quantization: that's 3 libraries mentionned in this thread and 3 different interfaces!)

Jing_Xu · ‎07-31-2018

Hi,

For technical reasons, we only have s8u8s32 and s16s16s32 for integer gemm now.

guillaumekln · ‎12-06-2018

For reference, a fully signed INT8 GEMM interface is available in MKL-DNN:

https://intel.github.io/mkl-dnn/group__c__api__blas.html#gac1869eab851b572350fb450c50c61626

But it looks like it does the computation in... double precision?

jianqian__zhou · ‎08-05-2019

when i use QuantizedMatMulWithBias quantized matmul，mkldnn_verbose output is：

mkldnn_verbose,exec,inner_product,igemm_s8u8s32:blas,forward_inference,fsrc:nc fwei:io fbia:x fdst:nc,,mb768ic1024oc512,1.146

but mkldnn dump bin is：

mkldnn_dump_gemm_x8s8s32x_inner_product_fwd_t::pp_kernel.0.bin

why dump bin is x8s8s32x not s8u8s32？what different in the two method？

jingjing__wang · ‎12-04-2019

hello , when I use cblas_gemm_s8u8s32 , I found the result is error when OP_B(Col Major, Unsigned int8)'s value is over 128。And ， I tested the efficiency of int8 GEMM（use cblas_gemm_s8u8s32） and float GEMM (use cblas_sgemm) on my machine and found that the speed of int8 GEMM is close to float. Why? Do you have the efficiency test results of two interfaces?

qiang__zhang · ‎04-16-2020

Dear sir，

Could tell me why cblas_gemm_s8s8s32 is not support？ because AVX2 not support multiplying and adding vectors of the same type (either s8/s8 or u8/u8)？