guillaumekln
Beginner
257 Views

Clarification on signed/unsigned input for cblas_gemm_s8u8s32

Hello,

I have a quick question on cblas_gemm_s8u8s32.

What is the reasoning behind requiring one side to be signed and the other unsigned?

The cuBLAS equivalent of this function, cublasGemmEx, expects both A and B to be signed, which seems simpler to work with, in my opinion.

Thanks,

Guillaume

10 Replies
Jing_X_Intel
Employee

It is because, in most image-processing use cases, the weights are usually signed values and the image elements are usually unsigned values.

guillaumekln
Beginner

Thank you for the reply.

That's interesting. I'm working on a text application where all values are usually signed. Could we expect a fully signed interface in future releases?

Jing_X_Intel
Employee

I'll escalate this request to the engineering team; they will make the decision.

Jing_X_Intel
Employee

Hi,

Could you try to use gemm_s16s16s32?

guillaumekln
Beginner

We are already using gemm_s16s16s32 with success but are interested in going further in terms of model compression and speed (the application is neural machine translation, to be more precise).

If gemm_s8u8s32 is the only planned interface for 8-bit GEMM, that's acceptable; we will try to adapt and implement device-specific quantization schemes.

(I also found out that google/gemmlowp requires both operands to be unsigned, so there does not seem to be a standard way to provide 8-bit quantization: that's three libraries mentioned in this thread and three different interfaces!)

Jing_X_Intel
Employee

Hi,

For technical reasons, we only have s8u8s32 and s16s16s32 for integer GEMM at the moment.

guillaumekln
Beginner

For reference, a fully signed INT8 GEMM interface is available in MKL-DNN:

https://intel.github.io/mkl-dnn/group__c__api__blas.html#gac1869eab851b572350fb450c50c61626

But it looks like it does the computation in... double precision?

jianqian__zhou
Beginner

When I use the QuantizedMatMulWithBias quantized matmul, the mkldnn_verbose output is:

mkldnn_verbose,exec,inner_product,igemm_s8u8s32:blas,forward_inference,fsrc:nc fwei:io fbia:x fdst:nc,,mb768ic1024oc512,1.146

but the mkldnn dump bin is:

mkldnn_dump_gemm_x8s8s32x_inner_product_fwd_t::pp_kernel.0.bin

Why is the dump bin x8s8s32x and not s8u8s32? What is the difference between the two?

jingjing__wang
Beginner

Hello, when I use cblas_gemm_s8u8s32, I found that the result is wrong when the values of OP_B (column-major, unsigned int8) exceed 128. Also, I tested the efficiency of int8 GEMM (using cblas_gemm_s8u8s32) and float GEMM (using cblas_sgemm) on my machine and found that the int8 GEMM speed is close to float. Why? Do you have efficiency test results for the two interfaces?

qiang__zhang
Beginner

Dear sir,

Could you tell me why cblas_gemm_s8s8s32 is not supported? Is it because AVX2 does not support multiplying and adding vectors of the same type (either s8/s8 or u8/u8)?