Intel® oneAPI Math Kernel Library

Clarification on signed/unsigned input for cblas_gemm_s8u8s32

guillaumekln
Beginner

Hello,

I have a quick question on cblas_gemm_s8u8s32.

What is the reasoning behind requiring one side to be signed and the other unsigned?

The cuBLAS equivalent of this function, cublasGemmEx, expects both A and B to be signed, which seems simpler to work with in my opinion.
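For context, here is a rough, illustrative sketch of the call as I understand it from the documentation, with signed A and unsigned B (the wrapper is my own, and the parameter order follows my reading of the MKL reference, so the exact prototype may differ slightly between versions):

#include <stdint.h>
#include <mkl.h>

/* Illustrative sketch: C (MxN, int32) = A (MxK, signed int8) x B (KxN, unsigned int8),
   row-major, with all zero-point offsets set to 0. */
void int8_gemm_sketch(const int8_t *A, const uint8_t *B, MKL_INT32 *C,
                      MKL_INT M, MKL_INT N, MKL_INT K)
{
    const MKL_INT32 oc = 0;  /* single offset added to C (CblasFixOffset mode) */

    cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                       CblasFixOffset, M, N, K,
                       1.0f,         /* alpha */
                       A, K, 0,      /* A, lda, ao -- A must be signed int8   */
                       B, N, 0,      /* B, ldb, bo -- B must be unsigned int8 */
                       0.0f,         /* beta */
                       C, N, &oc);
}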

Thanks,

Guillaume

Jing_Xu
Employee

It is because, in most image-processing use cases, the weights are usually signed values while the image elements are usually unsigned values.

guillaumekln
Beginner

Thank you for the reply.

That's interesting. I'm working on a text application where all values are usually signed. Could we expect a fully signed interface in a future release?

Jing_Xu
Employee

I'll escalate this request to the engineering team. They will make the decision.

Jing_Xu
Employee

Hi,

Could you try to use gemm_s16s16s32?

guillaumekln
Beginner

We are already using gemm_s16s16s32 with success, but we are interested in going further in terms of model compression and speed (the application is neural machine translation, to be more precise).

If gemm_s8u8s32 is the only planned interface for 8-bit GEMM, that's acceptable; we will try to adapt and implement device-specific quantization schemes.
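For example, one adaptation we may try is to re-bias our signed B values into the unsigned range and cancel the shift through the bo offset, relying on the documented semantics C := alpha*(op(A) + ao)*(op(B) + bo) + beta*C + oc. A rough, illustrative sketch (the wrapper is my own and not validated across MKL versions):

#include <stdint.h>
#include <stdlib.h>
#include <mkl.h>

/* Illustrative sketch: emulate signed x signed int8 GEMM with cblas_gemm_s8u8s32
   by shifting B by +128 into the unsigned operand and passing bo = -128 to undo
   the shift inside the (op(B) + bo) term. Row-major, no transposition. */
void s8s8_via_s8u8(const int8_t *A, const int8_t *B_s8, MKL_INT32 *C,
                   MKL_INT M, MKL_INT N, MKL_INT K)
{
    uint8_t *B_u8 = malloc((size_t)K * (size_t)N);
    for (size_t i = 0; i < (size_t)K * (size_t)N; ++i)
        B_u8[i] = (uint8_t)(B_s8[i] + 128);    /* re-bias to [0, 255] */

    const MKL_INT32 oc = 0;
    cblas_gemm_s8u8s32(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                       CblasFixOffset, M, N, K,
                       1.0f,
                       A, K, 0,        /* A stays signed, ao = 0        */
                       B_u8, N, -128,  /* bo = -128 undoes the re-bias  */
                       0.0f, C, N, &oc);
    free(B_u8);
}

The extra pass over B has some cost, but it would let us keep a single signed quantization scheme on our side.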

(I also found out that google/gemmlowp requires both operands to be unsigned, so there does not seem to be a standard way to expose 8-bit quantization: that's three libraries mentioned in this thread and three different interfaces!)

Jing_Xu
Employee

Hi,

For technical reasons, we only have s8u8s32 and s16s16s32 for integer GEMM at the moment.

guillaumekln
Beginner

For reference, a fully signed INT8 GEMM interface is available in MKL-DNN:

https://intel.github.io/mkl-dnn/group__c__api__blas.html#gac1869eab851b572350fb450c50c61626

But it looks like it does the computation in... double precision?

jianqian__zhou
Beginner

When I use the QuantizedMatMulWithBias quantized matmul, the mkldnn_verbose output is:

mkldnn_verbose,exec,inner_product,igemm_s8u8s32:blas,forward_inference,fsrc:nc fwei:io fbia:x fdst:nc,,mb768ic1024oc512,1.146

but the MKL-DNN dump binary is:

mkldnn_dump_gemm_x8s8s32x_inner_product_fwd_t::pp_kernel.0.bin

Why is the dump binary named x8s8s32x rather than s8u8s32? What is the difference between the two?

jingjing__wang
Beginner

Hello, when I use cblas_gemm_s8u8s32, I found that the result is wrong when a value of OP_B (column-major, unsigned int8) exceeds 128. Also, I compared the performance of int8 GEMM (cblas_gemm_s8u8s32) against float GEMM (cblas_sgemm) on my machine and found that the int8 GEMM speed is close to the float one. Why is that? Do you have performance results for these two interfaces?

qiang__zhang
Beginner

Dear sir,

Could you tell me why cblas_gemm_s8s8s32 is not supported? Is it because AVX2 does not support multiplying and adding vectors of the same type (either s8/s8 or u8/u8)?
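For instance, the AVX2 8-bit multiply-add I have in mind is vpmaddubsw (the _mm256_maddubs_epi16 intrinsic), which, as I read the intrinsics guide, only exists in a mixed unsigned x signed form. A small illustrative wrapper:

#include <immintrin.h>

/* vpmaddubsw multiplies unsigned 8-bit values from a_u8 with signed 8-bit
   values from b_s8 and adds adjacent 16-bit products with saturation;
   AVX2 offers no s8 x s8 or u8 x u8 counterpart of this instruction. */
static inline __m256i dot2_u8s8(__m256i a_u8, __m256i b_s8)
{
    return _mm256_maddubs_epi16(a_u8, b_s8);
}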
