GMP Performance of multiplication vs squaring?

fritzfranz · 06-09-2010

I did some performance tests with multiplication.
There is no dedicated operation for squaring, and multiplication does not seem to detect the squaring case (identical operands).

A dedicated implementation of squaring should be expected to provide a significant performance benefit
(at least, this is my experience with my own implementations of schoolbook multiplication, Karatsuba, and FFT).
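
To illustrate where the benefit comes from in the schoolbook case, here is a minimal sketch in Python. It is not how MKL or GMP implement anything; the function names (`schoolbook_mul`, `schoolbook_sqr`, and the helpers) are made up for this example. The squaring routine uses the symmetry a[i]*a[j] == a[j]*a[i], so each off-diagonal digit product is computed once and doubled.

```python
def to_digits(n, base=10):
    """Split a non-negative integer into little-endian digits (illustrative helper)."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits or [0]


def from_digits(digits, base=10):
    """Rebuild an integer from little-endian digits (illustrative helper)."""
    return sum(d * base**i for i, d in enumerate(digits))


def propagate_carries(columns, base):
    """Reduce raw column sums to proper digits in the given base."""
    out = []
    carry = 0
    for c in columns:
        carry, r = divmod(c + carry, base)
        out.append(r)
    return out


def schoolbook_mul(a, b, base=10):
    """General schoolbook product: len(a) * len(b) digit products."""
    columns = [0] * (len(a) + len(b))
    count = 0
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            columns[i + j] += ai * bj
            count += 1
    return propagate_carries(columns, base), count


def schoolbook_sqr(a, base=10):
    """Schoolbook square: n * (n + 1) / 2 digit products instead of n * n."""
    n = len(a)
    columns = [0] * (2 * n)
    count = 0
    for i in range(n):
        # Diagonal term a[i]^2 appears exactly once.
        columns[2 * i] += a[i] * a[i]
        count += 1
        # Each off-diagonal term appears twice, so compute it once and double it.
        for j in range(i + 1, n):
            columns[i + j] += 2 * a[i] * a[j]
            count += 1
    return propagate_carries(columns, base), count
```

For an n-digit operand the general routine performs n*n digit products, while the squaring routine performs n*(n+1)/2, which approaches half the work as n grows. A real library would work on machine-word limbs rather than Python lists, so this only shows the operation count, not actual timings.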

Questions:
- Is there a hidden feature in MKL/GMP for optimized squaring?
- Are there plans for future versions of MKL to improve squaring?

mecej4 · 06-09-2010

Sorry, Vladimir is correct -- I misunderstood the question and, therefore, my posting should be removed.

I do not know how to erase a posting, so I edited the posting, removed the irrelevant content and substituted this retraction.

Vladimir_Petrov__Int · 06-09-2010

Franz,

Thank you for your interest in our library and particularly in its multi-precision functionality.

In one of our future versions there will be a significant speed-up of multiplication, including the squaring case.

We do not encourage using any undocumented "hidden features" of MKL. If you feel that some feature is missing, please do not hesitate to submit a feature request through premier.intel.com.

Best regards,
Vladimir

Vladimir_Petrov__Int · 06-09-2010

Quoting mecej4

Please give some examples. Here is a counterexample (with default precision -- did you mean to exclude this case?):

[fortran]subroutine sub(x,y)
real, intent(in) :: x
real, intent(out) :: y
y=x*x
return
end subroutine sub[/fortran]

compiles to

[bash]    mov       eax, DWORD PTR [4+esp]
    mov       edx, DWORD PTR [8+esp]
    movss     xmm0, DWORD PTR [eax]
    mulss     xmm0, xmm0
    movss     DWORD PTR [edx], xmm0
    ret[/bash]

and, to me, it seems clear that 'mulss xmm0,xmm0' captures the essence of 'square the operand'.

This "counterexample" is not quite relevant to the subject of this thread since the original question was about multi-precision integer arithmetic.

Best regards,

Vladimir