MKL Itanium 2 Performance

tcrony70 · 04-27-2006

I have been looking through various sources, trying to determine the theoretical peak performance of an Itanium 2 (1.5 GHz).

From information I found on various websites, it appears that the peak would be 6 GFLOPS for double-precision floating-point data and 12 GFLOPS for single-precision data (based on the 4 floating-point units available for single precision). Is this correct?

However, a DGEMM performance graph on one of the MKL pages shows more than 6 GFLOPS on a single CPU.
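For context, DGEMM rates like the one in that graph are conventionally computed by counting 2·m·n·k floating-point operations (one multiply and one add per inner-product term) and dividing by the elapsed time. The sketch below assumes that convention; the function name `dgemm_gflops` and the timing value are hypothetical, for illustration only, and not taken from the MKL page.

```python
def dgemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """GFLOPS for C = alpha*A*B + beta*C, counting 2*m*n*k flops.

    Assumes the usual convention of one multiply and one add per
    inner-product term; the alpha/beta scaling work is ignored.
    """
    return 2.0 * m * n * k / seconds / 1e9


# Hypothetical timing, for illustration only:
# a 1000 x 1000 x 1000 product that takes 0.5 s.
print(dgemm_gflops(1000, 1000, 1000, 0.5))  # 4.0
```

Comparing a number computed this way with the processor's theoretical peak is how a chart can be checked for plausibility: a result above peak points to a mislabeled clock speed or a timing error.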

Can someone help clarify this?

Thanks,

Tim

Todd_R_Intel · 04-27-2006

I'll investigate the performance figure on our website. It may be an error (perhaps in the reported clock speed).

For both single and double precision, the theoretical peak of a 1.5 GHz Itanium 2 processor is 6 GFLOPS.

Todd

tcrony70 · 04-28-2006

Thanks for the response.

Does this mean that there are not 4 floating-point units capable of single-precision arithmetic, as I saw in several places on the web, or that only 2 of them can act in any given cycle? (I assume each one can do a fused multiply-add.)

TimP · 04-28-2006

Yes, that's based on retiring 2 fused multiply-add instructions per clock cycle.
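The peak figure quoted in this thread follows from simple arithmetic: clock rate × FMA instructions retired per cycle × 2 flops per FMA (one multiply plus one add). A minimal sketch, assuming the 2 FMAs per clock stated above; the function name `peak_gflops` is hypothetical:

```python
def peak_gflops(clock_ghz: float, fma_per_cycle: int = 2, flops_per_fma: int = 2) -> float:
    """Theoretical peak in GFLOPS.

    clock_ghz      -- core clock in GHz
    fma_per_cycle  -- fused multiply-add instructions retired per cycle
                      (2 on Itanium 2, per this thread)
    flops_per_fma  -- a multiply and an add count as 2 flops
    """
    return clock_ghz * fma_per_cycle * flops_per_fma


print(peak_gflops(1.5))  # 6.0
print(peak_gflops(1.6))  # 6.4
```

The second call shows why a DGEMM result slightly above 6 GFLOPS is impossible at 1.5 GHz but feasible at 1.6 GHz.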

tcrony70 · 04-28-2006

So what would be the point of having 4 floating-point units that can handle single precision if only 2 floating-point instructions can be retired per clock cycle?

TimP · 04-28-2006

You put Itanium 2 in the title. With Itanium 1, in principle, it was possible to execute SSE instructions. If your goal is to execute SSE code efficiently, surely Itanium 2 is not your vehicle.

Intel_C_Intel · 05-01-2006

The data you saw was erroneously labeled as 1.5 GHz but was in fact from a 1.6 GHz system, which explains why DGEMM performance exceeded 6 GFLOPS (1.6 GHz × 4 flops per clock = 6.4 GFLOPS peak). As mentioned elsewhere, the maximum performance in either single or double precision is 4 floating-point operations per clock, or 2 FMAs per clock.