Intel® oneAPI Math Kernel Library

MKL Itanium 2 Performance

tcrony70
Beginner
I was looking through a variety of sources, trying to determine the theoretical peak performance of an Itanium 2 (1.5 GHz).
From information I found on various websites, it appears that the peak would be 6 GFLOPS for double-precision floating-point data and 12 GFLOPS for single-precision data (based on the 4 floating-point units available for single precision). Is this correct?
However, a DGEMM performance graph on one of the MKL pages shows more than 6 GFLOPS on a single CPU.
Can someone help clarify this?
Thanks,
Tim
Todd_R_Intel
Employee
I'll investigate the performance data on our website. It may be an error (perhaps in the reported clock speed).
For both single and double precision, the theoretical peak of an Itanium 2 processor is 6 GFLOPS.
Todd
tcrony70
Beginner

Thanks for the response.

Does this mean that there are not 4 floating-point units capable of single-precision arithmetic, as I saw in several places on the web, or that only 2 of them can act in any given cycle? (I would assume each of them can do a fused multiply-add.)

TimP
Honored Contributor III
Yes, that's based on retiring 2 fused multiply-add instructions per clock cycle.
tcrony70
Beginner
So what would be the point of having 4 floating-point units that can handle single precision if the processor can still retire only 2 floating-point instructions per clock cycle?
TimP
Honored Contributor III
You put Itanium 2 in the title. With Itanium 1, it was in principle possible to execute SSE instructions. If your goal is to execute SSE code efficiently, surely Itanium 2 is not your vehicle.
Intel_C_Intel
Employee

The data you saw was mislabeled as 1.5 GHz but was in fact measured at 1.6 GHz, which explains why DGEMM performance exceeded 6 GFLOPS (the theoretical peak at 1.6 GHz is 6.4 GFLOPS). As mentioned above, the maximum performance in either single or double precision is 4 floating-point operations per clock, i.e. 2 FMAs per clock.
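For anyone checking the numbers, the peak-rate arithmetic in this thread can be sketched as below. It assumes peak = clock rate × FMAs retired per cycle × 2 flops per FMA, as stated in the replies; the function name `peak_gflops` is hypothetical. The last line reproduces the 12 GFLOPS figure that results from assuming 4 FMAs per cycle instead of 2.

```python
def peak_gflops(clock_ghz: float, fmas_per_cycle: int = 2, flops_per_fma: int = 2) -> float:
    """Theoretical peak in GFLOPS: clock (GHz) x FMAs retired per cycle x flops per FMA.

    Defaults assume 2 FMAs retired per clock, each counting as 2 floating-point ops.
    """
    return clock_ghz * fmas_per_cycle * flops_per_fma


# Peak for the clock speed the graph was labeled with vs. the one actually measured.
for clock in (1.5, 1.6):
    print(f"{clock} GHz -> {peak_gflops(clock):.1f} GFLOPS")

# The 12 GFLOPS estimate comes from assuming all 4 units retire an FMA every cycle.
print(f"1.5 GHz, 4 FMAs/cycle -> {peak_gflops(1.5, fmas_per_cycle=4):.1f} GFLOPS")
```

Counting an FMA as two operations is why "2 FMAs per clock" and "4 floating-point operations per clock" describe the same peak.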
