Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

MKL Itanium 2 Performance

tcrony70
Beginner
1,090 Views
I have been looking through a variety of sources, trying to determine the theoretical peak performance of an Itanium 2 (1.5 GHz).
From information I found on various web sites, it appears that the peak would be 6 GFlops for double precision floating point data and 12 GFlops for single precision data (based on the 4 floating point units available for single precision). Is this correct?
However, there is a performance graph for DGEMM on one of the MKL pages which shows a performance of more than 6 GFlops (single CPU).
Can someone help clarify this?
Thanks,
Tim
6 Replies
Todd_R_Intel
Employee
I'll investigate the performance data on our website. It may contain an error (perhaps in the reported clock speed).
For both single and double precision, the theoretical peak of a 1.5 GHz Itanium 2 processor is 6 GFLOPS.
Todd
tcrony70
Beginner

Thanks for the response.

Does this mean that there are not 4 floating point units capable of single precision arithmetic, as I saw in several places on the web, or that only 2 of them can act in any given cycle (since I have to believe that they can do a fused multiply-add)?

TimP
Honored Contributor III
Yes, that's based on retiring 2 fused multiply-add instructions per clock cycle.
tcrony70
Beginner
So what would be the point of having 4 floating point units that can handle single precision if one can still retire only 2 floating point instructions per clock cycle?
TimP
Honored Contributor III
You put Itanium 2 in the title. With the original Itanium it was, in principle, possible to execute SSE instructions. If your goal is to execute SSE code efficiently, Itanium 2 is surely not your vehicle.
Intel_C_Intel
Employee

The data you saw was erroneously labeled as 1.5 GHz but was in fact measured at 1.6 GHz, where the theoretical peak is 6.4 GFLOPS. That explains why the DGEMM performance exceeded 6 GFLOPS. The maximum performance in either single precision or double precision, as mentioned elsewhere, is 4 floating point operations per clock, or 2 FMAs per clock.
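
The peak figures quoted in this thread follow from multiplying the clock rate by the number of FMAs retired per cycle and by the two operations (one multiply, one add) that each FMA counts as. A minimal sketch of that arithmetic in Python; the function name `peak_gflops` and its parameter names are illustrative only and are not part of MKL:

```python
def peak_gflops(clock_ghz, fma_per_cycle=2, flops_per_fma=2):
    """Theoretical peak in GFLOPS for one CPU.

    Assumes, as stated in this thread, that an Itanium 2 retires
    2 fused multiply-add instructions per clock cycle and that each
    FMA counts as 2 floating point operations (multiply + add).
    """
    return clock_ghz * fma_per_cycle * flops_per_fma


if __name__ == "__main__":
    # The clock speed the graph was labeled with, and the one it was measured at.
    for ghz in (1.5, 1.6):
        print(f"{ghz} GHz: {peak_gflops(ghz):.1f} GFLOPS peak")
```

Under these assumptions the 1.5 GHz part peaks at 6 GFLOPS and the 1.6 GHz part at 6.4 GFLOPS, so a single-CPU DGEMM result slightly above 6 GFLOPS is consistent with the 1.6 GHz clock.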
