I have been reading through various sources to determine the theoretical peak performance of a 1.5 GHz Itanium 2.
From information I found on several web sites, the peak appears to be 6 GFLOPS for double-precision floating-point data and 12 GFLOPS for single-precision data (based on the 4 floating-point units available for single precision). Is this correct?
However, there is a performance graph for DGEMM on one of the MKL pages which shows a performance of more than 6 GFlops (single CPU).
Can someone help clarify this?
Thanks,
Tim
6 Replies
I'll investigate the performance graph on our website. It may contain an error (perhaps in the reported clock speed).
For both single and double precision, the theoretical peak of an Itanium 2 processor is 6 GFLOPS.
Todd
Thanks for the response.
Does this mean that there are not 4 floating-point units capable of single-precision arithmetic, as I saw in several places on the web, or that only 2 of them can act in any given cycle? (I would have to believe that each can do a fused multiply-add.)
Yes, that's based on retiring 2 fused multiply-add instructions per clock cycle.
So what would be the point of having 4 floating-point units that can handle single precision if one can still retire only 2 floating-point instructions per clock cycle?
You put Itanium 2 in the title. With Itanium 1, in principle, it was possible to execute SSE instructions. If your goal is to execute SSE code efficiently, surely Itanium 2 is not your vehicle.
The data you saw was erroneously labeled as 1.5 GHz but was, in fact, 1.6 GHz, which explains why the DGEMM performance exceeded 6 GFLOPS. As mentioned elsewhere in this thread, the maximum performance in either single or double precision is 4 floating-point operations per clock, that is, 2 FMAs per clock.
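For anyone who wants to check the arithmetic, here is a minimal sketch of the peak calculation, assuming (as stated in this thread) 2 FMAs retired per clock and counting each FMA as 2 floating-point operations. The function name `peak_gflops` and its parameters are made up for illustration; they do not come from any Intel tool.

```python
def peak_gflops(clock_ghz, fmas_per_cycle=2, flops_per_fma=2):
    """Theoretical peak in GFLOPS: clock (GHz) x FMAs per cycle x flops per FMA.

    The defaults reflect the figures quoted in this thread (2 FMAs per clock,
    each counted as one multiply plus one add); they are assumptions here,
    not values queried from hardware.
    """
    return clock_ghz * fmas_per_cycle * flops_per_fma


# Compare the clock speed the graph was labeled with against the actual one.
for clock in (1.5, 1.6):
    print(f"{clock} GHz: {peak_gflops(clock):.1f} GFLOPS")
```

At 1.5 GHz this gives 6.0 GFLOPS, and at 1.6 GHz it gives 6.4 GFLOPS, which is consistent with a DGEMM result slightly above 6 GFLOPS on the mislabeled graph.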
