- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
I was looking at a variety of information and was trying to determine what the theoretical peak performance for an Itanium 2 (1.5 GHz) is.
From some information I found on various web sites, it would appear that for double precision floating point data, the peak would be 6 GFlops and 12 GFlops for single precision data (based on the 4 floating point units available for single precision). Is this correct?
However, there is a performance graph for DGEMM on one of the MKL pages which shows a performance of more than 6 GFlops (single CPU).
Can someone help clarify this?
Thanks,
Tim
Link copiado
6 Respostas
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
I'll investigate the performance on our website. It may be an error (perhaps with the clock speed reported).
For both single and double precision, the theoretical peak of an Itanium 2 processor is 6 GFLOPS.
Todd
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Thanks for the response.
Does this mean that there are not 4 floating point units capable of doing single precision arithmetic as I saw in several places on the web or that only 2 can act in a any given cycle (since I would have to believe that they can do a fused multiply and add).
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
Yes, that's based on retiring 2 fused multiply-add instructions per clock cycle.
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
So what would be the point of having 4 floating point units that can handle short precision if one can still only retire 2 floating point instructions per clock cycle?
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
You put Itanium 2 in the title. With Itanium 1, in principle, it was possible to execute SSE instructions. If your goal is to execute SSE code efficiently, surely Itanium 2 is not your vehicle.
- Marcar como novo
- Marcador
- Subscrever
- Silenciar
- Subscrever fonte RSS
- Destacar
- Imprimir
- Denunciar conteúdo inapropriado
The data you saw was erroneously labeled as being 1.5 GHz, but was, in fact 1.6 GHz, which explains why the performance exceeded 6 GFLOPS on DGEMM. The maximum performance in either single precision or double precision, as mentioned elsewhere, is 4 floating point operations per clock, or 2 FMAs per clock.

Responder
Opções do tópico
- Subscrever fonte RSS
- Marcar tópico como novo
- Marcar tópico como lido
- Flutuar este Tópico para o utilizador atual
- Marcador
- Subscrever
- Página amigável para impressora