I have a powerful HP computer with a Q9550 (Core 2 Quad CPU). It seems that there is only one MMX/SSE unit shared between all 4 cores.
The reason I think so is the following. I am running a simple program that uses SSE2.
- Running 1 thread achieves 300MB/s.
- Running 2 threads achieves 150MB/s per thread.
- Running 4 threads achieves 75MB/s per thread.
My laptop with a T7250 (Core 2 Duo CPU) exhibits similar behavior.
Is it true that Core-2 CPUs contain only one MMX/SSE unit?
Thanks!
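A quick check on these figures is to multiply the per-thread rate by the thread count. Below is a minimal Python sketch that uses only the numbers quoted in the post; the names `per_thread_mb_s` and `aggregate_throughput` are made up for illustration.

```python
# Per-thread throughput in MB/s as reported in the post, keyed by thread count.
per_thread_mb_s = {1: 300, 2: 150, 4: 75}


def aggregate_throughput(measurements):
    """Return the total MB/s across all threads for each thread count."""
    return {threads: threads * rate for threads, rate in measurements.items()}


totals = aggregate_throughput(per_thread_mb_s)
for threads, total in sorted(totals.items()):
    print(f"{threads} thread(s): {total} MB/s in total")
```

The total stays at 300 MB/s in every case, so the threads appear to be contending for some shared resource. The numbers alone do not say which one: a shared memory bus, for example, would produce the same pattern as a shared execution unit.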
It seems that there is no freely available or clearly stated information about the high-level implementation of the x87 unit. AFAIK the scheduler logic is wired to the execution ports, and probably one of those ports (Port 1?) is responsible for x87 uops. In the register file they probably use different physical registers to hold temporary results and constants.
Just guessing.
Hello,
I have a basic question.
Are the XMM registers, say XMM0-XMM7, per core,
or
does each core have its own bank of 8 registers on the Intel Sandy Bridge architecture?
Thanks,
Chaitali
Each logical core has its own set of architectural registers, including vector registers. It cannot be otherwise since concurrent threads would garble each other's state.
Chaitali C. wrote:
Are the XMM registers, say XMM0-XMM7, per core
They are per thread: per logical processor in hardware, and saved with the thread context (XSTATE) in software. By the way, this is XMM0-XMM15 in 64-bit mode.
Chaitali C. wrote:
does each core have its own bank of 8 registers on the Intel Sandy Bridge architecture?
Each Sandy Bridge core has 144 registers able to act as x87/MMX or XMM/YMM registers for the two hardware thread contexts. The register file needs more entries than the architected state alone requires, because it also holds temporary rename registers. A good source is here: http://www.realworldtech.com/sandy-bridge/5/
As the other posters said, each CPU core has its own set of physical registers onto which the architectural (software-accessible) registers are mapped. The large number of physical registers is used for register renaming and storage of temporaries, and probably also to store decomposed floating-point values.
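The mapping of architectural registers onto a larger pool of physical registers can be pictured with a toy model. The Python sketch below is an illustration only, not a description of how any Intel core implements renaming; the `RenameTable` class, its method names, and the register counts are all hypothetical.

```python
class RenameTable:
    """Toy register renamer: maps architectural names to physical indices."""

    def __init__(self, arch_regs, num_physical):
        if num_physical < len(arch_regs):
            raise ValueError("need one physical register per architectural one")
        # Initially each architectural register owns one physical register.
        self.mapping = {name: i for i, name in enumerate(arch_regs)}
        # The remaining physical registers are free for renaming.
        self.free = list(range(len(arch_regs), num_physical))

    def rename(self, dest, sources):
        """Rename one instruction; return (physical dest, physical sources)."""
        # Sources read the current mapping before the destination is remapped.
        phys_sources = [self.mapping[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free physical register (hardware would stall)")
        # This toy never recycles old registers; real hardware frees them later.
        new_phys = self.free.pop(0)
        self.mapping[dest] = new_phys
        return new_phys, phys_sources


table = RenameTable(["xmm0", "xmm1"], num_physical=6)
first = table.rename("xmm0", ["xmm0", "xmm1"])  # xmm0 = xmm0 op xmm1
second = table.rename("xmm0", ["xmm1"])         # xmm0 = f(xmm1)
print(first, second)
```

Both instructions write `xmm0`, yet they receive different physical destinations, so the second one does not have to wait for the first. That is the reason the physical register file is larger than the architectural state.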
iliyapolak wrote:
and probably also used to store decomposed floating point values.
What do you mean here?
Sorry, I made a mistake in my post. I meant components of various algorithms. For example, the constants of a Taylor approximation of sine would probably be kept in those registers.
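To make the "constants" concrete, here is a minimal Python sketch of a sine approximation built from Taylor coefficients and evaluated with Horner's rule. The names `SIN_COEFFS` and `sin_taylor` and the choice of six terms are assumptions made for illustration; the sketch does no range reduction, so it is only meant for small arguments.

```python
import math

# Taylor coefficients of sin(x) for the terms x, x^3, x^5, x^7, x^9, x^11.
SIN_COEFFS = [(-1) ** k / math.factorial(2 * k + 1) for k in range(6)]


def sin_taylor(x):
    """Approximate sin(x) for small |x| using Horner's rule in x*x."""
    x2 = x * x
    acc = 0.0
    # Walk from the highest-order coefficient down to the lowest.
    for c in reversed(SIN_COEFFS):
        acc = acc * x2 + c
    # sin(x) = x * (c0 + c1*x^2 + c2*x^4 + ...)
    return acc * x


print(sin_taylor(0.5), math.sin(0.5))
```

Each loop step is one multiply and one add against a fixed coefficient. In compiled SIMD code, those six coefficients are the values that would occupy registers or be read from memory on each use.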
iliyapolak wrote:
I meant components of various algorithms. For example, the constants of a Taylor approximation of sine would probably be kept in those registers.
Indeed, though under high register pressure load+op instructions will be almost as fast, since the constants stay live in the L1D cache when they are used in inner loops.
With AVX-512, broadcast load + op (aka "scalar memory mode") will be even more effective for this usage, though we lack timing comparisons at the moment.
On a side note: strangely, the Intel compiler uses a lot more load+op for Knights Landing targets than for Skylake Xeon targets when compiling the very same source code.
Do you mean physical register pressure or architectural register pressure?
iliyapolak wrote:
Do you mean physical register pressure or architectural register pressure?
I meant logical register pressure, i.e. it is generally a good idea to use load+op for the constants in polynomial evaluation and to keep the registers for temporary variables, particularly in 32-bit code with only 8 XMM/YMM logical registers.
Yes, I agree with you. It is wise to do this even when evaluating short polynomials with only a few terms.
