Hyperthreading on Sandy Bridge

mike_wilson · ‎10-31-2011

1. When hyperthreading is enabled on Sandy Bridge, are the entries in the ITLB, DTLB and STLB shared based on the dynamic activity of each thread or are they partitioned evenly so each thread gets exactly 50% of the entries?

2. Does hyperthreading on Sandy Bridge have any improvements compared to previous processor generations? In particular, does Sandy Bridge reduce the cases where enabling hyperthreading reduces performance?

3. Is there any operating system that lets an administrator enable or disable hyperthreading without going into the BIOS and rebooting the machine? Is there something about the hardware that makes this impossible?

TimP · ‎11-02-2011

Since Nehalem, the ITLB is partitioned 50/50 statically under HT; the DTLB and shared TLB are partitioned on demand, not necessarily 50/50.
With the improved bandwidth of L1, there may be situations where HT performance is improved.
You should be able (as root) to take the sibling logical processors offline through the Linux CPU hotplug entries (under /sys/devices/system/cpu on current kernels) so as to effectively shut off HT, as you could on earlier processors.
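
A minimal sketch of how the sibling selection could be done, assuming the usual sysfs layout: each /sys/devices/system/cpu/cpuN/topology/thread_siblings_list file holds a list such as "0,4" or "0-1", and writing 0 to /sys/devices/system/cpu/cpuN/online (as root) takes that logical processor offline. The function names below are hypothetical, and the code only decides which CPUs to offline; it performs no file I/O itself.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Parse a Linux CPU list such as "0,4" or "2-3" into CPU numbers.
std::vector<int> parse_cpu_list(const std::string& text) {
    std::vector<int> cpus;
    std::stringstream ss(text);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (item.empty()) continue;
        const std::size_t dash = item.find('-');
        const int first = std::stoi(item.substr(0, dash));
        const int last =
            (dash == std::string::npos) ? first : std::stoi(item.substr(dash + 1));
        assert(first <= last);
        for (int cpu = first; cpu <= last; ++cpu) cpus.push_back(cpu);
    }
    return cpus;
}

// Given the thread_siblings_list text read for each CPU, return the CPUs to
// take offline: every sibling except the lowest-numbered one of each core.
std::set<int> siblings_to_offline(const std::vector<std::string>& sibling_lists) {
    std::set<int> offline;
    for (const std::string& text : sibling_lists) {
        const std::vector<int> cpus = parse_cpu_list(text);
        if (cpus.empty()) continue;
        const int keep = *std::min_element(cpus.begin(), cpus.end());
        for (int cpu : cpus) {
            if (cpu != keep) offline.insert(cpu);
        }
    }
    return offline;
}

// Path of the hotplug control file for one logical CPU (assumed sysfs layout).
std::string online_path(int cpu) {
    return "/sys/devices/system/cpu/cpu" + std::to_string(cpu) + "/online";
}
```

Writing 1 back to the same file should bring the logical processor online again, so the test can be toggled without a reboot. Whether the core's partitioned resources are recombined while a sibling is offline is a separate question.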

SergeyKostrov · ‎11-02-2011

Regarding some parts of question #3:

1. I think Microsoft enforces it for all versions of Windows with a single-CPU license, right? I don't know if they call some API from the BIOS or not.

2. The Win32 API allows you to enforce execution of all OS processes on one CPU.

3. Do you want to enforce your application to work all the time on one CPU? (because of some synchronization problems, right?)

This is a piece of code that demonstrates it in general for Windows:

...
#define RUN_ON_CPU_01 0x01
#define RUN_ON_CPU_02 0x02
...

...
bRc = ( RTBOOL )::GetProcessAffinityMask( hProcess, &dwProcessMask, &dwSystemMask );
if( bRc == RTFALSE )
    break;
...

...
dwThreadAM = RUN_ON_CPU_01; // Main thread will run on the 1st CPU
dwThreadAMPrev = ::SetThreadAffinityMask( hThread, dwThreadAM );
::Sleep( 0 );
...

PS: The call to Sleep( 0 ) is needed; I read about it in some documents.

mike_wilson · ‎11-03-2011

Sergey Kostrov wrote:
> Do you want to enforce your application to work all the time on one CPU? (because of some synchronization problems, right?)

Thanks for your reply. I am not trying to run on one CPU core. I want to test my software with and without hyperthreading to see which is faster. This will help me figure out how to modify my software so it gets the most benefit from hyperthreading. I don't want to have to go into the BIOS, toggle hyperthreading and reboot the machine every time I do this test. When hyperthreading is enabled and a core executes a HALT instruction, the core goes into single-task mode, which is equivalent to turning hyperthreading off for that core. The HALT instruction is a privileged instruction, so it can only be executed by the operating system. I want to find a way for 64-bit Windows 7 to execute a HALT instruction on one virtual core of each physical core. This would be equivalent to disabling hyperthreading. Does anyone know how to do that?

By the way, Windows 7 Professional, Enterprise and Ultimate allow you to use all the virtual cores in two processor sockets. The other versions of Windows 7 (Starter, Home Basic and Home Premium) allow you to use all the virtual cores in one processor socket. In other words, you can use hyperthreading with any version of Windows 7.

SergeyKostrov · ‎11-03-2011

> ...The HALT instruction is a privileged instruction, so it can only be executed by the operating system. I want to find a way for 64-bit Windows 7 to execute a HALT instruction on one virtual core of each physical core. This would be equivalent to disabling hyperthreading. Does anyone know how to do that?...

What about the idea of a virtual driver with some set of IOCTL commands? Like:

IOCTL_HYPERTHREADING_DISABLE

and then call the Win32 function DeviceIoControl( hDevice, IOCTL_HYPERTHREADING_DISABLE, ... )...

Best regards,
Sergey

jimdempseyatthecove · ‎11-04-2011

Sergey,

At your program start use GetProcessAffinityMask to get the logical processors available to your application.
Then bitwise AND the lpProcessAffinityMask with binary 0101010101010101010101010101010101010101010101010101010101010101
Then use SetProcessAffinityMask with the result.

Do this before you create additional threads.
This will restrict the threads of your process to the low-numbered HT siblings.
This will not prevent the system or other applications from using those logical processors.

You can also use SetThreadAffinityMask with the above mask (the result of the AND above) on selected program threads, leaving the others (e.g. an I/O thread or monitoring thread) to float.

Third option: for the alternate (masked-out) processors, create high-priority threads that sit in an infinite loop:

while(true)
    _mm_pause();

I'd try the other options before inserting the _mm_pause() hack.
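
A minimal sketch of the first option, under the assumption that the two HT siblings of each core are numbered as adjacent logical processors (2k and 2k+1). That numbering is common but not guaranteed; GetLogicalProcessorInformation reports the actual core-to-logical-processor mapping. The helper names are hypothetical, and the Win32 part is compiled only on Windows.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Bits 0, 2, 4, ... set: the 0101...01 mask from the post, written in hex.
constexpr std::uint64_t kEvenBits = 0x5555555555555555ULL;

// Affinity bit for logical processor number `cpu` (0..63).
inline std::uint64_t cpu_bit(unsigned cpu) {
    assert(cpu < 64);
    return std::uint64_t{1} << cpu;
}

// Logical processors kept for the application (assumed: one sibling per core).
constexpr std::uint64_t even_siblings(std::uint64_t process_mask) {
    return process_mask & kEvenBits;
}

// The remaining siblings, e.g. for threads that are left to float or spin.
constexpr std::uint64_t odd_siblings(std::uint64_t process_mask) {
    return process_mask & ~kEvenBits;
}

#ifdef _WIN32
// Call before creating additional threads, as the post recommends.
bool restrict_process_to_even_siblings() {
    DWORD_PTR process_mask = 0;
    DWORD_PTR system_mask = 0;
    HANDLE process = GetCurrentProcess();
    if (!GetProcessAffinityMask(process, &process_mask, &system_mask)) {
        return false;
    }
    const DWORD_PTR wanted = static_cast<DWORD_PTR>(even_siblings(process_mask));
    if (wanted == 0) {
        return false;  // nothing left to run on; leave the affinity unchanged
    }
    return SetProcessAffinityMask(process, wanted) != 0;
}
#endif
```

For example, a process mask of 0xFF (eight logical processors) becomes 0x55, which leaves logical processors 0, 2, 4 and 6 for the application.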

Jim Dempsey

SergeyKostrov · ‎11-04-2011

Thank you for your response. I'll try the 3rd option. It looks very simple!
Best regards,
Sergey

mike_wilson · ‎11-05-2011

@TimP
When hyperthreading is enabled on Sandy Bridge and two threads are running in each physical core, are the physical register files (160 integer registers and 144 floating-point registers), load buffers, store buffers and line fill buffers shared based on the dynamic activity of each thread or are they statically partitioned so each thread gets exactly 50%?

@Sergey Kostrov
Your suggestion of writing a virtual device driver is a clever idea that I hadn't considered. If I execute a HALT instruction on one virtual core of each physical core, I don't know how I could prevent the operating system from restarting a virtual core to execute some background task for the operating system and then leaving the core in multi-task mode. I'm not concerned about the time to execute the background task because that time is minimal. I'm concerned that the operating system will leave the core in multi-task mode instead of putting it back in single-task mode with another HALT instruction.

What I am really hoping is that Windows 7 has a control panel or registry entry that would let me check or uncheck a box to enable or disable hyperthreading.

@jimdempseyatthecove
I might have some misunderstanding but I don't think using SetProcessAffinityMask can be equivalent to disabling hyperthreading. It seems to me that setting the odd bits to zero in SetProcessAffinityMask will just cause one virtual core in each physical core to not be used. Disabling hyperthreading (or having each core in single-task mode) causes all resources in the cores that were statically partitioned between the two virtual cores to be recombined. Not using one virtual core in each physical core would be equivalent to disabling hyperthreading only if all resources in a physical core were shared between the two virtual cores based on the dynamic activity of the threads.

TimP · ‎11-05-2011

The load, store, and fill buffers (the latter being the most significant) are partitioned according to demand between the hyperthreads. Each logical has its own set of register files, with no sharing of any kind.

jimdempseyatthecove · ‎11-06-2011

Mike,

If you boot with HT enabled then any static partitioning (if any) will be set until you reboot with HT disabled. HLT-ing a hardware thread will not disable HT (assuming you HLT from ring 0). Note, your system (unless embedded) will undoubtedly have interrupts enabled, therefore an interrupt will resume from HLT* (*some of the newer processors have a way of not doing this for power saving, but also have a way to force continue; as to how this works I cannot say). The PAUSE (_mm_pause()) can be issued from any ring level. The stall loop (earlier in this thread) will at least mitigate some of the cache evictions, while not increasing L1 cache capacity (assuming it is not shared with sibling(s) in HT).

Before you ponder too long as to how to do this, it would be easier to run some experiments. I cannot imagine it would take you more than an hour to set up the runtime environment(s) and run tests.

a) with HT enabled, all threads used for "application"
b) with HT disabled, all threads used for "application"
c) with HT enabled, with process affinity set to even bits (of process bits), half threads used for "application"
d) with HT enabled, with thread affinity set to even bits (of process bits), half threads used for "application", and with other thread affinity set to odd bits (of process bits), half threads used for _mm_pause() loop
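
For configuration (d), the spinner threads might look roughly like the sketch below. It is a hedged illustration, not a tested benchmark harness: the class name PauseSpinner, the stop flag and the spin counter are additions for the example (the loop in the earlier post never exits), and the pinning and priority calls are compiled only on Windows.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
inline void cpu_relax() { _mm_pause(); }  // x86 PAUSE hint
#else
inline void cpu_relax() { std::this_thread::yield(); }  // non-x86 fallback
#endif

#ifdef _WIN32
#include <windows.h>
#endif

// Occupies one logical processor with a PAUSE loop until stop() is called.
class PauseSpinner {
public:
    explicit PauseSpinner(unsigned cpu) : worker_([this, cpu] { run(cpu); }) {}
    ~PauseSpinner() { stop(); }
    PauseSpinner(const PauseSpinner&) = delete;
    PauseSpinner& operator=(const PauseSpinner&) = delete;

    // Ask the loop to exit and wait for the thread to finish.
    void stop() {
        stop_.store(true, std::memory_order_relaxed);
        if (worker_.joinable()) worker_.join();
    }

    std::uint64_t spins() const { return spins_.load(std::memory_order_relaxed); }

private:
    void run(unsigned cpu) {
#ifdef _WIN32
        assert(cpu < 8 * sizeof(DWORD_PTR));
        // Pin to the chosen sibling and raise priority, as the post suggests.
        SetThreadAffinityMask(GetCurrentThread(), DWORD_PTR(1) << cpu);
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
#else
        (void)cpu;  // pinning is platform-specific and omitted here
#endif
        do {
            cpu_relax();
            spins_.fetch_add(1, std::memory_order_relaxed);  // demo counter only
        } while (!stop_.load(std::memory_order_relaxed));
    }

    std::atomic<bool> stop_{false};
    std::atomic<std::uint64_t> spins_{0};
    std::thread worker_;  // declared last so the flags exist before it starts
};
```

One spinner would be created per odd-numbered logical processor before the timed run and stopped afterwards. For a real measurement the counter should probably be removed so the spinner touches no shared memory.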

On Sandy Bridge you will be hard pressed to find a negative effect with HT enabled. If Kathy Farrel gets around to releasing the second half of "Have your cake and eat it too" in the Parallel Programming Blogs, you will see how I addressed this issue. The article addresses the SSE/AVX sharing issues with HT on/off as well as L1 instruction cache read, L1 data cache read, L1 data cache read/write, and L1 data cache write. These variations combine SSE/AVX tasks with integer tasks.

The point of the blog was to illustrate that any potential gain from turning HT off is only experienced in a few applications. It took me 3-4 weeks to construct (choose) such an example. On Sandy Bridge, with Turbo off, there was no difference (in SSE/AVX); with Turbo on there was ~1% improvement of the SSE/AVX code with HT off. However, the article goes on to show that by turning HT off, should your program have a combination of integer (non-SSE/AVX) and SSE/AVX code, you can lose between 20% and 40% of performance. Your application experience will vary from this.

To summarize:

The only benefit of turning HT off .AND. turning Turbo Boost off is in running scalability benchmarks as well as diagnosing performance issues with your code.

With HT on and Turbo Boost on, you will have a preponderance of net performance improvement, although your scalability charts will indicate less scalability.

Do you want to make your charts look good or do you want to make your system more productive?

Jim Dempsey

SergeyKostrov · ‎11-06-2011

I searched the web for the phrase "Disabling Hyper-Threading in software" and found a couple of links. Take a look, please:

http://solidlystated.com/software/how-to-disable-a-cpu-core

http://www.virtualdj.com/homepage/GargantulaKon/blogs/2691/How_To_Permanently_Disable_Hyper-Threading_For_Virtual_DJ.html

http://www.computing.net/answers/hardware/hyper-threading-disabled/35896.html

SergeyKostrov · ‎11-06-2011

> ...HLT-ing a hardware thread will not disable HT...

Here is a description from Intel's "Instruction Set Reference, A-M, Vol. 2A" (Nov 2007 version). Yes, it clearly says that it will be 'halted', but not 'disabled', and could be 'resumed':

mike_wilson · ‎11-06-2011

@TimP
Do you mean the physical register files that each thread uses are the same size regardless of whether hyperthreading is on or off? In other words, the physical register files are replicated for the two logical cores so half the space is unused when hyperthreading is off. Or do you mean the physical register files are partitioned 50/50 between the two logical cores? In this second case, I assume the amount of out-of-order execution in one thread would be reduced when hyperthreading is on.

Are you sure the load and store buffers on Sandy Bridge are shared between the two logical cores? According to the article below, on the Pentium 4, the load and store buffers were partitioned, not shared.
http://arstechnica.com/old/content/2002/10/hyperthreading.ars/4

Are there any other chip resources that are statically partitioned between the two logical cores on Sandy Bridge when hyperthreading is on?

@jimdempseyatthecove
You make some excellent points. I look forward to reading your blog article on hyperthreading. According to the article below, when hyperthreading is enabled and a HALT is executed in ring 0, the resources statically partitioned between the two logical cores are recombined. In other words, the HALT makes the core act as if hyperthreading is disabled.
"Hyper-Threading Technology Architecture and Microarchitecture"
section on Single-Task and Multi-Task Modes that starts near the bottom of page 9
Intel Technology Journal, Volume 6, Number 1
http://download.intel.com/technology/itj/2002/volume06issue01/art01_hyper/vol6iss1_art01.pdf#page=9

@Sergey Kostrov
Thanks for the links. I think msconfig (mentioned in your first link) is exactly what I'm looking for.

jimdempseyatthecove · ‎11-07-2011

Sergey,

Please re-read the description section for HLT.

HLT will stop the hardware thread (not the core, on HT) of the logical processor; this hardware thread may be resumed by way of an interrupt.

This is not disabling HT within the processor.
Disabling HT within the processor occurs as a Power-On Configuration Option, configurable as a BIOS option.
The BIOS can be configured to assert or deassert a signal into the processor at power-on time. This signal is used to disable (usually signal asserted) or enable (usually signal not asserted) HT.
This is not a programmable software option (other than by reprogramming the BIOS and rebooting).

Therefore, IFF (if and only if) the processor partitions a resource, such as the L1 ICache and/or L1 DCache, with HT enabled and does not partition it with HT disabled, the partitioning occurs (or does not occur) only at power-on time.

Jim Dempsey

mike_wilson · ‎11-07-2011

I would like to explain more clearly what I was trying to say about the HALT instruction. When hyperthreading is enabled and a HALT instruction is executed in ring 0, the logical core that executed the HALT stops running and the physical core recombines the resources that were partitioned when hyperthreading was enabled. The remaining logical core then runs just as it would if hyperthreading were disabled. To directly quote page 9 of the article from the Intel Technology Journal that I mentioned in my previous post:

"For example, if logical processor 0 executes HALT, only logical processor 1 would be active; the physical processor would be in ST1-mode and partitioned resources would be recombined giving logical processor 1 full use of all processor resources."

By the way, the L1 caches are not partitioned 50/50 when hyperthreading is enabled and two logical cores are running on a physical core. The L1 caches are shared by the two logical cores based on their dynamic activity.

I wonder if Windows 7 is smart enough to notice when only one logical core is active on a physical core and execute a HALT instruction on the idle logical core. This would give the active logical core the benefit of recombined resources. In other words, the active logical core would run as if hyperthreading is disabled (called single-task mode) even though the BIOS option could have hyperthreading enabled.