HyperThreading performance issues on Haswell

TimP · ‎09-28-2014

I've noted some peculiarities about behavior of HyperThreading on Haswell CPUs, on which I haven't been able to find any discussion.

On my Ultrabook, I can't find any BIOS menu option to disable HyperThreading. In OpenMP benchmarks (Windows 8.1) compiled by Intel C++ or Fortran, I find that allowing num_threads to default to the number of logical processors degrades the performance of subsequent single-threaded code, typically by 30%, in comparison with setting num_threads to the number of cores along with OMP_PROC_BIND=spread. Also, under Cilk(tm) Plus, cutting CILK_NWORKERS down to the number of physical cores seems to give more consistent performance. When compiling with gfortran or g++, running the full number of threads doesn't degrade subsequent single-thread performance, but there is a delay (longer than the expected OpenMP blocking time) in returning to the Windows command prompt when the benchmark completes.
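For anyone wanting to reproduce the "one thread per physical core" setup, here is a minimal sketch of the environment settings described above. The helper name `thread_env` and the `threads_per_core=2` default are my own assumptions (two logical processors per core, as on this Ultrabook); OMP_NUM_THREADS is the standard OpenMP environment variable, used here in place of a num_threads clause.

```python
import os


def thread_env(logical_cpus, threads_per_core=2):
    """Return environment settings that limit OpenMP and Cilk Plus
    to one thread per physical core.

    threads_per_core is an assumption (2 with HyperThreading enabled);
    it is not detected from the hardware.
    """
    physical = max(1, logical_cpus // threads_per_core)
    return {
        "OMP_NUM_THREADS": str(physical),  # one OpenMP thread per core
        "OMP_PROC_BIND": "spread",         # spread threads across cores
        "CILK_NWORKERS": str(physical),    # same limit for Cilk Plus workers
    }


if __name__ == "__main__":
    # os.cpu_count() reports logical processors, not physical cores.
    env = dict(os.environ, **thread_env(os.cpu_count() or 1))
    print(env["OMP_NUM_THREADS"], env["OMP_PROC_BIND"], env["CILK_NWORKERS"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when launching a benchmark, so the settings apply to that run only.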

On Ultrabook, as you will find by web searches, the BIOS menu can be reached only when starting from power off, not on a power on restart. No one has explained why.

On a new Xeon -v3 server, we found the BIOS menu option to disable HyperThreading, but the server fails to boot with HyperThreading disabled. I suggested that the person to whom the platform is registered submit an IPS issue, but I haven't heard of that being done. Our workloads didn't attempt to take advantage of the vast number of logical processors available; as far as we could see, the Windows Server 2012 R2 scheduler is able to distribute threads to separate cores.

VTune analysis reported that our workload incurred 25.9% remote memory accesses, presumably due to threads moving among CPUs, on a -v1 server (HT disabled). I would have liked to know whether HyperThreading would influence this. This figure lends credence to proposals that switching to an OS other than Windows could improve performance by 10% with affinity setting (e.g. taskset). As the VTune version with -v3 server support remains under non-disclosure, we weren't able to run that analysis on the -v3, and the BIOS menu issue prevented a non-HT timing test.
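As a rough illustration of the kind of affinity setting meant here, the following sketch pins the current process to one logical CPU per physical core on Linux, which has an effect similar to a taskset mask. It assumes the standard sysfs topology files are present; the helper names `parse_cpu_list` and `one_cpu_per_core` are hypothetical, and this does not address NUMA memory placement by itself.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0,4' or '0-1' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU from each group of HT siblings."""
    return sorted({parse_cpu_list(s)[0] for s in sibling_lists if s.strip()})


if __name__ == "__main__" and hasattr(os, "sched_setaffinity"):
    # Each file lists the logical CPUs that share one physical core.
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    # Only request CPUs this process is already allowed to run on.
    cpus = set(one_cpu_per_core(lists)) & os.sched_getaffinity(0)
    if cpus:
        os.sched_setaffinity(0, cpus)
```

Child processes inherit the affinity mask, so a benchmark launched from this process would stay on the selected CPUs.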

Bernard · ‎09-29-2014

>>>On Ultrabook, as you will find by web searches>>>

Do you have a Lenovo Yoga ultrabook?

TimP · ‎09-29-2014

iliyapolak wrote:

>>>On Ultrabook, as you will find by web searches>>>

Do you have a Lenovo Yoga ultrabook?

Acer Aspire Ultrabook

Bernard · ‎09-29-2014

Did you check whether there is a newer version of the BIOS on the Acer website?

Is there any way to enable/disable HT programmatically from kernel mode? I was not able to find any info about that.

Patrick_F_Intel1 · ‎09-29-2014

Hello Tim,

Can you try disabling Windows 'Fast Startup' feature and see if you can't then disable HT? I don't know if it will fix the problem but it might.

To enable/disable Fast Startup (on my laptop anyway): 1) press Win+X, 2) select 'Power Options', 3) in the left menu, select 'Choose what the power buttons do', 4) select 'Change settings that are currently unavailable', 5) the 'Shutdown settings' option 'Turn on fast startup' will now be changeable at the bottom of the screen.

Fast Startup seems to do almost a hibernate on shutdown. On my Lenovo Ultrabook, if Fast Startup is enabled and I disable HT, then Windows has to boot, shut down (without ever letting me log in), and then boot again. I had to disable Windows Fast Startup on another dual-booted laptop in order to be able to read the Windows partition from Linux.

Pat

TimP · ‎09-29-2014

I hadn't applied any BIOS or driver updates in recent months, assuming there would be automatic notifications (wrong assumption). However, the updates didn't affect the lack of an HT disable option. They do appear to improve the Synaptics device.

Disabling "fast startup" does make it possible to power down faster, but doesn't change the lack of any HT disable option in BIOS. Interesting to hear that some Ultrabook vendors do have an HT disable. My BIOS has an option to select "legacy" in place of UEFI, but the boot hangs in "legacy" mode.

Having found a way around this strange issue about HyperThreading, I get more consistent comparisons among compilers.

When I restrict the Intel builds to 2 threads running on separate cores, the only kernel in my test suite where gfortran is over 10% faster than ifort is the one using CASE (the Fortran equivalent of C switch).

There are more cases where g++ out-performs ICL than the other way around. I take into account non-portabilities between Intel C++ and g++ such as the need to replace std::max for Intel by fmax with -ffinite-math-only for g++. Each compiler optimizes some cases of STL transform() which the other doesn't.

The recent major release of the Intel compilers corrected cases where compiler-generated fast_memcpy calls were degrading performance. A corollary is that plain C code is preferred over memcpy() for best performance.