Documentation on the Hyper-Threading implementation in Nehalem
Is there any official documentation describing the internals of Hyper-Threading as implemented in the Nehalem microarchitecture? I am looking for something similar to the article on the first (P4) implementation published in ITJ in 2002. As far as I can tell from comparing experimental results for dual-threaded workloads on the two HT implementations (P4 and Nehalem), the newer implementation must have undergone radical changes in the way threads share resources, and it yields dramatically better performance (up to 2x in some cases). So I was wondering whether these modifications are documented anywhere.