The AVX Base and Turbo Frequencies for the Xeon E5 v3 CPUs are well documented:
Do other Intel processors also have an AVX frequency range that differs from the normal base/turbo frequency range?
Specifically, what about i7 processors that have AVX2 support? Can they sustain AVX2 FMA instructions at full throughput and stay in the nominal base frequency/turbo frequency range?
Apologies in advance if this is the wrong forum for this question. I also asked on the Intel support forum, but got no response: https://communities.intel.com/thread/87851
- Tags:
- Intel® Advanced Vector Extensions (Intel® AVX)
- Intel® Streaming SIMD Extensions
- Parallel Computing
I have not tested this systematically, but so far I have not seen any indication that the maximum Turbo frequencies are different for code that uses 256-bit registers and code that does not use 256-bit registers on the Haswell "client" parts. The average sustained frequency is likely to be lower when using 256-bit registers if the processor hits either its power limit or its thermal limit.
I should be able to test this shortly on a Core i7-4690HQ and on a Haswell-based Core i5.
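One way to run this kind of check on Linux is to sample the core clock reported in /proc/cpuinfo while the 256-bit and non-256-bit versions of a kernel are running, and then compare the two sets of samples. What follows is a minimal sketch, assuming a Linux system whose /proc/cpuinfo exposes `cpu MHz` lines. The helper names `parse_cpu_mhz` and `sample_mhz` are illustrative, and the value reported by the kernel may be coarser than what hardware performance counters would give.

```python
import time

def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical processor, in file order."""
    mhz = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            mhz.append(float(value))
    return mhz

def sample_mhz(seconds=5.0, interval=0.5, path="/proc/cpuinfo"):
    """Record the highest reported core clock every `interval` seconds."""
    samples = []
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        with open(path) as f:
            readings = parse_cpu_mhz(f.read())
        if readings:
            samples.append(max(readings))
        time.sleep(interval)
    return samples
```

Calling `sample_mhz()` in a second terminal while the workload runs gives a list of clock readings. Comparing the minimum and mean across the two kernel variants would show whether the sustained frequency differs, at least at the resolution the kernel reports.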
*bump*
Very good question. I'd love to know the answer as well. Thanks, Mike
Presumably, when using 256-bit registers (full width), one gets the same amount of work done in fewer instructions than when not using 256-bit registers (full width). In this sense, less work == fewer watts == less heat == more time in Turbo.
Additionally, when using the wider registers (full width), the operations are distributed over a wider area of silicon, and for a shorter time. This may affect the peak heat measured at localized positions within the core. Whether this affects the Turbo clock frequency would be a subject for investigation.
Jim Dempsey
I haven't been able to find any references about Haswell client turbo mode. In earlier client CPUs, there weren't both power consumption and thermal limits.
On my Haswell i5-4200U, the original behavior, where using all logical processors even briefly would cut turbo boost for long intervals, has changed. Allowing all threads to run no longer reduces performance much. I suspect this came about from a BIOS update. I still have the situation where Intel OpenMP runs best with OMP_PLACES=cores and the corresponding OMP_NUM_THREADS, while libgomp doesn't implement OMP_PLACES and needs more threads, but fewer than the total number of logical processors. Cilk(tm) Plus acts more like libgomp, in the absence of affinity.
I have tested SSE4 and AVX-128 vs. 256-bit AVX and AVX2 and generally see as much gain with the latter as could be expected (frequently as much as 40% more performance, even when comparing VS2012 vs. VS2015). As John said, there's no evidence of the wider register usage cutting turbo boost in client CPUs.
Another apparent consequence of either a BIOS or OS update is that I haven't been able to bring up the BIOS setup menu.
From a theoretical point of view, using wider physical registers may increase power dissipation per unit area (mm^2) per unit time (say, 1 ms) compared with 128-bit physical registers; on the other hand, the time spent on a given calculation can be shorter, as Jim pointed out.
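The trade-off can be made concrete with a little arithmetic: energy is power times time, so wider vectors lower total energy only if the runtime falls by more than the power rises. Below is a minimal sketch with made-up numbers; the wattages and runtimes are hypothetical placeholders, not measurements of any processor, and the function names are illustrative.

```python
def energy_joules(power_watts, seconds):
    """Energy consumed at a constant power draw."""
    return power_watts * seconds

def wider_is_cheaper(p_narrow, t_narrow, p_wide, t_wide):
    """True if the wide-vector run uses less total energy than the narrow one."""
    return energy_joules(p_wide, t_wide) < energy_joules(p_narrow, t_narrow)

# Hypothetical figures: the 256-bit run draws 25% more power
# but finishes in 60% of the time.
narrow = energy_joules(40.0, 10.0)  # 400 J
wide = energy_joules(50.0, 6.0)     # 300 J
print(narrow, wide, wider_is_cheaper(40.0, 10.0, 50.0, 6.0))
# prints: 400.0 300.0 True
```

Total energy is only part of the picture. Turbo limits respond to instantaneous power and temperature, so a higher draw over a shorter time could still reach a limit even when the total energy is lower.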