Intel® ISA Extensions

AVX Base and Turbo Frequencies on non-E5 CPUs

Andrew_L_5
Beginner

The AVX Base and Turbo Frequencies for the Xeon E5 v3 CPUs are well documented:

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-adva...

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-up...

 

Do other Intel processors also have an AVX frequency range that is different from the normal base/turbo frequency range?

 

Specifically, what about i7 processors that have AVX2 support? Can they sustain AVX2 FMA instructions at full throughput and stay in the nominal base frequency/turbo frequency range?
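For context on why the sustained clock matters here: peak FMA throughput scales linearly with the frequency the cores actually hold. A minimal Python sketch, assuming a Haswell-class core with two 256-bit FMA units (each FMA on 4 doubles counts as 2 FLOPs per lane); the core count and 3.5 GHz clock in the usage line are hypothetical, not the specs of any particular i7:

```python
# Assumed Haswell-class core: two 256-bit FMA pipes per core.
FMA_UNITS_PER_CORE = 2
DP_LANES_PER_256BIT = 4   # four 64-bit doubles per 256-bit register
FLOPS_PER_FMA_LANE = 2    # one multiply + one add


def peak_dp_gflops(cores: int, sustained_ghz: float) -> float:
    """Peak double-precision GFLOP/s if every core issues two 256-bit FMAs per cycle."""
    flops_per_cycle = FMA_UNITS_PER_CORE * DP_LANES_PER_256BIT * FLOPS_PER_FMA_LANE  # 16
    return cores * sustained_ghz * flops_per_cycle


# Hypothetical 4-core part holding 3.5 GHz on all cores.
print(peak_dp_gflops(4, 3.5))  # 224.0
```

Any drop in the sustained clock under AVX2 load lowers this ceiling proportionally, which is why the question about frequency ranges matters for FMA-heavy code.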

 

Apologies in advance if this is the wrong forum for this question. I also asked on the Intel support forum, but got no response: https://communities.intel.com/thread/87851


5 Replies
Michael_H_8
Beginner

*bump*

Very good question. I'd love to know the answer as well. Thanks, Mike

McCalpinJohn
Black Belt

I have not tested this systematically, but so far I have not seen any indication that the maximum Turbo frequencies differ between code that uses 256-bit registers and code that does not on the Haswell "client" parts. The average sustained frequency is likely to be lower when using 256-bit registers if the processor hits either its power limit or its thermal limit.

I should be able to test this shortly on a Core i7-4690HQ and on a Haswell-based Core i5.
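A sketch of how such a test might compute the average frequency actually achieved: the IA32_APERF (0xE8) and IA32_MPERF (0xE7) MSRs both count only while the core is in C0, MPERF at a fixed reference rate (assumed here to be the nominal frequency) and APERF at the actual core clock. Reading the MSRs is platform-specific and omitted; the function name, counter deltas, and 2.5 GHz reference clock below are hypothetical.

```python
def effective_ghz(reference_ghz: float, delta_aperf: int, delta_mperf: int) -> float:
    """Average core clock while in C0 over a sampling interval.

    Assumes MPERF advanced at reference_ghz and APERF at the actual core clock,
    so the APERF/MPERF ratio scales the reference frequency.
    """
    if delta_mperf <= 0:
        raise ValueError("MPERF delta must be positive")
    return reference_ghz * delta_aperf / delta_mperf


# Hypothetical deltas sampled around a benchmark loop on a 2.5 GHz-nominal part.
print(effective_ghz(2.5, 3_000_000, 2_000_000))  # 3.75
```

Taking the same measurement around a 256-bit FMA loop and a scalar loop, under similar thermal conditions, would show whether the sustained clock drops for the wider code.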

jimdempseyatthecove
Black Belt

Presumably, when using 256-bit (full-width) registers, one gets the same amount of work done in fewer instructions than when using narrower registers. In this sense, less work == fewer watts == less heat == more time in Turbo.
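The instruction-count part of this argument is easy to make concrete: for the same number of double-precision operations, 256-bit registers need half as many packed instructions as 128-bit ones. An idealized Python sketch (the function name is hypothetical, and it ignores loads, stores, loop overhead, and remainder handling):

```python
import math


def packed_instructions(n_elements: int, register_bits: int, element_bits: int = 64) -> int:
    """Idealized count of packed arithmetic instructions to process n_elements.

    Ignores loads/stores, loop overhead, and remainder handling.
    """
    lanes = register_bits // element_bits
    return math.ceil(n_elements / lanes)


print(packed_instructions(1000, 128))  # 500 (2 doubles per SSE/AVX-128 op)
print(packed_instructions(1000, 256))  # 250 (4 doubles per AVX/AVX2 op)
```

The arithmetic work (FLOPs) is identical in both cases; what halves is the number of instructions to fetch, decode, and schedule, which is the basis of the energy argument above.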

Additionally, when using the wider (full-width) registers, the operations are distributed over a wider area of silicon and for a shorter time. This may affect the peak heat measured at localized positions within the core. Whether this affects the Turbo clock frequency would be a subject for investigation.

Jim Dempsey

TimP
Black Belt

I haven't been able to find any references about Haswell client turbo mode. Earlier client CPUs did not have both power-consumption and thermal limits.

On my Haswell i5-4200U, the original behavior, where using all logical processors even briefly would cut turbo boost for long intervals, has changed. Allowing all threads to run no longer reduces performance much. I suspect this came about from a BIOS update. Intel OpenMP still runs best with OMP_PLACES=cores and the corresponding OMP_NUM_THREADS, while libgomp doesn't implement OMP_PLACES and needs more threads, though fewer than the total number of logical processors. Cilk(tm) Plus behaves more like libgomp in the absence of affinity settings.
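The Intel OpenMP configuration described above can be sketched as an environment for launching a benchmark. This is a minimal sketch: the helper name and `base_env` parameter are hypothetical, and the physical core count is passed in because the Python standard library only reports logical processors (`os.cpu_count()`).

```python
import os
from typing import Mapping, Optional


def omp_env(physical_cores: int, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of an environment with one Intel OpenMP thread per physical core.

    Sets OMP_PLACES=cores and a matching OMP_NUM_THREADS.
    """
    if physical_cores < 1:
        raise ValueError("need at least one physical core")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_PLACES"] = "cores"
    env["OMP_NUM_THREADS"] = str(physical_cores)
    return env


# Dual-core, four-thread i5-4200U: one thread per core, not per logical processor.
print(omp_env(2, base_env={}))  # {'OMP_PLACES': 'cores', 'OMP_NUM_THREADS': '2'}
```

A launch would then look like `subprocess.run(["./bench"], env=omp_env(2))`, where `./bench` stands in for whatever OpenMP binary is being measured.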

I have tested SSE4 and AVX-128 vs. 256-bit AVX and AVX2 and generally see as much gain with the latter as could be expected (frequently as much as 40% more performance, even when comparing VS2012 vs. VS2015).  As John said, there's no evidence of the wider register usage cutting turbo boost in client CPUs.
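Because "X% more performance" can be read either as shorter runtime or as higher throughput, here is a minimal sketch that treats performance as 1/runtime; the function name and the 1.4 s and 1.0 s timings are hypothetical, chosen only to reproduce a 40% figure:

```python
def percent_gain(baseline_runtime_s: float, new_runtime_s: float) -> float:
    """Gain in performance (defined as 1/runtime) of the new build, in percent."""
    return (baseline_runtime_s / new_runtime_s - 1.0) * 100.0


# Hypothetical timings: 128-bit build 1.4 s, 256-bit build 1.0 s.
print(round(percent_gain(1.4, 1.0), 1))  # 40.0
```

Under this definition, a 40% gain means the runtime falls to 1/1.4, about 71% of the baseline, rather than by 40%.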

Another apparent consequence of either a BIOS or OS update is that I haven't been able to bring up the BIOS setup menu.

Bernard
Black Belt

From a theoretical point of view, using wider physical registers may increase the power dissipated per unit area (mm^2) per unit of time (e.g., 1.0e-3 s) compared with 128-bit physical registers, but on the other hand, the time spent on a given calculation can be shorter, as Jim pointed out.
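This trade-off can be written out with energy = average power × time and power density = power / area. A minimal sketch with hypothetical numbers only (no measurements from this thread): the 256-bit version draws more power and heats a region harder, yet finishes sooner and uses less total energy.

```python
def energy_joules(avg_power_w: float, runtime_s: float) -> float:
    """Energy consumed = average power x elapsed time."""
    return avg_power_w * runtime_s


def power_density_w_per_mm2(power_w: float, area_mm2: float) -> float:
    """Average power per unit area over the same sampling window."""
    return power_w / area_mm2


# Hypothetical numbers, not measurements.
# 128-bit code: 40 W for 1.0 s; 256-bit code: 48 W for 0.5 s.
print(energy_joules(40.0, 1.0))  # 40.0
print(energy_joules(48.0, 0.5))  # 24.0
# Hypothetical 10 mm^2 hot region dissipating 5 W vs 6 W.
print(power_density_w_per_mm2(5.0, 10.0))  # 0.5
print(power_density_w_per_mm2(6.0, 10.0))  # 0.6
```

Whether the shorter duration outweighs the higher local density for turbo decisions depends on how the power-management logic averages over time, which the thread leaves open.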
