AVX Base and Turbo Frequencies on non E5 CPUs

The AVX Base and Turbo Frequencies for the Xeon E5 v3 CPUs are well documented:

http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-adva...

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-up...

 

Do other Intel processors also have AVX frequency range that is different from the normal base/turbo frequency range?

 

Specifically, what about i7 processors that have AVX2 support? Can they sustain AVX2 FMA instructions at full throughput and stay in the nominal base frequency/turbo frequency range?

 

Apologies in advance if this is the wrong forum for this question. I also asked on Intel support forum, but got no response: https://communities.intel.com/thread/87851

Accepted Solutions
I have not tested this systematically, but so far I have not seen any indication that the maximum Turbo frequencies are different for code that uses 256-bit registers and code that does not use 256-bit registers on the Haswell "client" parts.   The average sustained frequency is likely to be lower when using 256-bit registers if the processor hits either its power limit or its thermal limit.

I should be able to test this shortly on a Core i7-4690HQ and on a Haswell-based Core i5.
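One way such a test could be set up, as a rough sketch rather than the method actually used here: run an FMA-bound kernel, measure its sustained GFLOP/s, and divide by the per-cycle peak to infer the average core clock. The sketch assumes Haswell's two 256-bit FMA ports per core and a kernel that really does saturate them; otherwise the result is only a lower bound on the clock. The function names and the sample measurement are hypothetical.

```python
def peak_flops_per_cycle(vector_bits=256, element_bits=64, fma_ports=2):
    """Peak FLOPs per cycle per core: ports x lanes x 2 (an FMA is a multiply plus an add)."""
    lanes = vector_bits // element_bits
    return fma_ports * lanes * 2


def implied_frequency_ghz(measured_gflops, cores, **kwargs):
    """Average clock implied by a measured FLOP rate, assuming the FMA ports stay saturated."""
    return measured_gflops / (cores * peak_flops_per_cycle(**kwargs))


if __name__ == "__main__":
    # Hypothetical measurement: 102.4 GFLOP/s double precision on 4 cores.
    freq = implied_frequency_ghz(102.4, cores=4)
    print(f"implied average clock: {freq:.2f} GHz")
```

Comparing the implied clock from a 256-bit run against the one from a 128-bit run (`vector_bits=128`) of the same kernel would show whether the wider registers lower the sustained frequency.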

"Dr. Bandwidth"


*bump*

Very good question.  I'd love to know the answer as well.  Thanks, Mike


Presumably, when using the full width of the 256-bit registers, one gets the same amount of work done in fewer instructions than when using narrower registers. In this sense, less work == fewer watts == less heat == more time in Turbo.

Additionally, when using the full width of the wider registers, the operations are distributed over a wider area of silicon, and for a shorter time. This may affect the peak heat measured in localized positions within the core. Whether this affects the Turbo clock frequency would be a subject for investigation.

Jim Dempsey


I haven't been able to find any references about Haswell client turbo mode.  In earlier client CPUs, there weren't both power consumption and thermal limits.

On my Haswell i5-4200U, the original behavior, where using all logical processors even briefly would cut turbo boost for long intervals, has changed. Allowing all threads to run no longer reduces performance much.  I suspect this came about from a BIOS update.  I still have the situation where Intel OpenMP runs best with OMP_PLACES=cores and the corresponding OMP_NUM_THREADS, while libgomp doesn't implement OMP_PLACES and needs more threads, but fewer than the total number of logical processors.  Cilk(tm) Plus acts more like libgomp, in the absence of affinity.
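As an illustration of the Intel OpenMP setting described above, here is a minimal sketch that builds the environment for a run. It assumes "the corresponding OMP_NUM_THREADS" means one thread per physical core; the helper name and the `./my_openmp_app` binary are hypothetical.

```python
import os


def openmp_env_for_cores(logical_cpus, threads_per_core):
    """Environment overrides placing one OpenMP thread on each physical core (assumed meaning)."""
    physical_cores = max(1, logical_cpus // threads_per_core)
    return {"OMP_PLACES": "cores", "OMP_NUM_THREADS": str(physical_cores)}


if __name__ == "__main__":
    # The i5-4200U has 2 cores and 4 logical processors.
    overrides = openmp_env_for_cores(logical_cpus=4, threads_per_core=2)
    env = {**os.environ, **overrides}
    print(overrides)
    # To launch a (hypothetical) benchmark with these settings:
    # import subprocess; subprocess.run(["./my_openmp_app"], env=env, check=True)
```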

I have tested SSE4 and AVX-128 vs. 256-bit AVX and AVX2 and generally see as much gain with the latter as could be expected (frequently as much as 40% more performance, even when comparing VS2012 vs. VS2015).  As John said, there's no evidence of the wider register usage cutting turbo boost in client CPUs.

Another apparent consequence of either a BIOS or OS update is that I haven't been able to bring up the BIOS setup menu.


From a theoretical point of view, using wider physical registers may dissipate more power per unit area (mm^2) per unit time (1.0e-3 sec) than using 128-bit physical registers, but on the other hand the time spent in a given calculation can be shorter, as Jim pointed out.
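The trade-off can be made concrete with a small worked example. The numbers below are invented purely for illustration and are not measurements: they assume the 256-bit version draws 30% more power but finishes in 60% of the time.

```python
def energy_joules(avg_power_watts, seconds):
    """Energy is average power multiplied by elapsed time."""
    return avg_power_watts * seconds


def compare(power_128, time_128, power_256, time_256):
    """Compare total energy for a 128-bit run and a 256-bit run of the same calculation."""
    e128 = energy_joules(power_128, time_128)
    e256 = energy_joules(power_256, time_256)
    return {
        "energy_128_J": e128,
        "energy_256_J": e256,
        "energy_ratio_256_over_128": e256 / e128,
        "power_ratio_256_over_128": power_256 / power_128,
    }


if __name__ == "__main__":
    # Hypothetical inputs: 20 W for 10 s (128-bit) versus 26 W for 6 s (256-bit).
    for key, value in compare(20.0, 10.0, 26.0, 6.0).items():
        print(f"{key}: {value:.2f}")
```

Under these made-up inputs the 256-bit run uses less total energy while drawing more power during the run. Since power and thermal limits respond to power and temperature while the code is executing, not to total energy, both effects described in this thread can hold at once.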