<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic @bronxzv in Intel® ISA Extensions</title>
    <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967104#M4473</link>
    <description>&lt;P&gt;@bronxzv&lt;/P&gt;
&lt;P&gt;you were faster with your answer about the reciprocal throughput:) I wanted to write exactly the same answer:)&lt;/P&gt;
&lt;P&gt;Btw. afaik there are only two ports which are executing load/store instructions.&lt;/P&gt;</description>
    <pubDate>Tue, 21 May 2013 06:41:18 GMT</pubDate>
    <dc:creator>Bernard</dc:creator>
    <dc:date>2013-05-21T06:41:18Z</dc:date>
    <item>
      <title>Latency of a General purpose MOV instruction on Intel CPUs</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967097#M4466</link>
      <description>&lt;P&gt;Hi everybody,&lt;/P&gt;
&lt;P&gt;I'd like to&amp;nbsp;hear from &lt;STRONG&gt;Intel engineers&lt;/STRONG&gt; that&amp;nbsp;Latency of a &lt;STRONG&gt;General&lt;/STRONG&gt; purpose &lt;STRONG&gt;MOV&lt;/STRONG&gt; instruction on any&amp;nbsp;&lt;STRONG&gt;Intel&lt;/STRONG&gt; CPUs is &lt;STRONG&gt;1&lt;/STRONG&gt; clock cycle. For example, I've completed a set of tests for Intel(R) Pentium(R) 4 CPU 1.60GHz and my numbers are as follows:&lt;/P&gt;
&lt;P&gt;[ &lt;STRONG&gt;Intel C++ compiler - DEBUG&lt;/STRONG&gt; ]&lt;BR /&gt;...&lt;BR /&gt;Overhead of Assignment: &lt;STRONG&gt;1.091&lt;/STRONG&gt; clock cycles&lt;BR /&gt;...&lt;/P&gt;
&lt;P&gt;[ &lt;STRONG&gt;Intel C++ compiler - RELEASE&lt;/STRONG&gt; ]&lt;BR /&gt;...&lt;BR /&gt;Overhead of Assignment: &lt;STRONG&gt;1.191&lt;/STRONG&gt; clock cycles&lt;BR /&gt;...&lt;/P&gt;
&lt;P&gt;The&amp;nbsp;C code with the assignment&amp;nbsp;looks like:&lt;/P&gt;
&lt;P&gt;unsigned __int64 uiClockCycles = __rdtsc();&lt;/P&gt;
&lt;P&gt;The value returned by the &lt;STRONG&gt;RDTSC&lt;/STRONG&gt; instruction&amp;nbsp;is assigned to the &lt;STRONG&gt;uiClockCycles&lt;/STRONG&gt; variable with two&amp;nbsp;General purpose MOV instructions, and it means, that &lt;STRONG&gt;2&lt;/STRONG&gt; clock cycles will be actually&amp;nbsp;spent.&lt;/P&gt;
&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2013 04:03:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967097#M4466</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-20T04:03:26Z</dc:date>
    </item>
    <item>
      <title>I think that two mov</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967098#M4467</link>
      <description>&lt;P&gt;I think that two mov instructions are used to load high and low part of RDTSC value.&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2013 16:51:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967098#M4467</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-20T16:51:18Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;&gt;...and it means, that 2</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967099#M4468</link>
      <description>&amp;gt;&amp;gt;&amp;gt;&amp;gt;...and it means, that 2 clock cycles will be actually spent.
&amp;gt;&amp;gt;
&amp;gt;&amp;gt;...I think that two mov instructions are used to load high and low part...

I know this because the value returned by the &lt;STRONG&gt;RDTSC&lt;/STRONG&gt; instruction is saved in the &lt;STRONG&gt;EDX&lt;/STRONG&gt; and &lt;STRONG&gt;EAX&lt;/STRONG&gt; registers, and in order to store it in a &lt;STRONG&gt;64-bit&lt;/STRONG&gt; variable two &lt;STRONG&gt;MOV&lt;/STRONG&gt; instructions are needed. I simply wanted to confirm that a General purpose &lt;STRONG&gt;MOV&lt;/STRONG&gt; instruction is &lt;STRONG&gt;always&lt;/STRONG&gt; executed in &lt;STRONG&gt;1&lt;/STRONG&gt; clock cycle on &lt;STRONG&gt;any&lt;/STRONG&gt; Intel CPU.</description>
      <pubDate>Mon, 20 May 2013 17:38:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967099#M4468</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-20T17:38:02Z</dc:date>
    </item>
    <item>
      <title>How large was loop counter</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967100#M4469</link>
      <description>&lt;P&gt;How large was the loop counter needed to precisely measure the latency of the MOV instruction? And how many such measurements did you average?&lt;/P&gt;</description>
      <pubDate>Mon, 20 May 2013 18:41:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967100#M4469</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-20T18:41:00Z</dc:date>
    </item>
    <item>
      <title>Here is a new update.</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967101#M4470</link>
      <description>Here is a new update.

&amp;gt;&amp;gt;...I'd like to hear from &lt;STRONG&gt;Intel engineers&lt;/STRONG&gt; that Latency of a General purpose MOV instruction on any Intel CPUs is 1 clock cycle...

Is that true?

Just completed another set of tests and I couldn't get &lt;STRONG&gt;1&lt;/STRONG&gt; clock cycle Latency for &lt;STRONG&gt;MOV&lt;/STRONG&gt; instruction on &lt;STRONG&gt;Ivy Bridge&lt;/STRONG&gt; system with Intel Core i7-3840QM ( 4 cores / 8 logical CPUs / ark.intel.com/compare/70846 )

Here are test results:

&lt;STRONG&gt;[ Intel C++ compiler ]&lt;/STRONG&gt;
...
Test-Case 1.3 - Overhead of Assignment of a Value from RDTSC instruction
Min Overhead of Assignment:      0.372 clock cycles
Final RDTSC Overhead Value:     23.628 clock cycles
...

&lt;STRONG&gt;[ Microsoft C++ compiler ]&lt;/STRONG&gt;
...
Test-Case 1.3 - Overhead of Assignment of a Value from RDTSC instruction
Min Overhead of Assignment:      0.381 clock cycles
Final RDTSC Overhead Value:     23.619 clock cycles
...

&lt;STRONG&gt;Note:&lt;/STRONG&gt; '...Overhead of Assignment...' means Latency of the &lt;STRONG&gt;MOV&lt;/STRONG&gt; instruction and, as you can see, on the &lt;STRONG&gt;Ivy Bridge&lt;/STRONG&gt; system it is less than 1 clock cycle.

These values &lt;STRONG&gt;0.372&lt;/STRONG&gt; and &lt;STRONG&gt;0.381&lt;/STRONG&gt; clock cycles are very consistent ( the same from test to test! ) for &lt;STRONG&gt;Intel&lt;/STRONG&gt; and &lt;STRONG&gt;Microsoft&lt;/STRONG&gt; C++ compilers.</description>
      <pubDate>Tue, 21 May 2013 00:57:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967101#M4470</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-21T00:57:03Z</dc:date>
    </item>
    <item>
      <title>On latest architecture memory</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967102#M4471</link>
      <description>&lt;P&gt;On the latest architecture, memory moves are executed by two ports, Port 2 and Port 3, in parallel, but I do not know whether this can explain such a low latency.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2013 04:55:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967102#M4471</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-21T04:55:04Z</dc:date>
    </item>
    <item>
      <title>Quote:Sergey Kostrov wrote: I</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967103#M4472</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Sergey Kostrov wrote:&lt;BR /&gt; I'd like to&amp;nbsp;hear from &lt;STRONG&gt;Intel engineers&lt;/STRONG&gt; that&amp;nbsp;Latency of a &lt;STRONG&gt;General&lt;/STRONG&gt; purpose &lt;STRONG&gt;MOV&lt;/STRONG&gt; instruction on any&amp;nbsp;&lt;STRONG&gt;Intel&lt;/STRONG&gt; CPUs is &lt;STRONG&gt;1&lt;/STRONG&gt; clock cycle. &lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;You can find this information for specific implementations in the optimization manual [1], in Appendix C.3 "Latency and Throughput". IIRC the latency for MOV is 1 clock for all processors. Now it looks like you are more after the reciprocal throughput (since you issue two independent MOVs in your example). The rcp throughput is documented as 0.33 for Sandy Bridge/Ivy Bridge, for example (i.e. there are 3 ports available for GPR to GPR moves), but may be only 0.5 for older processors.&lt;/P&gt;
&lt;P&gt;[1]&amp;nbsp;:&amp;nbsp;Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-026, April 2012 &lt;BR /&gt;available here: &lt;A href="http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html"&gt;http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2013 05:59:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967103#M4472</guid>
      <dc:creator>bronxzv</dc:creator>
      <dc:date>2013-05-21T05:59:00Z</dc:date>
    </item>
    <item>
      <title>@bronxzv</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967104#M4473</link>
      <description>&lt;P&gt;@bronxzv&lt;/P&gt;
&lt;P&gt;you were faster with your answer about the reciprocal throughput:) I wanted to write exactly the same answer:)&lt;/P&gt;
&lt;P&gt;Btw. afaik there are only two ports which are executing load/store instructions.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2013 06:41:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967104#M4473</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-21T06:41:18Z</dc:date>
    </item>
    <item>
      <title>Quote:iliyapolak wrote: Btw.</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967105#M4474</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;iliyapolak wrote:&lt;BR /&gt; Btw. afaik there are only two ports which are executing load/store instructions.&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;load from memory isn't involved in the example at hand, 0.33 is for register to register moves (also for 64-bit MMX and 128-bit XMM registers); the store to memory&amp;nbsp;is not on the critical path in the example at hand (as is usual for stores)&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2013 07:02:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967105#M4474</guid>
      <dc:creator>bronxzv</dc:creator>
      <dc:date>2013-05-21T07:02:00Z</dc:date>
    </item>
    <item>
      <title>thanks for correcting my</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967106#M4475</link>
      <description>&lt;P&gt;thanks for correcting my error.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2013 07:09:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967106#M4475</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-21T07:09:48Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...load from memory isn't</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967107#M4476</link>
      <description>&amp;gt;&amp;gt;...load from memory isn't involved in the example at hand, 0.33 is for register to register moves...

The question was about the &lt;STRONG&gt;Latency&lt;/STRONG&gt; ( for any Intel CPU / unfortunately  Intel® 64 and IA-32 Architectures Optimization Reference Manual doesn't list all microarchitectures ) and &lt;STRONG&gt;Not&lt;/STRONG&gt; about the &lt;STRONG&gt;Throughput&lt;/STRONG&gt;.

However, I see that my current test perfectly measured the &lt;STRONG&gt;Throughput&lt;/STRONG&gt; of a General purpose &lt;STRONG&gt;MOV&lt;/STRONG&gt; instruction on the &lt;STRONG&gt;Ivy Bridge&lt;/STRONG&gt; system. Here is a verification for 32-bit and 64-bit code:

&lt;STRONG&gt;[ Intel C++ compiler - RELEASE - 32-bit ]&lt;/STRONG&gt;
...
Test-Case 1.3 - Overhead of Assignment of a Value from RDTSC instruction
Min Overhead of Assignment:      &lt;STRONG&gt;0.372&lt;/STRONG&gt; clock cycles
Final RDTSC Overhead Value:     23.628 clock cycles
...

&lt;STRONG&gt;[ Intel C++ compiler - RELEASE - 64-bit ]&lt;/STRONG&gt;
...
Test-Case 1.3 - Overhead of Assignment of a Value from RDTSC instruction
Min Overhead of Assignment:      &lt;STRONG&gt;0.369&lt;/STRONG&gt; clock cycles
Final RDTSC Overhead Value:     23.631 clock cycles
...
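
A minimal sketch (Python, not part of the original test suite) of the arithmetic behind this comparison. It assumes the 3 ports and the documented reciprocal throughput of 0.33 quoted earlier in this thread; the function and variable names are hypothetical:

```python
# Hypothetical helper: ideal reciprocal throughput if each of `ports`
# execution ports can start one independent MOV per clock cycle.
def reciprocal_throughput(ports):
    return 1.0 / ports

ideal = reciprocal_throughput(3)  # 3 ports, as quoted earlier in this thread
measured = {"32-bit": 0.372, "64-bit": 0.369}  # values from the results above

for build, cycles in measured.items():
    # Gap between the measured value and the ideal 1/3 cycle per MOV
    print(build, "measured", cycles, "ideal", round(ideal, 3),
          "difference", round(cycles - ideal, 3))
```

The measured values sit slightly above the ideal 1/3 cycle, which is what one would expect if the test measures throughput plus a small amount of loop overhead rather than latency.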

&lt;STRONG&gt;Note:&lt;/STRONG&gt; '...Min Overhead of Assignment...' needs to be changed to '...Min Throughput of Assignment...'</description>
      <pubDate>Tue, 21 May 2013 12:46:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967107#M4476</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-21T12:46:00Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;...[1] : Intel® 64 and IA</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967108#M4477</link>
      <description>&amp;gt;&amp;gt;...[1] : Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-026, April 2012...

I have that Manual and I saw the numbers for MOV instruction. Thanks.

Any comments from &lt;STRONG&gt;Intel engineers&lt;/STRONG&gt;?</description>
      <pubDate>Tue, 21 May 2013 12:51:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967108#M4477</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-21T12:51:29Z</dc:date>
    </item>
    <item>
      <title>Quote:Sergey Kostrov wrote:&gt;&gt;</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967109#M4478</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Sergey Kostrov wrote:&lt;BR /&gt;&amp;gt;&amp;gt;...[1] : Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-026, April 2012...&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I have that Manual and I saw the numbers for MOV instruction. Thanks.&lt;/P&gt;
&lt;P&gt;Any comments from &lt;STRONG&gt;Intel engineers&lt;/STRONG&gt;?&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;As you can see on page C-31 of the optimization manual (written by Intel engineers), the latency&amp;nbsp;was 0.5 for Pentium 4 with the&amp;nbsp;double pumped "Fireball"&amp;nbsp;ALU (signature = 0F_2H), so the answer to your question is clearly &lt;STRONG&gt;no&lt;/STRONG&gt;: it isn't&amp;nbsp;1 clock cycle&amp;nbsp;for all Intel CPUs.&lt;/P&gt;</description>
      <pubDate>Tue, 21 May 2013 13:27:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967109#M4478</guid>
      <dc:creator>bronxzv</dc:creator>
      <dc:date>2013-05-21T13:27:00Z</dc:date>
    </item>
    <item>
      <title>Guys, please pause for a</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967110#M4479</link>
      <description>Guys, please pause for a moment and let's wait for a comment from &lt;STRONG&gt;Intel engineers&lt;/STRONG&gt;. OK?</description>
      <pubDate>Tue, 21 May 2013 13:39:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967110#M4479</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-21T13:39:00Z</dc:date>
    </item>
    <item>
      <title>Sergey,</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967111#M4480</link>
      <description>&lt;P&gt;Sergey,&lt;/P&gt;
&lt;P&gt;I have a suite of 3-4K tests which tell me all the instruction latencies, more precisely than anything found on the internet. I get 1 clk on SB/IB for mov. I also monitor the number of moves eliminated via move elimination, and it appears they can eliminate only 1 move per dispatched set of ops, I believe. More food for thought on this, but it's probably 1 clk.&lt;/P&gt;
&lt;P&gt;Perfwise&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2013 11:51:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967111#M4480</guid>
      <dc:creator>perfwise</dc:creator>
      <dc:date>2013-05-23T11:51:02Z</dc:date>
    </item>
    <item>
      <title>Here are two more quotes I</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967112#M4481</link>
      <description>Here are two more quotes I just found in Intel Manuals:

&lt;STRONG&gt;Intel(R) 64 and IA-32 Architectures Optimization Reference Manual&lt;/STRONG&gt;
Order Number: 248966-026
April 2012

&lt;STRONG&gt;C.3.1 Latency and Throughput with Register Operands&lt;/STRONG&gt;
...
Processor instruction timing data is implementation specific; it can vary between
model encodings within the same family encoding...
...

On Page &lt;STRONG&gt;738&lt;/STRONG&gt;</description>
      <pubDate>Thu, 23 May 2013 13:00:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967112#M4481</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-23T13:00:07Z</dc:date>
    </item>
    <item>
      <title>Latency:</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967113#M4482</link>
      <description>&lt;STRONG&gt;Latency:&lt;/STRONG&gt;
	0F_3H - 1
	0F_2H - 0.5

&lt;STRONG&gt;Throughput:&lt;/STRONG&gt;
	0F_3H - 0.5
	0F_2H - 0.5

&lt;STRONG&gt;Notes:&lt;/STRONG&gt;
0F_3H - Intel Xeon Processor, Intel Xeon Processor MP, Intel Pentium 4, Pentium D processors
0F_2H - Intel Xeon Processor, Intel Xeon Processor MP, Intel Pentium 4 processors
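
A minimal sketch (Python) of how these signatures can be read. It assumes the DisplayFamily_DisplayModel notation is two hexadecimal fields separated by an underscore, with a trailing "H" marking hexadecimal; the helper name is hypothetical:

```python
# Hypothetical helper: split a DisplayFamily_DisplayModel signature such as
# "0F_2H" into numeric (family, model). Assumes both fields are hexadecimal
# and that the trailing "H" is only a hex marker.
def parse_signature(signature):
    family_hex, model_hex = signature.split("_")
    model_hex = model_hex.rstrip("Hh")
    return int(family_hex, 16), int(model_hex, 16)

for sig in ("0F_3H", "0F_2H"):
    family, model = parse_signature(sig)
    print(sig, "family", family, "model", model)
```

Under these assumptions 0F_2H decodes to family 15, model 2, and 0F_3H to family 15, model 3.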

&lt;STRONG&gt;Intel(R) 64 and IA-32 Architectures Software Developer’s Manual&lt;/STRONG&gt;
&lt;STRONG&gt;Volume 3 (3A, 3B &amp;amp; 3C): System Programming Guide&lt;/STRONG&gt;
Order Number: 325384-044US
August 2012

&lt;STRONG&gt;CHAPTER 35 MODEL-SPECIFIC REGISTERS (MSRS)&lt;/STRONG&gt;
...
&lt;STRONG&gt;Table 35-1&lt;/STRONG&gt;. CPUID Signature Values of DisplayFamily_DisplayModel
...
On Page &lt;STRONG&gt;1151&lt;/STRONG&gt;</description>
      <pubDate>Thu, 23 May 2013 13:02:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967113#M4482</guid>
      <dc:creator>SergeyKostrov</dc:creator>
      <dc:date>2013-05-23T13:02:07Z</dc:date>
    </item>
    <item>
      <title>&gt;&gt;&gt;0F_2H - 0.5&gt;&gt;&gt;</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967114#M4483</link>
      <description>&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;0F_2H - 0.5&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;So on processors with this model encoding the latency is 0.5 cycle.&lt;/P&gt;</description>
      <pubDate>Thu, 23 May 2013 16:30:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967114#M4483</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-23T16:30:12Z</dc:date>
    </item>
    <item>
      <title>Quote:iliyapolak wrote:</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967115#M4484</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;iliyapolak wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;gt;&amp;gt;&amp;gt;0F_2H - 0.5&amp;gt;&amp;gt;&amp;gt;&lt;/P&gt;
&lt;P&gt;So on processors with this model encoding the latency is 0.5 cycle.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;
&lt;P&gt;0F_2H&amp;nbsp;(family 15, model 2) is for the&amp;nbsp;P4&amp;nbsp;Northwood core&amp;nbsp;[2]&amp;nbsp;with its double pumped ALU,&amp;nbsp;AFAIK ALU latencies&amp;nbsp;were the same in the original P4 Willamette [1] with CPUID signature = 0F_1H (family 15, model 1)&lt;/P&gt;
&lt;P&gt;with the P4 Prescott [3] (0F_3H, i.e family 15, model 3) the double pumped "Fireball" ALU was replaced by a regular&amp;nbsp;ALU at core clock&amp;nbsp;thus latencies increased&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[1] &lt;A href="http://www.cpu-world.com/CPUs/Pentium_4/TYPE-Desktop%20Pentium%204%20Willamette.html"&gt;http://www.cpu-world.com/CPUs/Pentium_4/TYPE-Desktop%20Pentium%204%20Willamette.html&lt;/A&gt;&lt;BR /&gt;[2] &lt;A href="http://www.cpu-world.com/CPUs/Pentium_4/TYPE-Desktop%20Pentium%204%20Northwood.html"&gt;http://www.cpu-world.com/CPUs/Pentium_4/TYPE-Desktop%20Pentium%204%20Northwood.html&lt;/A&gt;&lt;BR /&gt;[3] &lt;A href="http://www.cpu-world.com/CPUs/Pentium_4/TYPE-Desktop%20Pentium%204%20Prescott.html"&gt;http://www.cpu-world.com/CPUs/Pentium_4/TYPE-Desktop%20Pentium%204%20Prescott.html&lt;/A&gt;&lt;/P&gt;
</description>
      <pubDate>Thu, 23 May 2013 17:58:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967115#M4484</guid>
      <dc:creator>bronxzv</dc:creator>
      <dc:date>2013-05-23T17:58:00Z</dc:date>
    </item>
    <item>
      <title>Yes it makes sense when</title>
      <link>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967116#M4485</link>
      <description>&lt;P&gt;Yes it makes sense when double-pumped ALU is taken into account.&lt;/P&gt;
&lt;P&gt;Thanks for interesting links.&lt;/P&gt;
&lt;P&gt;Btw. it is interesting how the designers of the double pumped ALU were able to double the clock of this unit. I think that the main reason was the low transistor count needed to implement the ALU and thus lower heat dissipation.&lt;/P&gt;</description>
      <pubDate>Fri, 24 May 2013 04:34:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-ISA-Extensions/Latency-of-a-General-purpose-MOV-instruction-on-Intel-CPUs/m-p/967116#M4485</guid>
      <dc:creator>Bernard</dc:creator>
      <dc:date>2013-05-24T04:34:00Z</dc:date>
    </item>
  </channel>
</rss>

