<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SSE4 ? in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/SSE4/m-p/938248#M16376</link>
    <description>I've been working with different SIMD instruction sets (Intel, AltiVec, Equator, TI) for video encoding applications. Below are just a few intrinsics that are clearly missing and could benefit many video encoding/decoding applications as well as other signal processing tasks.&lt;BR /&gt;&lt;BR /&gt;-- Absolute value intrinsic. It seems that SSE4 will have it, right?&lt;BR /&gt;&lt;BR /&gt;-- One critical missing SIMD instruction is 16-bit multiplication with rounding, probably with saturating and truncating variants, like:&lt;BR /&gt;(A*B+32767)&amp;gt;&amp;gt;16&lt;BR /&gt;&lt;BR /&gt;In the general case, an instruction that could be applied to a wider variety of applications:&lt;BR /&gt;(A*B+C)&amp;gt;&amp;gt;R&lt;BR /&gt;&lt;BR /&gt;where A, B, and C are 16-bit shorts and the 32-bit intermediate result is either truncated or saturated to 16 bits. This would make a 16-bit multiply-add with rounding possible.&lt;BR /&gt;&lt;BR /&gt;-- Sign application, useful for quantization in many algorithms:&lt;BR /&gt;SignApply(A,B) =&amp;gt; B&amp;lt;0 ? (-A) : A&lt;BR /&gt;&lt;BR /&gt;Example:&lt;BR /&gt;A    10    2    0     7&lt;BR /&gt;B     1   -10   10    0&lt;BR /&gt;S    10   -2    0     7&lt;BR /&gt;&lt;BR /&gt;Sign application is quite critical. One example is a sign-aware shift, i.e. division by a power of two:&lt;BR /&gt;A/2^N&lt;BR /&gt;If A is negative, an arithmetic shift rounds toward negative infinity, so the result can be -1 instead of zero:&lt;BR /&gt;-1&amp;gt;&amp;gt;4 = -1&lt;BR /&gt;Sign application fixes this:&lt;BR /&gt;SignApply(ABS(A)&amp;gt;&amp;gt;N, A)&lt;BR /&gt;&lt;BR /&gt;As you probably know, many new compression algorithms use arithmetic coding, such as H.264, JPEG2000, and proprietary ones. These are mainly sequential algorithms.&lt;BR /&gt;One thing that could benefit the execution of such algorithms is conditional instructions, like those found in the TI architecture.
This would avoid pipeline flushes, e.g.:&lt;BR /&gt;if (RAX) RBX++ as a single instruction.&lt;BR /&gt;An instruction set for efficient arithmetic coding requires further research, since arithmetic coding can be one of the main bottlenecks in modern codecs. Maybe some sort of reciprocal division would be useful.&lt;BR /&gt;&lt;BR /&gt;alex@streambox.com</description>
    <pubDate>Mon, 27 Sep 2004 07:06:12 GMT</pubDate>
    <dc:creator>dessa</dc:creator>
    <dc:date>2004-09-27T07:06:12Z</dc:date>
    <item>
      <title>SSE4 ?</title>
      <link>https://community.intel.com/t5/Software-Archive/SSE4/m-p/938246#M16374</link>
      <description>There was a post on a German site about SSE4 mnemonics in the VS beta:&lt;BR /&gt;&lt;A href="http://translate.google.com/translate?hl=en&amp;amp;sl=de&amp;amp;u=http://www.heise.de/ct/04/20/022/&amp;amp;prev=/search%3Fq%3Dpsignw%26hl%3Den%26lr%3D%26ie%3DUTF-8%26sa%3DG%26edition%3Dus" target="_blank"&gt;http://translate.google.com/translate?hl=en&amp;amp;sl=de&amp;amp;u=http://www.heise.de/ct/04/20/022/&amp;amp;prev=/search%3Fq%3Dpsignw%26hl%3Den%26lr%3D%26ie%3DUTF-8%26sa%3DG%26edition%3Dus&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;Can anyone from Intel comment on whether this is true/upcoming, and when the specs will be published?&lt;BR /&gt;&lt;BR /&gt;Also, is there a research group at Intel that investigates which SIMD instructions are missing or could be beneficial for applications, and to which one could submit suggestions?&lt;BR /&gt;&lt;BR /&gt;at.</description>
      <pubDate>Tue, 21 Sep 2004 14:13:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/SSE4/m-p/938246#M16374</guid>
      <dc:creator>dessa</dc:creator>
      <dc:date>2004-09-21T14:13:24Z</dc:date>
    </item>
    <item>
      <title>Re: SSE4 ?</title>
      <link>https://community.intel.com/t5/Software-Archive/SSE4/m-p/938247#M16375</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;P&gt;&lt;FONT face="Times New Roman" size="3"&gt;Dear Alex,&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="Times New Roman" size="3"&gt;Many &lt;SPAN&gt;engineers and architects&lt;/SPAN&gt; at Intel read this forum (&lt;SPAN&gt;or at least know somebody who does :-)&lt;/SPAN&gt;, so if you would like to share suggestions for new SIMD extensions, you can simply post them here.&lt;BR /&gt;&lt;/FONT&gt;&lt;FONT face="Times New Roman" size="3"&gt;Aart&lt;/FONT&gt;&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 23 Sep 2004 00:14:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/SSE4/m-p/938247#M16375</guid>
      <dc:creator>Intel_C_Intel</dc:creator>
      <dc:date>2004-09-23T00:14:02Z</dc:date>
    </item>
    <item>
      <title>Re: SSE4 ?</title>
      <link>https://community.intel.com/t5/Software-Archive/SSE4/m-p/938248#M16376</link>
      <description>I've been working with different SIMD instruction sets (Intel, AltiVec, Equator, TI) for video encoding applications. Below are just a few intrinsics that are clearly missing and could benefit many video encoding/decoding applications as well as other signal processing tasks.&lt;BR /&gt;&lt;BR /&gt;-- Absolute value intrinsic. It seems that SSE4 will have it, right?&lt;BR /&gt;&lt;BR /&gt;-- One critical missing SIMD instruction is 16-bit multiplication with rounding, probably with saturating and truncating variants, like:&lt;BR /&gt;(A*B+32767)&amp;gt;&amp;gt;16&lt;BR /&gt;&lt;BR /&gt;In the general case, an instruction that could be applied to a wider variety of applications:&lt;BR /&gt;(A*B+C)&amp;gt;&amp;gt;R&lt;BR /&gt;&lt;BR /&gt;where A, B, and C are 16-bit shorts and the 32-bit intermediate result is either truncated or saturated to 16 bits. This would make a 16-bit multiply-add with rounding possible.&lt;BR /&gt;&lt;BR /&gt;-- Sign application, useful for quantization in many algorithms:&lt;BR /&gt;SignApply(A,B) =&amp;gt; B&amp;lt;0 ? (-A) : A&lt;BR /&gt;&lt;BR /&gt;Example:&lt;BR /&gt;A    10    2    0     7&lt;BR /&gt;B     1   -10   10    0&lt;BR /&gt;S    10   -2    0     7&lt;BR /&gt;&lt;BR /&gt;Sign application is quite critical. One example is a sign-aware shift, i.e. division by a power of two:&lt;BR /&gt;A/2^N&lt;BR /&gt;If A is negative, an arithmetic shift rounds toward negative infinity, so the result can be -1 instead of zero:&lt;BR /&gt;-1&amp;gt;&amp;gt;4 = -1&lt;BR /&gt;Sign application fixes this:&lt;BR /&gt;SignApply(ABS(A)&amp;gt;&amp;gt;N, A)&lt;BR /&gt;&lt;BR /&gt;As you probably know, many new compression algorithms use arithmetic coding, such as H.264, JPEG2000, and proprietary ones. These are mainly sequential algorithms.&lt;BR /&gt;One thing that could benefit the execution of such algorithms is conditional instructions, like those found in the TI architecture.
This would avoid pipeline flushes, e.g.:&lt;BR /&gt;if (RAX) RBX++ as a single instruction.&lt;BR /&gt;An instruction set for efficient arithmetic coding requires further research, since arithmetic coding can be one of the main bottlenecks in modern codecs. Maybe some sort of reciprocal division would be useful.&lt;BR /&gt;&lt;BR /&gt;alex@streambox.com</description>
      <pubDate>Mon, 27 Sep 2004 07:06:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/SSE4/m-p/938248#M16376</guid>
      <dc:creator>dessa</dc:creator>
      <dc:date>2004-09-27T07:06:12Z</dc:date>
    </item>
  </channel>
</rss>

