<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SIMD byte problems in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804700#M722</link>
    <description>I try to treat words as pairs of bytes:&lt;BR /&gt;&lt;BR /&gt;W    O    R D&lt;BR /&gt;00000011 00000001&lt;BR /&gt;B Y T E    B Y T E &lt;BR /&gt;&lt;BR /&gt;If I sum word_A + word_B, it's the same as summing byte_A0 + byte_B0 and byte_A1 + byte_B1 (at least as long as each byte sum stays at or below 255, so no carry spills into the neighbouring byte).&lt;BR /&gt;&lt;BR /&gt;But * and / by shifting sound a bit harder, because there are no byte shift instructions. If I shift that word right:&lt;BR /&gt;&lt;BR /&gt;00000011 00000001 --&amp;gt; shift right 2 bits--&amp;gt; 00000000 11000000&lt;BR /&gt;&lt;BR /&gt;the left byte is OK, but the right one is not: it picked up the two bits shifted out of the left byte.&lt;BR /&gt;</description>
    <pubDate>Thu, 10 Feb 2011 13:58:34 GMT</pubDate>
    <dc:creator>tom_r</dc:creator>
    <dc:date>2011-02-10T13:58:34Z</dc:date>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804694#M716</link>
      <description>Hi everyone,&lt;BR /&gt;I'm going to try some SIMD byte manipulation, but I noticed that some byte operations are missing..&lt;BR /&gt;I tried to do byte add/sub by treating the bytes as words or doublewords. It works, but I don't think it's a good idea. What should I do if I need this:&lt;BR /&gt;&lt;BR /&gt;new_byte = (byte * 200 - 50) for each of the 16 bytes within a SIMD register?&lt;BR /&gt;&lt;BR /&gt;I tried to map the bytes to words, but it's a waste of memory.. Is there any other way?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Tom</description>
      <pubDate>Wed, 09 Feb 2011 21:48:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804694#M716</guid>
      <dc:creator>tom_r</dc:creator>
      <dc:date>2011-02-09T21:48:01Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804695#M717</link>
      <description>Are the bytes signed? Which values are possible for a byte? -1, 0, 1? If they are not signed, how do you resolve overflows?</description>
      <pubDate>Thu, 10 Feb 2011 07:40:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804695#M717</guid>
      <dc:creator>Ilnar</dc:creator>
      <dc:date>2011-02-10T07:40:02Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804696#M718</link>
      <description>Unsigned char, from 0 to 255 (like RGB colour values). Is it possible to manage bytes? About overflow: I use a formula where overflow never occurs:&lt;BR /&gt;&lt;BR /&gt;new_char = char * 20 / 150 + 40&lt;BR /&gt;&lt;BR /&gt;If char is 255, new_char is 74, so no problem with overflow.. &lt;BR /&gt;&lt;BR /&gt;Thanks</description>
      <pubDate>Thu, 10 Feb 2011 08:18:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804696#M718</guid>
      <dc:creator>tom_r</dc:creator>
      <dc:date>2011-02-10T08:18:00Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804697#M719</link>
      <description>I have no knowledge of how to implement this using SIMD, sorry.&lt;BR /&gt;I can suggest using a precalculated transformation table with 256 elements and unrolling the loop in order to reduce branches:&lt;BR /&gt;char table[256] = {40, 40, 40, 40, 40, 40, 40, 40, 41, 41, 41, 41, 41, 41, 41, 42, ...., 74};&lt;BR /&gt;If the formula is stable, this is preferable to multiplying and dividing.</description>
      <pubDate>Thu, 10 Feb 2011 08:57:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804697#M719</guid>
      <dc:creator>Ilnar</dc:creator>
      <dc:date>2011-02-10T08:57:23Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804698#M720</link>
      <description>lookup table is a good idea, but i need to write simd for other reasons too..&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 10 Feb 2011 10:24:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804698#M720</guid>
      <dc:creator>tom_r</dc:creator>
      <dc:date>2011-02-10T10:24:17Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804699#M721</link>
      <description>You could do some simple operations on 16 packed bytes using SSE (+, -, mul or div by shifting), and some harder operations by treating 8 bytes as words (mul and div).</description>
      <pubDate>Thu, 10 Feb 2011 11:51:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804699#M721</guid>
      <dc:creator>Ilnar</dc:creator>
      <dc:date>2011-02-10T11:51:32Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804700#M722</link>
      <description>I try to treat words as pairs of bytes:&lt;BR /&gt;&lt;BR /&gt;W    O    R D&lt;BR /&gt;00000011 00000001&lt;BR /&gt;B Y T E    B Y T E &lt;BR /&gt;&lt;BR /&gt;If I sum word_A + word_B, it's the same as summing byte_A0 + byte_B0 and byte_A1 + byte_B1 (at least as long as each byte sum stays at or below 255, so no carry spills into the neighbouring byte).&lt;BR /&gt;&lt;BR /&gt;But * and / by shifting sound a bit harder, because there are no byte shift instructions. If I shift that word right:&lt;BR /&gt;&lt;BR /&gt;00000011 00000001 --&amp;gt; shift right 2 bits--&amp;gt; 00000000 11000000&lt;BR /&gt;&lt;BR /&gt;the left byte is OK, but the right one is not: it picked up the two bits shifted out of the left byte.&lt;BR /&gt;</description>
      <pubDate>Thu, 10 Feb 2011 13:58:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804700#M722</guid>
      <dc:creator>tom_r</dc:creator>
      <dc:date>2011-02-10T13:58:34Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804701#M723</link>
      <description>Just shift, then apply a mask:&lt;BR /&gt;rshift2mask = 00111111 00111111&lt;BR /&gt;00000011 00000001 --&amp;gt; shift right 2 bits--&amp;gt; 00000000 11000000&lt;BR /&gt;After applying the mask with a bitwise AND:&lt;BR /&gt;00000000 00000000&lt;BR /&gt;You just need 8 masks for right shifts and 8 for left shifts.</description>
      <pubDate>Thu, 10 Feb 2011 14:15:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804701#M723</guid>
      <dc:creator>Ilnar</dc:creator>
      <dc:date>2011-02-10T14:15:45Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804702#M724</link>
      <description>&amp;gt;&amp;gt;new_char = char * 20 / 150 + 40&lt;BR /&gt;&lt;BR /&gt;Using unsigned char for the arithmetic, the result of (char*20) cannot exceed 255.&lt;BR /&gt;The result of (x&amp;lt;256) / 150 in unsigned char arithmetic can only be 0 or 1.&lt;BR /&gt;Therefore your end result can only be a list of bytes containing 40 or 41.&lt;BR /&gt;&lt;BR /&gt;While I won't write the code for you, the gist would be:&lt;BR /&gt;&lt;BR /&gt;multiply the 16 bytes by 20&lt;BR /&gt;compare the result against 16 bytes of 150, producing a mask&lt;BR /&gt;negate the mask (all-ones becomes 1, zero stays 0)&lt;BR /&gt;add 40 to all bytes in the result.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;EDIT&lt;BR /&gt;&lt;BR /&gt;However, you state:&lt;BR /&gt;&lt;BR /&gt;&amp;gt;&amp;gt;if char is 255, new_char is 74, so no prob with overflow..&lt;BR /&gt;&lt;BR /&gt;Therefore the original problem statement should have been stated clearly as:&lt;BR /&gt;&lt;BR /&gt;new_char = (char)((int)char * 20 / 150 + 40)&lt;BR /&gt;&lt;BR /&gt;For this you would modify the above by first converting 8 uchars to 8 16-bit uints,&lt;BR /&gt;then multiply the uints by 20 to produce temps&lt;BR /&gt;zero results&lt;BR /&gt;loop on&lt;BR /&gt; compare temps against 150s to produce a mask&lt;BR /&gt; if the mask is all zeros, exit&lt;BR /&gt; negate mask&lt;BR /&gt; add mask to results&lt;BR /&gt; subtract 150s from temps&lt;BR /&gt; and with mask&lt;BR /&gt;end loop&lt;BR /&gt;convert the 16-bit results back to 8 chars (shuffle)&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 10 Feb 2011 14:33:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804702#M724</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2011-02-10T14:33:23Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804703#M725</link>
      <description>Yes, because of integer division, for * and / the byte was supposed to become an int (or short) and then a byte again; otherwise, as you said, the result of dividing by 150 can only be 0 or 1..&lt;BR /&gt;&lt;BR /&gt;By "mul bytes", do you mean using the word multiply instruction? If I move 16 bytes to a register, do I need to convert all 16 bytes to integers by shuffling the data, or do you mean converting before moving to the register?&lt;BR /&gt;&lt;BR /&gt;Thanks for your replies guys, I will try those solutions soon.&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 10 Feb 2011 15:50:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804703#M725</guid>
      <dc:creator>tom_r</dc:creator>
      <dc:date>2011-02-10T15:50:22Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804704#M726</link>
      <description>If you have SSE4.1 or later, use&lt;BR /&gt;&lt;BR /&gt; __m128i _mm_cvtepu8_epi16 (__m128i a);&lt;BR /&gt;&lt;BR /&gt;This converts 8 uchars into 8 shorts.&lt;BR /&gt;&lt;BR /&gt;If you have an earlier version of SSE, use (this one requires SSSE3)&lt;BR /&gt;&lt;BR /&gt; __m128i _mm_shuffle_epi8 (__m128i a, __m128i b);&lt;BR /&gt;&lt;BR /&gt;Shuffle can then be used afterwards to convert back from 16-bit to 8-bit.&lt;BR /&gt;&lt;BR /&gt;Properly constructed, you could load 16 bytes into an SSE register, then, using shuffle, convert 8 of those to 16 bits, mung those 8, producing 8 results in an SSE register, then convert the other 8 bytes to 16 bits and mung those. IOW one 16-byte load, one 16-byte store (two passes to produce the results).&lt;BR /&gt;&lt;BR /&gt;Jim Dempsey</description>
      <pubDate>Thu, 10 Feb 2011 19:00:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804704#M726</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2011-02-10T19:00:29Z</dc:date>
    </item>
    <item>
      <title>SIMD byte problems</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804705#M727</link>
      <description>you're right, I managed that way..&lt;BR /&gt;&lt;BR /&gt;thanks&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 10 Feb 2011 20:00:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/SIMD-byte-problems/m-p/804705#M727</guid>
      <dc:creator>tom_r</dc:creator>
      <dc:date>2011-02-10T20:00:28Z</dc:date>
    </item>
  </channel>
</rss>

