<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Optimizing this bit manipulation code in Software Tuning, Performance Optimization &amp; Platform Monitoring</title>
    <link>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1412916#M8103</link>
    <description>&lt;P&gt;Here it is in C++. Can this be optimized? Are there intrinsics or any other methods that could help speed it up?&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;typedef unsigned char T;
constexpr int TS = sizeof(T)*8; // bits per element of T
// Returns bit number `index` of the array, counting from the least significant bit of a[0].
bool Get(const T* a, const long long index){
	return (a[index/TS] &amp;amp; 1 &amp;lt;&amp;lt; static_cast&amp;lt;long&amp;gt;(index % TS)) != 0;
}
// Sets or clears bit number `index` of the array.
void Set(T* a, const long long index, const bool value){
	if(value) a[index/TS] |= static_cast&amp;lt;T&amp;gt;(1 &amp;lt;&amp;lt; static_cast&amp;lt;long&amp;gt;(index % TS));
	else a[index/TS] &amp;amp;= static_cast&amp;lt;T&amp;gt;(~(1 &amp;lt;&amp;lt; static_cast&amp;lt;long&amp;gt;(index % TS)));
}
// Copies bits from a to b (both `size` bytes long), reading every `step`-th bit and
// starting over one bit further along each time the read index runs past the end.
extern "C" __declspec(dllexport) void a(const T* a, T* b, const long long size, const long step){
	auto ii = 0LL; // next bit to write in b
	const auto bitn = size*TS; // total number of bits
	for(long long i = 0, j = 0, k = 0; i &amp;lt; bitn; ++i, j += step){
		if(j &amp;gt;= bitn){
			j = ++k;
		}
		Set(b, ii++, Get(a, j));
	}
}&lt;/LI-CODE&gt;
</description>
    <pubDate>Mon, 05 Sep 2022 23:33:08 GMT</pubDate>
    <dc:creator>CommanderLake</dc:creator>
    <dc:date>2022-09-05T23:33:08Z</dc:date>
    <item>
      <title>Optimizing this bit manipulation code</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1412582#M8102</link>
      <description>&lt;P&gt;So far I have only figured out how to do this with the .NET class BitArray. It's virtually impossible to explain in words, so here's the C# code:&lt;/P&gt;
&lt;LI-CODE lang="csharp"&gt;for(int i = 0, j = 0, k = 0; i &amp;lt; bitArrayIn.Length; ++i, j += 16){
    if(j &amp;gt;= bitArrayIn.Length) j = ++k;
    bitArrayOut[i] = bitArrayIn[j];
}&lt;/LI-CODE&gt;
&lt;P&gt;and to reverse that operation:&lt;/P&gt;
&lt;LI-CODE lang="csharp"&gt;var stride = bitArrayIn.Length/16;
for(int i = 0, j = 0, k = 0; i &amp;lt; bitArrayIn.Length; ++i, j += stride){
    if(j &amp;gt;= bitArrayIn.Length) j = ++k;
    bitArrayOut[i] = bitArrayIn[j];
}&lt;/LI-CODE&gt;
&lt;P&gt;If this makes sense to anyone, my goal is to convert this to native code and optimize it, if that's even possible, but I can't get my head around how to do it or how to make it faster.&lt;/P&gt;</description>
      <pubDate>Sat, 03 Sep 2022 17:04:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1412582#M8102</guid>
      <dc:creator>CommanderLake</dc:creator>
      <dc:date>2022-09-03T17:04:15Z</dc:date>
    </item>
    <item>
      <title>Re: Optimizing this bit manipulation code</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1412916#M8103</link>
      <description>&lt;P&gt;Here it is in C++. Can this be optimized? Are there intrinsics or any other methods that could help speed it up?&lt;/P&gt;
&lt;LI-CODE lang="cpp"&gt;typedef unsigned char T;
constexpr int TS = sizeof(T)*8; // bits per element of T
// Returns bit number `index` of the array, counting from the least significant bit of a[0].
bool Get(const T* a, const long long index){
	return (a[index/TS] &amp;amp; 1 &amp;lt;&amp;lt; static_cast&amp;lt;long&amp;gt;(index % TS)) != 0;
}
// Sets or clears bit number `index` of the array.
void Set(T* a, const long long index, const bool value){
	if(value) a[index/TS] |= static_cast&amp;lt;T&amp;gt;(1 &amp;lt;&amp;lt; static_cast&amp;lt;long&amp;gt;(index % TS));
	else a[index/TS] &amp;amp;= static_cast&amp;lt;T&amp;gt;(~(1 &amp;lt;&amp;lt; static_cast&amp;lt;long&amp;gt;(index % TS)));
}
// Copies bits from a to b (both `size` bytes long), reading every `step`-th bit and
// starting over one bit further along each time the read index runs past the end.
extern "C" __declspec(dllexport) void a(const T* a, T* b, const long long size, const long step){
	auto ii = 0LL; // next bit to write in b
	const auto bitn = size*TS; // total number of bits
	for(long long i = 0, j = 0, k = 0; i &amp;lt; bitn; ++i, j += step){
		if(j &amp;gt;= bitn){
			j = ++k;
		}
		Set(b, ii++, Get(a, j));
	}
}&lt;/LI-CODE&gt;
</description>
      <pubDate>Mon, 05 Sep 2022 23:33:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1412916#M8103</guid>
      <dc:creator>CommanderLake</dc:creator>
      <dc:date>2022-09-05T23:33:08Z</dc:date>
    </item>
    <item>
      <title>Re: Optimizing this bit manipulation code</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1413115#M8104</link>
      <description>&lt;P&gt;At least tell me why I'm not getting any replies. I thought this forum was full of experts in this kind of stuff?&lt;/P&gt;
&lt;P&gt;I'll try explaining further...&lt;/P&gt;
&lt;P&gt;I want to separate the bits of a 16-bit integer array so that the most significant bit of each int16 is arranged in one contiguous block, followed by the next bit from each int16 in a block after that, and so on. You might call this unpacking the bits? This is a sample of how the layout of the data would change:&lt;/P&gt;
&lt;PRE class="lang-cs s-code-block"&gt;&lt;CODE class="hljs language-csharp"&gt;0101010101010101 0101010101010101
would become
0011001100110011 0011001100110011
or with 3 it would look like
0101010101010101 0101010101010101 0101010101010101
which would become
0001110001110001 1100011100011100 0111000111000111
and so on
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Getting and setting individual bits is incredibly inefficient, so I'm looking for ways to speed up this particular rearrangement of the bits.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2022 20:53:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1413115#M8104</guid>
      <dc:creator>CommanderLake</dc:creator>
      <dc:date>2022-09-06T20:53:54Z</dc:date>
    </item>
    <item>
      <title>Re: Optimizing this bit manipulation code</title>
      <link>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1413349#M8105</link>
      <description>&lt;P&gt;Is anyone there? Is this forum dead?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Sep 2022 20:38:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Tuning-Performance/Optimizing-this-bit-manipulation-code/m-p/1413349#M8105</guid>
      <dc:creator>CommanderLake</dc:creator>
      <dc:date>2022-09-07T20:38:42Z</dc:date>
    </item>
  </channel>
</rss>

