Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Optimizing this bit manipulation code

CommanderLake
Beginner
1,018 Views

So far I have only figured out how to do this with the .NET class BitArray, its virtually impossible to explain with words so here's the C# code:

for(int i = 0, j = 0, k = 0; i < bitArrayIn.Length; ++i, j += 16){
    if(j >= bitArrayIn.Length) j = ++k;
    bitArrayOut[i] = bitArrayIn[j];
}

and to reverse that operation:

var stride = bitArrayIn.Length/16;
for(int i = 0, j = 0, k = 0; i < bitArrayIn.Length; ++i, j += stride){
    if(j >= bitArrayIn.Length) j = ++k;
    bitArrayOut[i] = bitArrayIn[j];
}

If this makes sense to anyone my goal is to convert this to native code and optimize it is that's even possible but I can't get my head around how to do it or how to optimize it and make it faster.

0 Kudos
3 Replies
CommanderLake
Beginner
979 Views

Here it is in C++, can this be optimized? Are there intrinsics or any other methods of doing this that can help speed it up?:

typedef unsigned char T;
constexpr int TS = sizeof(T)*8;
bool Get(const T* a, const long long index){
	return (a[index/TS] & 1 << static_cast<long>(index % TS)) != 0;
}
void Set(T* a, const long long index, const bool value){
	if(value) a[index/TS] |= static_cast<unsigned char>(1 << static_cast<long>(index % TS));
	else a[index/TS] &= static_cast<unsigned char>(~(1 << static_cast<long>(index % TS)));
}
extern "C" _declspec(dllexport) void a(const T* a, T* b, const long long size, const long step){
	auto ii = 0LL;
	const auto bitn = size*8;
	for(long long i = 0, j = 0, k = 0; i < bitn; ++i, j += step){
		if(j >= bitn){
			j = ++k;
		}
		Set(b, ii++, Get(a, j));
	}
}

 

0 Kudos
CommanderLake
Beginner
968 Views

At least tell my why I'm not getting any replies I thought this forum was full of experts with this kind of stuff?

I'll try explaining further...

I want to separate the bits of a 16 bit integer array so the most significant bit of each int16 are arranged in one contiguous block followed by the next bit from each int16 in a block after that and so on, you might call this unpacking the bits? This is a sample of how the layout of the data would change:

0101010101010101 0101010101010101
would become
0011001100110011 0011001100110011
or with 3 it would look like
0101010101010101 0101010101010101 0101010101010101
which would become
0001110001110001 1100011100011100 0111000111000111
and so on

 Getting and setting individual bits is incredibly inefficient so I'm looking for ways to speed up this particular rearrangement of the bits.

0 Kudos
CommanderLake
Beginner
952 Views

Is anyone there? is this forum dead?

0 Kudos
Reply