Intel® ISA Extensions
Use hardware-based isolation and memory encryption to provide more code protection in your solutions.

Efficient ways to count setted bits in bytes/words?

q_w_
Beginner
1,084 Views

I read the IA-32/64 developer's handbook, so far there isnt anything even at the machine code level, that can count the total numbers of setted bits for a given bytes or word without using loops, am I right about this? 

0 Kudos
9 Replies
SergeyKostrov
Valued Contributor II
1,084 Views
>>I read the IA-32/64 developer's handbook, so far there isnt anything even at the machine code level, that can count the total >>numbers of setted bits for a given bytes or word without using loops, am I right about this? If you need a very fast algorithm to count the total number of bits which are set to 1 in a byte or in a word two look-up tables have to be used: - 256 elemets for bytes - 65536 elements for words ( I assume that sizeof( word ) is 2 bytes )
0 Kudos
andysem
New Contributor III
1,084 Views

You can try using popcnt instruction, if supported by your target CPU.

0 Kudos
q_w_
Beginner
1,084 Views

Sergey Kostrov wrote:

>>I read the IA-32/64 developer's handbook, so far there isnt anything even at the machine code level, that can count the total
>>numbers of setted bits for a given bytes or word without using loops, am I right about this?

If you need a very fast algorithm to count the total number of bits which are set to 1 in a byte or in a word two look-up tables have to be used:

- 256 elemets for bytes
- 65536 elements for words ( I assume that sizeof( word ) is 2 bytes )

Very interesting way to do bit-wise operation, I will definitly give it a try, btw, have you benchmark this algorthim against traditional loop test?

0 Kudos
SergeyKostrov
Valued Contributor II
1,084 Views
>>...Very interesting way to do bit-wise operation, I will definitly give it a try, btw, have you benchmark this algorthim against >>traditional loop test? No. It will be faster because only a couple of CPU instructions will be needed ( after translation a C code to assembler ). Here is an example: ... __int8 iLUTNumOfBits[ 256 ] = { /* 0, 1, 2, 3, 4, 5, 6, ... , 254, 255 */ 0, 1, 2, 3, 1, 2, 3, ... , 7, 8 }; ... byte byVariable = 7; ... int iNumOfBits = iLUTNumOfBits[ byVariable ]; // ( 7 - 1 ) = 6 ( our index value ) -> 3 ( bits ) from LUT -> 00000111 = 0x07 ... Note: Take into account that just one Look-Up Table could be used.
0 Kudos
SergeyKostrov
Valued Contributor II
1,084 Views
>>...You can try using popcnt instruction, if supported by your target CPU... There are two intrinsic functions if you don't want to use assembler: POPCNT: int _mm_popcnt_u32( unsigned int a ); POPCNT: int64_t _mm_popcnt_u64( unsigned __int64 a ); Also, for bytes XLAT/XLATB Table Look-up Translation instruction.could be used and it is supported on all Intel CPUs.
0 Kudos
Christian_M_2
Beginner
1,084 Views

If you do this for words, you might also use a Look-up Table for one byte with 256 entries.

Then you go in the table with every byte of the word and sum the values up.

I am not sure, but in my opinion it might be faster than a larger table. The small table might be kept in the changes very near to the CPU. And selecting bytes of a word is a simple AND operation.

But this is only an assumption.

0 Kudos
SergeyKostrov
Valued Contributor II
1,084 Views
>>...Then you go in the table with every byte of the word and sum the values up. This is actually a good idea and you need to decide which way of counting is the best for your task.
0 Kudos
OTorg
New Contributor III
1,084 Views

Some more ways: en.wikipedia.org/wiki/Hamming_weight

It is easy to modify code shown there for 8,16,32 bit variables in a traditional c language and for 128, 256 bits using sse/avx registers/operations.

Which way (_mm_popcnt_uXX, byte/word-table lookup, simple arithmetics, arithmetics with multiple, etc) is faster (and, in general, is available) depends on your application and target system(s).

0 Kudos
SergeyKostrov
Valued Contributor II
1,084 Views
>>Which way (_mm_popcnt_uXX, byte/word-table lookup, simple arithmetics, arithmetics with multiple, etc) is faster (and, in general, >>is available) depends on your application and target system(s). This is absolutely right point of view. I regret to see that XLAT/XLATB instructions are almost forgotten. By the way, in the article about Hamming Weight there are two very interesting statements: >>... >>With unlimited memory, we could simply create a large lookup table of the Hamming weight of every 64 bit integer. >>... It looks like overcompilation of a really simple thing and the author didn't think about practical applications of that method. and >>... >>In C++ STL, the bit-array data structure bitset has a count() method that counts the number of bits that are set. >>... I tested it some time ago and it is very slow when compared to already mentioned methods.
0 Kudos
Reply