BT, BTS and BTC instructions are fast again in Core 2. Could your compiler guys implement those bit test intrinsics? I think the BT instruction should be implemented at least.
The benefits of the intrinsics are quite obvious. Suppose we want to test if bit i is set in an integer bitmap; we usually do this in C/C++:
if (bitmap & (1 << i))
The problems with the above C test are
1. more instructions are generated and
2. register cl is needed, thus increasing register pressure. More register swap/save instructions are often needed too, because rcx/ecx is often used as a function parameter.
Another plus for _bit_test(integer, index) is that it reduces code size.
One additional suggestion to the compiler optimization:
Sometimes (not always) bitmap & (1 << i) should be compiled as a BT instruction.
Thank you for the reply.
I think I tested the latest official version of Intel C++ Compiler 9.1 for Windows, which I downloaded last week.
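For illustration, here is a minimal C sketch of the test being discussed. The helper names bit_is_set and bit_is_set_shr are hypothetical (they are not existing intrinsics), and a 32-bit bitmap is assumed. Whether a given compiler turns either form into a BT instruction is not guaranteed; the sketch only shows the source-level idiom.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a real intrinsic): the usual shift-and-mask test.
   An unsigned 1 is used so that testing bit 31 does not overflow a signed int. */
static int bit_is_set(uint32_t bitmap, unsigned i)
{
    assert(i < 32);                 /* shifting by 32 or more is undefined in C */
    return (bitmap & (UINT32_C(1) << i)) != 0;
}

/* Equivalent form that shifts the bitmap instead of the mask. */
static int bit_is_set_shr(uint32_t bitmap, unsigned i)
{
    assert(i < 32);
    return (int)((bitmap >> i) & 1u);
}
```

Both helpers return 0 or 1, so they can be used directly in a condition such as if (bit_is_set(bitmap, i)).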
What's a premier.intel.com account? I don't think I have one.
I just want to suggest that your compiler developers implement such intrinsics, which can improve Core 2 CPU performance.
I assume you are a member of the C/C++ compiler team. I'm happy as long as someone on your team knows about this request.
I've tested a few things with ICL 9.1. It's a good compiler that can beat the MS one in most cases.
However, I'm sure you can make it even better.
Tim, Thanks again!
I don't have such an account to file the request. I'm still an evaluation user. Could you please send this thread to them as a feature request? I think this feature would definitely boost Core 2 processors' SpecInt2000 or similar science benchmark results a little bit. And the implementation is not difficult at all if you consider the fact that they already implemented _bit_scan_forward, _bit_scan_reverse and even a _popcnt!
When you get the eval, it asks if you'd like the free support. If you select "Yes", you'll have an account with Premier Support at "http://www.intel.com/software/products/support", and you can submit issues or feature requests.
About "bitmap & (1 << i) should be compiled as a BT instruction": it's a good one for our future compiler.
About the _bit_test, will the following intrinsics work better for your case? If yes, I'll submit the feature for you.
int _bit_test(int val, int cnt); // returns the bit in val specified by cnt (either 0 or 1)
int _bit_test_and_set(int *val, int cnt); // returns the bit in *val specified by cnt (either 0 or 1); that bit is then set
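To make the proposed semantics concrete, here is a portable C model of the two prototypes. The names sketch_bit_test and sketch_bit_test_and_set are hypothetical stand-ins, not the intrinsics themselves, and the sketch assumes a 32-bit int. It says nothing about which instructions a compiler would emit.

```c
#include <assert.h>

/* Portable model of the proposed _bit_test (hypothetical name):
   returns the bit of val at position cnt, as 0 or 1. Assumes 32-bit int. */
static int sketch_bit_test(int val, int cnt)
{
    assert(cnt >= 0 && cnt < 32);
    return (int)(((unsigned)val >> cnt) & 1u);
}

/* Portable model of the proposed _bit_test_and_set (hypothetical name):
   returns the old bit of *val at position cnt, then sets that bit.
   Not atomic: this is a plain read-modify-write. */
static int sketch_bit_test_and_set(int *val, int cnt)
{
    assert(cnt >= 0 && cnt < 32);
    unsigned mask = 1u << cnt;
    int old = (int)(((unsigned)*val >> cnt) & 1u);
    *val = (int)((unsigned)*val | mask);
    return old;
}
```

Calling sketch_bit_test_and_set twice on the same bit returns 0 the first time (if the bit was clear) and 1 the second time.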
Jennifer, thanks a lot.
int _bit_test(int val, int bit_index) looks much better than the 2nd one, which is Microsoft syntax. _bit_test returns 0 if the specified bit is not set. _bit_test returns nonzero if the bit is set (it need not necessarily return 1, because the compiler may actually generate a conditional (CF) jump when it is used in a condition clause).
The _bit_test intrinsic should map to the BT instruction as closely as possible. I think you can safely assume that intrinsics users are at least assembly-aware programmers who know what they are doing. The MS version is quite inefficient: it generated a dummy memory read the last time I checked a piece of 32-bit code that MS 8.0 generated.
Each function has its strengths and weaknesses. In a multi-threaded single-processor system you would use bit_test_and_set; on SMP you would use the interlocked version of the intrinsic. Lacking this, you would have to use a critical section or spinlock, which is much more costly than using a memory temp.
I have to add that the new intrinsic "_bittest" and others will be added later this year, but these intrinsics may not meet your requirements. The better version will be added afterwards, which will take some more time. Again, I'll post the news here.
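As a rough sketch of what an interlocked test-and-set looks like without a dedicated intrinsic, the following uses standard C11 atomics. The function name atomic_bit_test_and_set is hypothetical, and a 32-bit unsigned is assumed. How a compiler lowers atomic_fetch_or (for example to a locked instruction on x86) is up to the compiler and is not something this sketch guarantees.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper: atomically set bit cnt of *word and
   return that bit's previous value (0 or 1). Assumes 32-bit unsigned. */
static int atomic_bit_test_and_set(atomic_uint *word, unsigned cnt)
{
    assert(cnt < 32);
    unsigned mask = 1u << cnt;
    /* atomic_fetch_or returns the value held before the OR was applied. */
    unsigned old = atomic_fetch_or(word, mask);
    return (old & mask) != 0;
}
```

A caller can treat a return value of 0 as "I was the one who set the bit", which is the usual basis for a simple lock or claim flag.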
Thanks Jennifer for the update and communications.
I like Core, which is much better than NetBurst :(
The current compilers still carry the tradition of avoiding certain instructions that are slow on P4.
Glad to hear we are going to get new intrinsics. Thanks again!
So you are suggesting that they also rename _mm_move_ps to _move_aligned_packed_single_precision_floating_point for consistency with those longer names?
I would rather introduce short names for _bit_scan_forward and _bit_scan_reverse, and leave the old ones as aliases for compatibility reasons. As you see, consistency can be satisfied both ways.
I also prefer the short names that match their asm counterparts with a leading underscore. The big plus for short ones is that you can remember them easily because you already know the asm instructions. So, you have a really good idea: introduce short names for existing awkward long names like _bit_scan_forward and still keep consistency. And over time the long ones become deprecated.
_bsf makes more sense to me.
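The aliasing idea can be sketched in plain C as follows. Both sketch_bit_scan_forward and sketch_bsf are hypothetical names used only for illustration; the loop is a portable stand-in for a bit scan, not the compiler's intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for a bit-scan-forward (hypothetical name):
   returns the index of the lowest set bit of x. */
static int sketch_bit_scan_forward(uint32_t x)
{
    assert(x != 0);                 /* the result is undefined for zero input */
    int idx = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++idx;
    }
    return idx;
}

/* Short alias that mirrors the asm mnemonic; the long name stays
   available so existing code keeps compiling. */
#define sketch_bsf(x) sketch_bit_scan_forward(x)
```

With this arrangement, old code calling the long name and new code calling the short one resolve to the same function.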