neleai

icc emits slow __builtin_ctzl assembly

Instead of simply emitting a single bsfq, icc generates the following sequence:

bsf %r9d, %r10d #201.8
bsf %r8d, %r11d #201.8
addl $32, %r10d #201.8
testl %r8d, %r8d #201.8
cmove %r10d, %r11d #201.8
movslq %r11d, %r11
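
For anyone reading the sequence above, here is a rough C sketch of what it appears to compute: a 64-bit count of trailing zeros built from two 32-bit bsf instructions and a conditional move. The helper names (ctz64_split, ctz64_single, check_ctz64) are made up for illustration, and the mapping of %r8d to the low half and %r9d to the high half is my reading of the testl/cmove pair, not something icc documents.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not icc's code): mirrors the two-bsf sequence.
   Assumes x != 0, the same precondition __builtin_ctzl has. */
int ctz64_split(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* assumed to live in %r8d */
    uint32_t hi = (uint32_t)(x >> 32);  /* assumed to live in %r9d */
    if (lo != 0)
        return __builtin_ctz(lo);       /* bsf %r8d, %r11d */
    /* lo == 0 and x != 0 implies hi != 0, so this call is defined */
    return 32 + __builtin_ctz(hi);      /* bsf %r9d, %r10d; addl $32; cmove */
}

/* What a single 64-bit bsfq would compute directly. */
int ctz64_single(uint64_t x)
{
    return __builtin_ctzll(x);
}

/* Check that both versions agree for every single-bit input. */
void check_ctz64(void)
{
    for (int bit = 0; bit < 64; bit++) {
        uint64_t x = (uint64_t)1 << bit;
        assert(ctz64_split(x) == bit);
        assert(ctz64_single(x) == bit);
    }
}
```

Both functions return the same value for any nonzero input, so the longer sequence is correct. The complaint is only that it takes several instructions where one bsfq would do.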

In the following benchmark on an i5, gcc is faster (200 ms) than icc (270 ms):

#include <stdio.h>
int main(){
long i,sum;
sum=0;
for (i=1;i<100000000;i++){
/* sum+i is always >= 1 here, so __builtin_ctzl never gets 0 (undefined) */
sum+=__builtin_ctzl(sum+i);
}
printf("%ld\n",sum);
return 0;
}