Instead of simply emitting bsfq, icc generates the following sequence:
bsf %r9d, %r10d #201.8
bsf %r8d, %r11d #201.8
addl $32, %r10d #201.8
testl %r8d, %r8d #201.8
cmove %r10d, %r11d #201.8
movslq %r11d, %r11
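For anyone who prefers C to assembly, here is a rough sketch of what the sequence above appears to compute: a 64-bit trailing-zero count assembled from two 32-bit bsf results. I am assuming %r8d holds the low half and %r9d the high half of the operand; that is my reading of the listing, not something icc documents. The name ctz64_via_halves is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the icc sequence (register roles assumed). */
static int ctz64_via_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;         /* assumed to be %r8d */
    uint32_t hi = (uint32_t)(x >> 32); /* assumed to be %r9d */

    /* testl/cmove: use the low-half result unless the low half is zero */
    if (lo != 0)
        return __builtin_ctz(lo);      /* bsf %r8d, %r11d */

    /* bsf %r9d, %r10d followed by addl $32; undefined when x == 0 */
    return 32 + __builtin_ctz(hi);
}
```

For any nonzero x this should agree with __builtin_ctzll(x), which a single bsfq would give directly.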
In the following benchmark on an i5, gcc is faster (200 ms) than icc (270 ms):
#include <stdio.h>
int main(){
long i,sum;
sum=0;
for (i=1;i<100000000;i++){
sum+=__builtin_ctzl(sum+i);
}
printf("%ld\n",sum);
return 0;
}