Hi!
I have a small problem. I tested the two functions vsSin and sinf because I wanted to know which of the two is faster.
Here is my code:
float value;
__int64 time1,time2,time3,time4;
float a[10000];
float b[10000];
int n=10000;
int mode;
mode=VML_LA|VML_FLOAT_CONSISTENT|VML_ERRMODE_IGNORE;
vmlSetMode(mode);
for (int j=0;j<10000;j++) a[j]=(float)(rand()%8);
QueryPerformanceCounter((LARGE_INTEGER*)&time1);
for (int i=0;i<10000;i++) value=sinf(a[i]);
QueryPerformanceCounter((LARGE_INTEGER*)&time2);
QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);
printf("time: %I64d ",time2-time1);
printf("time: %I64d ",time4-time3);
And now the result:
sinf(..) took 1608 ticks (or whatever QueryPerformanceCounter returns ;) )
vsSin(..) took 192344 ticks at best
Why is vsSin so slow?
Did I do something wrong?
Thanks for any answers.
GoreProducers
2 Replies
I suppose the compiler may be able to replace your first loop by
value=sinf(a[9999]);
or may do nothing there, since you don't use value.
You could check (e.g. by saving .asm) to see whether that loop produces an svml library call or a single evaluation, if even that.
Hi!
The compiler does in fact eliminate the sinf loop as dead code, because the sinf results are never used.
Look at the generated asm:
=============================================================
call DWORD PTR __imp__QueryPerformanceCounter@4
.B1.6:
lea eax, DWORD PTR [esp+16]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4
.B1.7:
lea eax, DWORD PTR [esp+24]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4
.B1.8:
lea edx, DWORD PTR [esp+40]
lea eax, DWORD PTR [esp+40040]
push eax
push edx
push 10000
call _vsSin
.B1.17:
add esp, 12
.B1.9:
lea eax, DWORD PTR [esp+32]
push eax
call DWORD PTR __imp__QueryPerformanceCounter@4
=============================================================
As you can see, there is no sinf loop between the first two QueryPerformanceCounter calls.
To avoid this in the future, use one of these two methods (or both):
1) Compile your timing routine with optimization disabled (the /Od compiler switch).
2) Make sure the results of the timed function are used. For example, store the sinf values and print them after timing:
======================================================
QueryPerformanceCounter((LARGE_INTEGER*)&time1);
for (int i=0;i<n;i++) b[i]=sinf(a[i]);
QueryPerformanceCounter((LARGE_INTEGER*)&time2);
QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);
for(int i=0; i < n; i++)
{
printf("%f ", b[i]);
}
======================================================
By the way, your timing results roughly agree with actual VML performance (see the VML notes).
Another hint for accurate timing: repeat your timing procedure several times (10 to 20)
and take the best result.
======================================================
besttime = INT_MAX;
curtime = 0;
for(int repeat = 0; repeat < 15; repeat++)
{
QueryPerformanceCounter((LARGE_INTEGER*)&time3);
vsSin(n,a,b);
QueryPerformanceCounter((LARGE_INTEGER*)&time4);
curtime = time4 - time3;
if(curtime < besttime)
besttime = curtime;
}
printf("time: %d\n",besttime);
======================================================
This hint helps you avoid two issues: the "cold cache" effect and operating system interference with the measurement.
Best regards and good luck!
Andrey K.
