
Amazing but unexpected behavior of a simple C language test

bostoniancarlos
Beginner
Hello

Some time ago I made this simple C speed test:

#include <stdio.h>
#include <time.h>
#include <math.h>

extern double sqrt(double);

void compute( void ){
    long n;
    double k;

    /* k is never read afterwards, so an optimizing compiler may remove this loop */
    for(n=1;n<750000000;++n){
        k = n * 3;
        k = sqrt(k) / 2;
    }
    printf("Test ended\n");
    return;
}

int main( void )
{
    clock_t start_time, end_time;
    double Time, S,E,Tics;

    start_time = clock();

    compute();
    end_time = clock();
    S = start_time;
    E = end_time;
    Tics = CLOCKS_PER_SEC;   /* clock() counts ticks of CPU time */
    Time = (E - S) / Tics;
    printf( "Execution time was %f seconds\n", Time) ;
    return 0;
}

At that time I had a Pentium Dual-Core E5400 2.70GHz. The
execution time was 21.20 seconds. What amazed me was that when
I ran it in four Linux (Ubuntu) terminals simultaneously,
the times were 21.20, 21.17, 21.28 and 21.22 seconds.
Question ONE: Why are those times equal or almost equal to
the time of the test I ran in just one terminal?

Recently I upgraded to an Intel Core i3-2120 3.30GHz and the
time in one terminal was slightly longer: 23.39 seconds.
Question TWO: Why, if this microprocessor has a faster clock
and better technology?

Running this test in four terminals simultaneously gives the
result I expected: longer times, about 34 seconds each, because
the OS has to give slices of CPU time to each terminal.
However, when I disabled the Hyper-Threading feature in
the BIOS (the board is an ASRock H67M-GE motherboard), the
four-terminal test behaved like the E5400, that is, all the
times were around 23.39 seconds.
Question THREE: Did you expect this? Why?

Thank you very much in advance.

Regards,

Carlos



0 Kudos
7 Replies
SergeyKostrov
Valued Contributor II
...At that time I had a Pentium Dual-Core E5400 2.70GHz. The
execution time was 21.20 seconds. What amazed me was that when
I ran it in four Linux (Ubuntu) terminals simultaneously,
the times were 21.20, 21.17, 21.28 and 21.22 seconds.
Question ONE: Why are those times equal or almost equal to
the time of the test I ran in just one terminal?..

Could you check the priorities of the processes / threads for both test cases?

Best regards,
Sergey
0 Kudos
GHui
Novice
HT technology does not always speed up your program, especially if your program is single-threaded. You ran four at one time, but I don't think it's a multithreaded program.
0 Kudos
SergeyKostrov
Valued Contributor II
Quoting GHui
HT technology does not always speed up your program...

I think this is the most confusing part of HT programming. Some Intel HT docs state that it "...enables performance to be
significantly improved...". In reality, sometimes this does not happen.
0 Kudos
bostoniancarlos
Beginner
Thank you Sergey

What I found was a process option named "Nice". I set it
to -20 and the interface told me that this corresponds to
"Very High Priority". With that setting I ran the test
again, but there was no significant difference
(23.29 seconds). Note that I am now using the i3-2120
with the Hyper-Threading feature disabled in the BIOS
(the board is an ASRock H67M-GE motherboard).

On the other hand, the CPU load percentages are:
One process: 49%
Four processes: 24%, 23%, 24% and 23%

Regards,

Carlos

0 Kudos
bostoniancarlos
Beginner
Right, that is what amazed me:
Why do four instances of the same test
give almost the same time as one
instance of the test, WITHOUT using
threads or any kind of parallel
hardware? The i3-2120 I am using
has 2 cores, but my Ubuntu OS is
almost unaware of that. I expected
times almost four times slower or,
in the best case, two times slower; in
that case I would have thought that
Ubuntu was using both cores. But how
to explain almost the same time for one
test and for four simultaneous tests?

Regards,

Carlos
0 Kudos
Patrick_F_Intel1
Employee
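Hello Carlos,
The 'clock()' function returns the approximate cpu time used by the process, not the elapsed wall-clock time. The value is in clock ticks; dividing by CLOCKS_PER_SEC, as your program does, gives cpu-seconds.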
So, if you run on 1 cpu, the process takes x cpu-seconds.
If you run on any number of cpus, as long as you aren't using both cpus of a hyper-threaded core, you will get about x cpu-seconds.
The hyperthreaded cpus share the floating point unit so, since this process is floating point intensive (and runs in-cache), if you use both cpus of a hyper-threaded core, the process will take more cpu-seconds.
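To make the difference between cpu time and elapsed time visible, here is a minimal sketch that measures both around the same loop. It assumes a POSIX system (clock_gettime with CLOCK_MONOTONIC). The helper names busy_loop and time_loop are made up for this illustration, and the loop is a stand-in that multiplies and divides instead of calling sqrt, so it is not the original test.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime under strict -std= modes */
#endif
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in workload: a floating-point loop shaped like the
   original test, but without sqrt. volatile keeps the compiler from
   removing the loop. */
double busy_loop(long iterations)
{
    volatile double k = 0.0;
    for (long n = 1; n < iterations; ++n) {
        k = (double)n * 3.0;
        k = k / 2.0;
    }
    return k;
}

/* Elapsed wall-clock seconds read from a monotonic clock. */
static double wall_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs the workload and reports cpu-seconds (from clock()) and
   wall-clock seconds (from clock_gettime) through the two out-parameters. */
void time_loop(long iterations, double *cpu_s, double *wall_s)
{
    clock_t c0 = clock();
    double  w0 = wall_now();

    busy_loop(iterations);

    clock_t c1 = clock();
    double  w1 = wall_now();

    *cpu_s  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    *wall_s = w1 - w0;
    printf("cpu time:  %f seconds\n", *cpu_s);
    printf("wall time: %f seconds\n", *wall_s);
}
```

Calling time_loop(750000000, &cpu, &wall) from main in several copies started at once should show the effect described above: the cpu time per process stays about the same, while the wall time grows once there are more processes than cpus to run them on.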
On my Sandybridge i7-2820QM CPU @ 2.30GHz, (with 4 cores, HT enabled so 8 total cpus) I get:

Processes  time/process  comments
1          9.5 secs      no HT slowdown
2          9.5 secs      each process on a core by itself
2          15.4 secs     1 process on each of the HT cpus of one core
4          9.5 secs      each process on a core by itself
8          15.4 secs     1 process on each of the HT cpus of all 4 cores
16         15.4 secs     running on all 8 cpus with 2 processes per cpu
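The rows above depend on which cpu each process runs on. One possible way to control that on Linux is to pin the process before the timed loop starts. This is only a sketch under the assumption of Linux with glibc (sched_setaffinity and the CPU_* macros need _GNU_SOURCE). The helper names pin_to_cpu and first_allowed_cpu are made up for this illustration and are not necessarily how the numbers in the table were produced.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must come before the includes: enables sched_setaffinity and CPU_* */
#endif
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: restrict the calling process to one logical cpu.
   Returns 0 on success, -1 on failure (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = calling process */
}

/* Hypothetical helper: lowest-numbered logical cpu the process is
   currently allowed to run on, or -1 if the mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}
```

Calling pin_to_cpu() at the start of main, with a different cpu number in each copy of the test, lets you choose between "each process on a core by itself" and "1 process on each of the HT cpus of one core". Which logical cpu numbers are HT siblings of the same core varies between systems; on Linux it can be read from /sys/devices/system/cpu/cpu0/topology/thread_siblings_list.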

Does this make sense?
Pat
0 Kudos
SergeyKostrov
Valued Contributor II
...The hyperthreaded cpus share the floating point unit...

Thanks, Patrick! It really explains some performance issues.

Best regards,
Sergey
0 Kudos