Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

how long does it take for thread to start; thread pooling

Daniel_B_Intel2
Employee
hi,

Recently I attended a lecture on performance tips for programming in the MS .NET environment.

One of the interesting things I learned was that it takes nearly 1 sec (!!!!) from the command that creates a thread until the thread starts its work.

1) Is it really true? Has anyone measured these figures?
2) Is it true only for the MS .NET environment, or for almost all operating systems?
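
One way to check the figure is to time it directly. The sketch below is a minimal C++ example, not something from the lecture: it assumes std::thread and std::chrono::steady_clock rather than .NET, and the function name median_thread_start_latency_us is made up for illustration. It reports the median delay between requesting a thread and the thread running its first statement.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper: median time, in microseconds, between constructing a
// std::thread and the first statement of the new thread's function.
double median_thread_start_latency_us(int samples) {
    assert(samples > 0);
    using Clock = std::chrono::steady_clock;  // monotonic, comparable across threads
    std::vector<double> latencies;
    latencies.reserve(samples);

    for (int i = 0; i < samples; ++i) {
        Clock::time_point started;
        const Clock::time_point requested = Clock::now();
        std::thread worker([&started] { started = Clock::now(); });
        worker.join();  // join() synchronizes, so reading `started` here is safe
        latencies.push_back(
            std::chrono::duration<double, std::micro>(started - requested).count());
    }

    // Partial sort that puts the median element at the middle position.
    std::nth_element(latencies.begin(), latencies.begin() + samples / 2,
                     latencies.end());
    return latencies[samples / 2];
}
```

Calling median_thread_start_latency_us(100) gives a number to compare against the 1 sec claim. The result will vary with the OS, the runtime, and the load on the machine, so it only answers the question for the system it was run on.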

As a solution to this problem, the lecturer suggested using a "thread pool", but he did not give us an example.

Can someone share thoughts or code examples for thread pool use?
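
For reference, the idea behind a thread pool is to pay the thread start cost once, up front, and then hand work to threads that are already running. The following is a minimal sketch in C++, assuming a fixed number of workers and a simple locked queue. The class name ThreadPool and its submit method are illustrative only and are not the .NET ThreadPool API the lecturer presumably had in mind.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative fixed-size pool: workers are started once and reused for
// every submitted task, so no thread is created per task.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n_threads) {
        assert(n_threads > 0);
        for (std::size_t i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Finishes every task already queued, then joins the workers.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a task; one idle worker is woken to run it.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping and drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

Typical use would be to create one pool at program start, for example ThreadPool pool(4);, and call pool.submit(...) with a lambda for each unit of work. The sketch leaves out return values and exception handling to stay short.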

Thank you
Daniel


ClayB
New Contributor I
> Now the problem is that the first
> RDTSC was done on CPU0 and the second RDTSC is done
> on CPU1. The time stamp counters between processors
> are not synchronized, and therefore the timing
> measured will be incorrect.

I understand the problem that you've outlined and have run up against this in similar situations with distributed systems. It seems to me that the difference in clocks can be much greater in systems that only share a network connection versus processors in the same box or even the physical and logical processors of an HT system. Do we have any idea just how far out of sync the time stamp counters can be? Can this difference be measured by clockticks? Stopwatches? Calendars?

-- clay
bronx
Beginner
> However, when using this performance monitor with
> RDTSC there is an issue in an SMP operating system

Yes, very good point: it is not safe to use RDTSC with more than a single CPU. In this case the perf counter is arguably a better choice. BTW, my own policy is to use RDTSC only for profiling purposes, never in production code, due to numerous portability issues.


> with either logical processors (HyperThreading) or

are you really sure that the TSC is not shared by the 2 logical CPUs with hyperthreading?


> are not synchronized, and therefore the timing
> measured will be incorrect.

yes, and the tester will have the wrong impression that the timings are very precise...

Aaron_C_Intel
Employee

> I understand the problem that you've outlined and
> have run up against this in similar situations with
> distributed systems. It seems to me that the
> difference in clocks can be much greater in systems
> that only share a network connection versus
> processors in the same box or even the physical and
> logical processors of an HT system. Do we have any
> idea just how far out of sync the time stamp counters
> can be? Can this difference be measured by
> clockticks? Stopwatches? Calendars?
>

On HT systems, I've seen timings differ by half a second to a full second between runs where I used RDTSC incorrectly and runs where I set thread affinity to lock the main thread to one CPU.

While not an accurate measurement of the clock differences, it gives a rough ballpark of the scale.

It all has to do with how the OS starts up the second processor (logical or physical) and when it does it in relation to the first processor.
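
The affinity workaround mentioned above can be sketched roughly as follows. This is a minimal C++ example under stated assumptions: it uses the Linux call sched_setaffinity (on Windows the equivalent would be SetThreadAffinityMask), and the helper names first_allowed_cpu, pin_current_thread_to_cpu, and read_ticks are made up for illustration. The point is only that both counter reads happen on the same CPU, so they come from the same TSC.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#if defined(__linux__)
#include <sched.h>
#endif
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Hypothetical helper: first CPU the calling thread is allowed to run on,
// or -1 if that cannot be determined (for example on a non-Linux system).
int first_allowed_cpu() {
#if defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set)) return cpu;
#endif
    return -1;
}

// Hypothetical helper: restrict the calling thread to one logical CPU.
// Returns false if the platform is unsupported or the call fails.
bool pin_current_thread_to_cpu(int cpu) {
    assert(cpu >= 0);
#if defined(__linux__)
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // pid 0 = calling thread
#else
    (void)cpu;
    return false;
#endif
}

// Reads the time stamp counter on x86; elsewhere falls back to a monotonic
// clock in nanoseconds (an assumption made so the sketch builds everywhere).
std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}
```

A measurement would then pin first, take read_ticks() before and after the code of interest, and subtract. If pinning fails, the two reads may land on different CPUs and the difference should not be trusted.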

Aaron_C_Intel
Employee
>
> > with either logical processors (HyperThreading) or
>
> are you really sure that the TSC is not shared by the 2
> logical CPUs with hyperthreading?
>

I'm pretty positive that the TSC is replicated for each logical CPU rather than shared between them. It makes performance monitoring much easier when it is replicated.

However, many of the other performance events that VTune uses are shared (for example, L2 cache misses).

bronx
Beginner
> I'm pretty positive that the TSC is replicated and
> not shared with logical CPU's. It makes performance

In that case, would it make sense for each independent TSC to be incremented only when the corresponding hardware thread is active, i.e. in LP0 (LP1) mode, TSC1 (TSC0) is frozen?

If that is indeed the case, we potentially have more discrepancy between the two threads' time stamp values than on an SMP system, where both TSCs are incremented mostly in sync.