As I do not understand the idea behind the TBB pipeline (i.e. its advantage over raw threads),
I am going to use VTune to see how the code runs (the text_filter code sample from TBB).
So I have some questions in mind.
1. I am going to write the same program with raw threads and a concurrent queue. Which tuning method should I use? Is that sampling?
2. Which events can I use to measure? Total time? Cache miss ratio and others?
First off, about the pipeline. The concept basically is: a pipeline is a sequence of stages (steps of processing each item, or token), where each stage represents a simple operation done on an item. To use a pipeline, one should define a stream (that's where the pipeline gets its items from), then define all the stages and add them to the pipeline (order is important, of course). As soon as all of this is done, one just invokes pipeline.run() and the process starts.
By using the Threading Building Blocks pipeline, one gets what the TBB runtime provides: automatic thread scheduling and internal task scheduling. This lets your application run as parallel as its stages allow, with scaling and load balancing across threads handled automatically.
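For contrast, here is a rough sketch of the hand-written alternative mentioned in the question: raw threads connected by a concurrent queue. It uses only the C++ standard library, not TBB. The names BlockingQueue and run_handmade_pipeline are made up for illustration, and upper-casing is just a stand-in for whatever a real stage would do. Note how much is fixed by hand here (one thread per stage, no load balancing), which is the part TBB takes over.

```cpp
#include <cassert>
#include <cctype>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal blocking queue (illustration only, not a TBB class).
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_ && "push after close");
            items_.push(std::move(value));
        }
        ready_.notify_one();
    }

    // Signals that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until an item is available. Returns false once the queue
    // is closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Three stages, one thread each: read -> transform -> collect.
// With a single thread per stage and FIFO queues, item order is preserved.
std::vector<std::string> run_handmade_pipeline(const std::vector<std::string>& input) {
    BlockingQueue<std::string> raw;
    BlockingQueue<std::string> processed;
    std::vector<std::string> output;

    std::thread reader([&] {
        for (const std::string& chunk : input) raw.push(chunk);
        raw.close();
    });

    std::thread transformer([&] {
        std::string chunk;
        while (raw.pop(chunk)) {
            for (char& c : chunk)
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            processed.push(chunk);
        }
        processed.close();
    });

    std::thread writer([&] {
        std::string chunk;
        while (processed.pop(chunk)) output.push_back(chunk);
    });

    reader.join();
    transformer.join();
    writer.join();
    return output;
}
```

If the transform stage turns out to be the bottleneck, this version needs manual changes (more worker threads, some way to restore ordering) to scale, whereas a TBB pipeline would let the scheduler run a parallel stage on several tokens at once.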
If you wish to compare the performance of the text_filter example (distributed with TBB) with an alternative implementation that uses native threads, then all you really need to do (certainly as a first step) is to time both of them on the same machine processing the same workload. Note that comparing performance on several different machines (2-, 4-, 8-, 16-core computers) is recommended if you are not 100% sure which computers your customers will use. Varying the workload also makes sense to establish the boundary cases: how small the smallest workload can be, and how the application behaves when processing a huge workload.
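A simple way to do that first timing step is a small wall-clock harness like the sketch below. It is plain standard C++; the helper name best_time_ms is made up for illustration. Taking the best of several runs is just one common way to reduce noise from other processes, not the only valid choice.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `repeats` times and returns the fastest wall-clock
// time observed, in milliseconds.
double best_time_ms(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        // steady_clock is monotonic, so it is not affected by system clock changes.
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

You would call it once per implementation with the same input, for example wrapping the TBB version in one lambda and the native-thread version in another, and then repeat with different workload sizes and on different machines.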
And then, if you're through with the steps described above and the timings you're seeing look really strange to you, that's when it's time to start Thread Profiler and VTune. Applied to both implementations of the same algorithm, these two tools will let you observe the level of parallelism and the load balancing across threads (that's Thread Profiler's job) and compare where the two implementations spend most of their time and how well they work with memory, cache, etc. (that's VTune's job).