We are working on a multi-core packet processing application. The architecture is roughly as outlined in the routing application example found in the O'Reilly book, with some minor variations.
Here is a rough outline of the two-pipeline architecture.
1. Raw packet processor pipeline
Raw packets --> (parallel pipeline stages) ---> State update commands (into concurrent_queue)
2. User application pipeline
State update commands (from concurrent_queue) ---> (parallel pipeline stages) ---> Updated application state
Now, we know beforehand that certain update commands are closely related. It would be highly beneficial if a command executed on the same core (and therefore the same cache) that recently executed a related command.
Visually, the command queue looks like this:
A1, B1, B2, C1, A2, B3, C2, A3, A4 ....
We know that the commands Ax access the same data structure and also have a high temporal locality of reference. So we would like them to be executed on the same core (cache), and we would like to hint this to the TBB scheduler.
How can we take advantage of this knowledge in the pipeline class?
Is there any other way to do it (i.e., via raw TBB tasks)?
On a related note, Intel has published an article (PDF) http://download.intel.com/technology/advanced_comm/31156601.pdf titled "Supra Linear Packet Processing with Snort on multicore". In that article, TBB is not used at all; instead, a flow-pinning technique is used with normal threads and CPU affinity.
Is this the recommended way to deal with this class of application?
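To make the flow-pinning idea concrete, here is a minimal sketch of the routing step as I understand it, in plain C++ rather than TBB. The names Command, worker_for and partition_by_flow are invented for this illustration and are not taken from the article or from TBB. The sketch assumes each command carries a flow key (the "A" in A1, A2, ...) and that each worker would be a thread pinned to one core.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical command type: `flow` names the data structure being updated
// (the "A" in A1, A2, ...); `seq` is the command's position within that flow.
struct Command {
    std::string flow;
    int seq;
};

// Map a flow key to a fixed worker index, so that every command of one flow
// goes to the same worker (and, if that worker is pinned, to the same core).
std::size_t worker_for(const std::string& flow, std::size_t num_workers) {
    assert(num_workers > 0);
    return std::hash<std::string>{}(flow) % num_workers;
}

// Split the command stream into per-worker queues, preserving arrival order
// within each queue.
std::vector<std::vector<Command>> partition_by_flow(
        const std::vector<Command>& commands, std::size_t num_workers) {
    std::vector<std::vector<Command>> queues(num_workers);
    for (const Command& c : commands) {
        queues[worker_for(c.flow, num_workers)].push_back(c);
    }
    return queues;
}
```

With the queue shown above (A1, B1, B2, C1, A2, ...), all Ax commands end up in one queue in their original order. Different flows may still hash to the same worker, so this gives locality per flow but not an even load.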
Thanks in advance,
"Is there any other way to do it (i.e., via raw TBB tasks)?"
Please first give an indication about whether you could still integrate both pipelines into one, because then each data packet will get its own task to take it from start to finish as quickly as possible, which should provide good locality.
(Correction) I seem to have overlooked a transition from "filters move past stationary data" (the tbb::pipeline approach) to "data moves through filters" (not directly supported), so concatenating two pipelines into one would not be the appropriate solution. Maybe I should sit this one out, though: I've done some tinkering with pipeline, but I'm not sure I've obtained a positive outcome, and nobody yet seemed interested to test it.
Thanks for your reply.
There are two reasons why we are thinking in terms of two pipelines.
1. We don't want to hold on to the data packets for too long. Once the "state update" commands have been created from the data packets by the various parallel filters, the packets are no longer needed. The filters in the second pipeline work only on the "state update" commands, which have a different locality.
2. The second pipeline will occasionally block to evict some data structures to disk. During this pruning, all the data structures will be locked with a fat lock. Currently, the dominant data structure is the Google dense_hash_map, but we will probably move to concurrent_hash_map. It is my understanding (probably incorrect) that blocking a pipeline stage might stall it. It is okay if the second pipeline blocks, because nothing can be done anyway while the data structures are being pruned. The all-important "state update" commands will be queued and will be picked up by the second pipeline when the locks clear.
Please let me know if you need more detail about the application. I will be happy to provide it.
Apologies for the empty reply. I pressed Save and did not realize it would post to the forum.
"It is my understanding (probably incorrect) that blocking a pipeline stage might stall it."
Blocking a serial filter lets new data back up behind it, quickly stalling the pipeline, yes; blocking a parallel filter does not have that effect, but may, like any blocking in TBB, be detrimental to performance, because TBB's scheduler is not aware of any alternative scheduling opportunity.
I would use a concurrent_queue to get any data that must be saved to an independent tbb_thread, though, because you never know when one pipeline might steal work from another pipeline, causing them to get entangled, and while there has been some discussion about that, I have not seen any notification that this problem was solved. This problem occurs because TBB, conceived for finite jobs where fairness typically gets in the way of performance, is now being used for long-running jobs with at least global-progress concerns.
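As a rough illustration of that hand-off, here is a minimal sketch in standard C++ only. BlockingQueue and run_saver are hypothetical names made up for this example; in a TBB program, a concurrent_queue and a tbb_thread would play those roles. The point is only that the thread doing the blocking save is separate from whatever produces the items.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue, standing in for a concurrent queue in this sketch.
class BlockingQueue {
public:
    void push(std::string item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }
    // Mark the end of input; pop() returns false once the queue has drained.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Wait for an item; returns false only when closed and empty.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
    bool closed_ = false;
};

// Hypothetical saver: a dedicated thread drains the queue and may block
// (e.g. on disk I/O) without stalling the code that produced the items.
std::vector<std::string> run_saver(const std::vector<std::string>& to_save) {
    assert(!to_save.empty());
    BlockingQueue queue;
    std::vector<std::string> saved;
    std::thread saver([&] {
        std::string item;
        while (queue.pop(item)) saved.push_back(item);  // stand-in for a disk write
    });
    for (const std::string& s : to_save) queue.push(s);  // producer does no I/O itself
    queue.close();
    saver.join();
    return saved;
}
```

Because there is a single consumer, items are saved in the order they were queued. The producer only pays for a short lock on push, and any slow I/O happens on the saver thread.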