I am new to TBB and have some questions about how tasks are mapped to multiple CPUs. Is there any document that discusses the design and architecture of TBB (in particular, how TBB manages OS threads)? Are there any guidelines for writing tasks so that it's easier for the OS to map them to multiple CPUs?
I was also reading about Fork/Join in Java (http://gee.cs.oswego.edu/dl/papers/fj.pdf).
Are there any significant similarities (or differences) between Fork/Join in Java and TBB?
Yes, there is significant similarity between the Java Fork/Join framework you mention and TBB with regard to your question about task mapping. Both use Cilk-inspired work-stealing mechanisms to schedule lightweight objects (tasks) onto underlying OS threads, and both rely on the OS to schedule those threads onto CPU cores; neither maps tasks directly to CPUs.
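To make the fork/join side of that comparison concrete, here is a minimal sketch in Java using the standard java.util.concurrent classes. The class name ForkJoinSum, the parallelSum helper, and the threshold value are made up for illustration. The code only creates tasks and never names a thread or a CPU: the pool's worker threads pick up forked subtasks (possibly by stealing them), and the OS decides where those threads run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example class; the name and the threshold are illustrative only.
class ForkJoinSum extends RecursiveTask<Long> {
    // Ranges at or below this size are summed sequentially instead of being split.
    private static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int lo;
    private final int hi;

    ForkJoinSum(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = lo + (hi - lo) / 2;
        ForkJoinSum left = new ForkJoinSum(data, lo, mid);
        ForkJoinSum right = new ForkJoinSum(data, mid, hi);
        // fork() queues the left half on the current worker's deque;
        // an idle worker thread may steal it from there.
        left.fork();
        // The right half is computed directly on the current thread.
        long rightSum = right.compute();
        return left.join() + rightSum;
    }

    // Convenience wrapper: runs the whole computation on the common pool.
    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data));
    }
}
```

Which worker ends up running which subrange can differ from run to run, while the result stays the same.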
Due to the nondeterministic nature of work stealing, there is no way to guarantee that a particular task will be mapped to a particular CPU. Moreover, we consider a model in which tasks are pinned to particular CPUs non-scalable. We think a programmer should rely on the library and the OS to do the mapping effectively for different kinds of hardware, both today's and future ones. For cases where a programmer would like more predictable scheduling based on past executions, for better cache locality, TBB provides task-to-thread affinity mechanisms (e.g. affinity_partitioner for parallel loops) that give the scheduler hints for a better mapping.
I hope this helps answer your general question; please follow up with any particular concerns you might have.