At first I thought that this behaviour had to do with the initialization of the task scheduler (the first example does not explicitly initialize it), but I confirmed that 2 threads were indeed created, as expected. What I did next was to explicitly initialize the task scheduler, and there I noticed that real parallel execution was achieved only when using 3 or more threads. I am not sure if this issue has been discussed before, but I wonder whether this is normal behaviour. Other TBB algorithms or raw tasks work as expected on my dual-core system, i.e. they scale well, whether I initialize the scheduler with 2 threads or let the library do so automatically.
You're correct, there is an issue here. Currently, the TBB tasks that process graph nodes are scheduled with the task::enqueue method. In the current TBB version, enqueued tasks can be processed by worker threads, but not by master threads.
By default, number_of_worker_threads = num_of_cores - 1; one core is left for the master thread. So on a 2-core system there is one master thread and one worker thread. Because of the limitation mentioned above, only that single worker thread executes the graph's functional nodes, which is why you see serial execution. When you initialize task_scheduler_init with 3 threads, there are one master and two worker threads, so the two workers can run in parallel as expected.
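To make the arithmetic concrete, here is a minimal sketch of the thread counts described above. It only models the numbers and does not call any TBB API; the function names `worker_threads` and `graph_node_parallelism` are hypothetical, and the `master_runs_enqueued` flag is an assumption used to contrast the current behaviour with what the planned fix is expected to allow.

```cpp
#include <cassert>

// Hypothetical helpers: a model of the thread arithmetic only, not TBB code.

// Number of worker threads when the scheduler is initialized with
// `total_threads` threads; one of those threads is the master.
int worker_threads(int total_threads) {
    assert(total_threads >= 2);  // only the cases discussed here are modelled
    return total_threads - 1;
}

// Upper bound on how many threads can execute enqueued tasks (and therefore
// graph nodes) at the same time.
//   master_runs_enqueued == false: current behaviour, workers only.
//   master_runs_enqueued == true:  assumed behaviour once the master thread
//                                  can also pick up enqueued tasks.
int graph_node_parallelism(int total_threads, bool master_runs_enqueued) {
    int workers = worker_threads(total_threads);
    return master_runs_enqueued ? workers + 1 : workers;
}
```

Under this model, `graph_node_parallelism(2, false)` gives 1 (the serial execution you observed on the dual-core machine), while `graph_node_parallelism(3, false)` gives 2, which matches the parallel execution you saw with 3 threads.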
We plan to fix this problem in one of the next releases, hopefully TBB 3.0 update 8; watch for news. The fix will allow enqueued tasks to be executed by the master thread as well as by workers.