Make no mistake: Tasks are executed by Threads. The gain has several sources, as Dmitriy suggests, but at the core is that TBB employs a thread pool, so it pays the cost of thread creation once and then reuses those threads to handle the Tasks that are scheduled. Not having to create and destroy a thread for each Task is a big part of why it's more efficient, but it's not the only reason. The unfair scheduler Dmitriy mentioned can also be a win when it schedules related Tasks on the same HW thread, where they can take advantage of data that previous Tasks left in that thread's cache.
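To make the thread-reuse point concrete, here is a minimal sketch in standard C++. This is not TBB's scheduler: there is no arena, no work stealing, and no unfair scheduling, just a fixed set of workers pulling from one shared queue. The names SimplePool, PoolStats, and run_tasks are made up for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool (NOT TBB's implementation): threads are created
// once in the constructor and reused for every task submitted afterwards.
class SimplePool {
public:
    explicit SimplePool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { work(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            assert(!done_);  // submitting after shutdown is a bug
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void work() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutting down and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so workers can overlap
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::vector<std::thread> workers_;
};

struct PoolStats {
    std::size_t completed;         // tasks that actually ran
    std::size_t distinct_threads;  // different threads that ran them
};

// Runs `tasks` trivial tasks on `threads` workers and reports how many
// distinct threads ended up executing them.
PoolStats run_tasks(std::size_t threads, std::size_t tasks) {
    std::mutex m;
    std::set<std::thread::id> ids;
    std::size_t completed = 0;
    {
        SimplePool pool(threads);
        for (std::size_t i = 0; i < tasks; ++i) {
            pool.submit([&] {
                std::lock_guard<std::mutex> lock(m);
                ids.insert(std::this_thread::get_id());
                ++completed;
            });
        }
    }  // pool destructor waits for all queued tasks to finish
    return PoolStats{completed, ids.size()};
}
```

Calling run_tasks(4, 1000) runs a thousand Tasks on at most four Threads, so the creation and teardown cost is paid four times rather than a thousand. A real scheduler like TBB's does a good deal more than this, but the amortization idea is the same.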
TBB is a higher-level library than the low-level thread support libraries, such as pthreads, Win32 threads and COM threads, upon which it depends. Writing efficient and correct threaded code is complex, which is why there are so few of us who can do it so far. TBB is an attempt to hide some of the complexity of threading inside a stable, correct, reliable and efficient library. If you look under the hood, you're going to see some of that complexity ;-)
For details on the Arena, you might look to my response in this Scheduler Internals thread.