The TBB 4.1 Update 3 stable release (tbb41_20130314oss) is available for download on our site.
Changes (w.r.t. Intel TBB 4.1 Update 2):
- Binary files for Android* applications were added to the Linux* OS package.
- Binary files for Windows Store* applications were added to the Windows* OS package.
- Exact exception propagation (exception_ptr) support on Linux OS is now turned on by default for GCC 4.4 and higher.
- Stopped implicit use of large memory pages by tbbmalloc (Linux-only). Now use of large pages must be explicitly enabled with scalable_allocation_mode() function or TBB_MALLOC_USE_HUGE_PAGES environment variable.
Community Preview Features:
- Extended class task_arena constructor and method initialize() to allow some concurrency to be reserved strictly for application threads.
- New methods terminate() and is_active() were added to class task_arena.
Bugs fixed:
- Fixed initialization of hashing helper constant in the hash containers.
- Fixed possible stalls in concurrent invocations of task_arena::execute() when no worker thread is available to make progress.
- Fixed incorrect calculation of hardware concurrency in the presence of inactive processor groups, particularly on systems running Windows* 8 and Windows* Server 2012.
Open-source contributions integrated:
- The fix for the GUI examples on OS X* systems by Raf Schietekat.
- Moved some power-of-2 calculations to functions to improve readability by Raf Schietekat.
- C++11/Clang support improvements by arcata.
- ARM* platform isolation layer by Steve Capper, Leif Lindholm, Leo Lara (ARM).
I think that for ARM an undefined __BYTE_ORDER__ should be disallowed (it is most likely __ORDER_LITTLE_ENDIAN__ anyway). __TBB_control_consistency_helper() probably has to be defined as __TBB_acquire_consistency_helper() (what did research say? what were the test results?), and __TBB_release_consistency_helper() can probably be just "dmb ishst". The workaround for relaxed load is either superfluous (because the parameter is volatile) or will require all currently volatile reads to be transformed into proper atomics. ;-)
(Added) I also see use of wfe/sev and dsb (why is that used?) with locks (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka14041.html), so perhaps there should be a bespoke lock for this hardware. I don't quite understand the difference between dmb and dsb. Supposedly dsb is heavier than dmb and therefore maybe needed for sequential consistency (__TBB_full_memory_fence(), like sync vs. lwsync on PowerPC). But the description also reminds me of isync on PowerPC, although I found no references to uses that would make it a good implementation for __TBB_control_consistency_helper(), and its location in the lock code contradicts that reading.
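To make the lock question concrete, here is a minimal portable sketch of where the acquire and release ordering sits in a test-and-set spin lock. SketchSpinLock is a hypothetical name, this is not TBB's lock, and it uses C++11 std::atomic instead of hand-written dmb/dsb or wfe/sev; an ARM-tuned lock would presumably replace the busy-wait with something hardware-specific.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical test-and-set spin lock, for illustration only.
class SketchSpinLock {
    std::atomic<bool> locked_{false};
public:
    // Returns true if the caller now owns the lock.
    bool try_lock() {
        // Acquire: accesses in the critical section cannot move above
        // a successful exchange.
        return !locked_.exchange(true, std::memory_order_acquire);
    }
    void lock() {
        while (!try_lock()) {
            // Plain busy-wait; hardware-specific waiting would go here.
        }
    }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed)); // must be held
        // Release: writes made in the critical section are visible to
        // whoever acquires the lock next.
        locked_.store(false, std::memory_order_release);
    }
};
```

The point of the sketch is only that lock() needs acquire and unlock() needs release; whether the ARM port needs anything heavier than that is exactly what I am asking.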
(Added) The implementation for relaxed loads adds a barrier and refers to http://infocenter.arm.com/help/topic/com.arm.doc.uan0004a as the reason. That document recommends that compiler writers implement volatile this way, which has the unintentional side effect of adding acquire semantics. Perhaps all uses of __TBB_control_consistency_helper() are on volatile variables, and perhaps that is why, by sheer luck, no problems were encountered with just a compiler fence as the implementation of the helper, even though ARM should normally need "real" help just like PowerPC. Here's a question: do test_task_priority.exe and test_task_enqueue.exe also reliably finish execution (in release mode in particular)?
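For comparison, this is roughly what I mean by "proper atomics": a relaxed load that gets its acquire semantics from an explicit fence, paired with a release fence before the publishing store. It is a sketch in C++11 terms, with Mailbox as a made-up name, and it says nothing about how the TBB macros are actually implemented.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical single-use mailbox, for illustration only.
struct Mailbox {
    int payload = 0;                 // plain data, published through `ready`
    std::atomic<bool> ready{false};

    // Producer: write the data, then publish it with release ordering.
    void publish(int value) {
        assert(!ready.load(std::memory_order_relaxed)); // publish only once
        payload = value;
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    }

    // Consumer: relaxed load of the flag, then an acquire fence once the
    // flag has been seen. A compiler-only fence at this point would not
    // keep a weakly ordered CPU from reading `payload` too early.
    bool try_consume(int& out) {
        if (!ready.load(std::memory_order_relaxed)) return false;
        std::atomic_thread_fence(std::memory_order_acquire);
        out = payload;
        return true;
    }
};
```

One thread would call publish() and another would poll try_consume(); the fence after the relaxed load is the part that a volatile read plus a compiler fence does not guarantee by itself.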
(Added 2013-03-31) I think the comment for undefined __BYTE_ORDER__, taken from mac_ppc.h, does not apply to ARM: endianness seems to be a global setting, so a runtime check should be safe, if needed at all.
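A runtime check of the kind I have in mind could look like the sketch below. The function names are mine, not anything in TBB, and it assumes the machine is either plain little-endian or plain big-endian.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: looks at how a known 32-bit value is laid out
// in memory to decide the byte order at run time.
inline bool is_little_endian() {
    const std::uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe); // well-defined byte access
    return bytes[0] == 0x04;                  // least significant byte first
}

// Cross-checks the runtime answer against the compiler's macros where
// they are defined (GCC and Clang); does nothing elsewhere.
inline void check_byte_order() {
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    assert(is_little_endian() == (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__));
#endif
}
```

Calling check_byte_order() once at startup would catch a mismatch between the macro and the actual hardware setting, if such a mismatch can occur at all.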