Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

TBB in Cython



To write efficient code, we use a lot of Cython in our library. To make things parallel, one can use prange, which uses OpenMP under the hood. I would like to know whether any access to TBB from within Cython is planned in the Intel Python distribution.

Best regards,


4 Replies

There is no TBB module to call from inside Cython as of yet. Alternatively, you can override OpenMP by invoking the TBB module on the command line when running the Python script that uses the compiled code:

python -m tbb

See here for more details on available options and flags on TBB.
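
As a rough illustration of the command above, here is a minimal Python sketch that builds the launch command and falls back to a plain interpreter call when the `tbb` Python module is not installed. The helper name `tbb_command` and the script name `my_script.py` are hypothetical, chosen only for this example.

```python
import importlib.util
import sys


def tbb_command(script, *args):
    """Build an argv list that runs `script` via `python -m tbb` when the
    `tbb` module can be found, and via plain `python` otherwise.
    (`tbb_command` is a hypothetical helper, not part of any library.)"""
    if importlib.util.find_spec("tbb") is None:
        # tbb module not installed: run the script unchanged
        return [sys.executable, script, *args]
    return [sys.executable, "-m", "tbb", script, *args]


if __name__ == "__main__":
    # `my_script.py` is a placeholder for your own script
    print(" ".join(tbb_command("my_script.py")))
```

The resulting list can be passed to `subprocess.run` as is.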


Please note that `python -m tbb` only helps to switch MKL (NumPy) and Python threads to run on top of the TBB task scheduler. It does not help in the case of Cython's parallel module. This feature has always been on my radar, but I did not find enough justification, since the packages we shipped did not use any parallelism from Cython. If there are enough users for this feature, we can start working on bringing TBB, and composable threading in general, to Cython. If Cython on top of TBB makes sense to you and you can describe the use case and possibly measure the potential gain, please write to me or to scripting at with this feature request.



Since OpenBLAS has recently added the ability to delegate thread scheduling to TBB, it might make sense to consider investing effort in this again, now that more components from the scientific Python stack could be made to use TBB by default, even on CPU architectures not supported by MKL.

Projects like scikit-learn, which rely on Cython prange parallelism, BLAS-level parallelism, and Python-level threading parallelism (especially via the future nogil CPython 3.13), might really benefit from this to mitigate the oversubscription problems caused by thread-based nested parallelism.
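
To make the oversubscription point concrete, here is a minimal sketch using only the Python standard library. Plain Python thread pools stand in for the outer (Python-level) and inner (prange or BLAS) layers, which is an assumption made for illustration; the function name `peak_inner_workers` is hypothetical. Each outer task creates its own inner pool, so the number of simultaneously live inner workers is the product of the two pool sizes rather than the number of cores.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peak_inner_workers(outer, inner):
    """Run `outer` tasks that each start their own pool of `inner` workers.
    Return the largest number of inner workers alive at the same time."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}
    # Every inner worker waits here until all outer * inner workers exist.
    all_started = threading.Barrier(outer * inner)

    def inner_task():
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        all_started.wait(timeout=30)
        with lock:
            state["active"] -= 1

    def outer_task():
        # Each outer task owns a private inner pool (uncoordinated nesting).
        with ThreadPoolExecutor(max_workers=inner) as pool:
            for future in [pool.submit(inner_task) for _ in range(inner)]:
                future.result()

    with ThreadPoolExecutor(max_workers=outer) as pool:
        for future in [pool.submit(outer_task) for _ in range(outer)]:
            future.result()
    return state["peak"]


if __name__ == "__main__":
    # 4 outer tasks, each with 3 inner workers
    print(peak_inner_workers(4, 3))
```

A shared scheduler such as TBB is meant to avoid this multiplication by letting all layers draw from one pool of worker threads; the sketch only shows the problem, not that remedy.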

Note that other projects from the PyData stack, such as xgboost and lightgbm, use OpenMP directly in their C++ source code. I don't know whether such code can be made to delegate threading to TBB without a significant rewrite of those libraries.


Hello @ogrisel,

Our engineering team has already engaged on some issues related to this OpenBLAS pull request: "Deadlock issue in OpenBLAS with TBB" #1336 and "Facing Deadlock issue with nested TBB" #1316, also described in OpenBLAS issue #4418.

Regarding Cython prange parallelism, this old article, Thread Parallelism in Cython*, may still be of interest, although it is based on OpenMP examples.

Overall, I agree that it is worth talking about the composability provided by oneTBB again. If you have any ideas, specific examples, or samples that can showcase this functionality, please share them here, or let us know if you would like them to be discussed. The composability bench distributed with Intel Python can also be used for experiments.


