Why is the performance of OpenMP so crummy in the comparison here? https://software.intel.com/en-us/articles/using-intel-mkl-and-intel-tbb-in-the-same-application
I would have expected it to be about the same as TBB. In two of the cases it's even slower than the single-threaded version.
This comes from a difference in paradigms:
- OpenMP (and algorithms developed with it) was originally designed for a non-concurrent environment, where a single parallel region is assumed to have access to all machine resources, and in that setting it can show top performance. But when concurrency such as nested threading occurs, the nested calls know almost nothing about each other. Without special care, this results in oversubscription (context switching, cache thrashing), which lowers the efficiency of each call and overall performance as well. With OpenMP it is more efficient to run such calls one after another, or to carefully partition the machine with affinity settings and adjust the number of threads according to the expected duration of each call.
- 10 parallel calls to sequential MKL do not suffer from oversubscription, but the whole run takes as long as the largest call. Threads that finish their calls earlier wait at the final barrier doing nothing.
- TBB is designed for a concurrent environment and performs dynamic rebalancing with task stealing. It suits use cases where it is hard to predict the size of a call and when it will occur. In this case it avoids oversubscription and can rebalance the workload, so that cores that have finished small tasks can help finish large ones. All this leads to superior performance in the concurrent case.
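To put rough numbers on the three cases, here is a toy model in Python. It is only a sketch: the call costs are invented, the function names are hypothetical, and the balanced figure is an idealized lower bound that assumes work can be split and stolen with no overhead. It does not measure MKL, OpenMP or TBB.

```python
def static_makespan(call_costs):
    """One thread per call, each call sequential: the region ends with the largest call."""
    return max(call_costs)


def idle_time(call_costs):
    """Total time threads spend waiting at the final barrier in the static case."""
    longest = max(call_costs)
    return sum(longest - cost for cost in call_costs)


def balanced_makespan(call_costs, n_cores):
    """Idealized bound when finished cores can freely help with remaining work."""
    return sum(call_costs) / n_cores


def nested_thread_count(outer_threads, inner_threads):
    """Threads competing for the cores when every outer thread opens its own inner team."""
    return outer_threads * inner_threads


# Hypothetical workload: nine small calls and one large call, in arbitrary time units.
costs = [1, 1, 1, 1, 1, 1, 1, 1, 1, 11]

print(static_makespan(costs))        # wall time with 10 sequential calls in parallel
print(idle_time(costs))              # thread-time wasted at the barrier
print(balanced_makespan(costs, 10))  # idealized wall time with perfect rebalancing
print(nested_thread_count(10, 10))   # threads on 10 cores if each call spawns 10 more
```

In this made-up workload the static run is bounded by the single large call while nine threads sit idle for most of it, which is the gap that task stealing tries to close.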
Best regards,
Alexander
MKL with OpenMP threading isn't trivial to use effectively with nested parallelism, and it looks like no effort was made to deal with it. Documented methods include MPI with MPI_THREAD_FUNNELED. TBB with appropriate affinity might well be an alternative for such problems, but they didn't explain the details.
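One way to picture the manual care that nested parallelism needs is a thread budget: split the available cores among the concurrent calls so their teams never add up to more than the machine has. The sketch below is an assumption-laden illustration, not an MKL or OpenMP API. `partition_threads` is a hypothetical helper, and it assumes each call scales perfectly with its thread count and that expected costs are known in advance.

```python
def partition_threads(n_cores, expected_costs):
    """Split n_cores among concurrent calls, roughly in proportion to expected cost.

    Every call gets one thread; each spare core then goes to the call with the
    highest expected cost per thread (assumes ideal scaling, which is a simplification).
    """
    n_calls = len(expected_costs)
    if n_calls > n_cores:
        raise ValueError("more concurrent calls than cores; run some calls one after another")
    threads = [1] * n_calls
    for _ in range(n_cores - n_calls):
        neediest = max(range(n_calls), key=lambda k: expected_costs[k] / threads[k])
        threads[neediest] += 1
    return threads


# Hypothetical example: 8 cores, two small calls and one call six times larger.
print(partition_threads(8, [1, 1, 6]))
```

The total always equals the core count, so the calls do not oversubscribe the machine, but the split is only as good as the cost estimates, which is the part that is hard to get right by hand.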
Is there any data for how TBB MKL compares to OpenMP MKL in a non-concurrent environment? Are they about the same in that case?
If so, is there any reason to use OpenMP MKL (if TBB MKL is always as good or better)?