Hey guys,
I am writing microbenchmarks with hStreams and am seeing some unexpected performance numbers: when I increase the number of partitions (places per domain), the execution time doubles. In theory, I would expect the performance to stay roughly the same. Could you help me with this issue?
I attach the source code and the results. BTW, I am using hStreams v3.5.2.
Thanks,
Jianbin
I believe I have identified the root cause of your problem.
Your benchmark code sets custom hStreams options, but it does so by assigning only some of the fields of the struct. This instance of the C struct HSTR_OPTIONS is a static (in hStreams_helper.h) with no initializer, which by the rules of C++ means that all of its members are initialized to zero (broadly speaking). In particular, its openmp_policy member is zero, which corresponds to HSTR_OPENMP_ON_DEMAND (meaning that hetero-streams performs no OpenMP initialization). So, in your example, all the streams end up clobbering the same cores and, bottom line, the actions execute sequentially.
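To illustrate the zero-initialization point, here is a minimal sketch that does not use hStreams at all. `DemoOptions`, `DemoOpenMPPolicy`, and the function names are stand-ins invented for this example; only the idea that the on-demand policy has the value zero is taken from the explanation above.

```cpp
#include <cassert>

// Stand-in definitions for illustration only; these are NOT the real
// hStreams declarations. The enumerator values mirror the statement
// above that the "on demand" policy corresponds to zero.
enum DemoOpenMPPolicy {
    DEMO_OPENMP_ON_DEMAND = 0,
    DEMO_OPENMP_PRE_SETUP = 1
};

struct DemoOptions {
    int verbose;
    DemoOpenMPPolicy openmp_policy;
    unsigned phys_domains_limit;
};

// Static storage duration and no initializer: the object is
// zero-initialized before any other code runs.
static DemoOptions demo_options;

// Mimics a benchmark that assigns only some of the fields.
DemoOpenMPPolicy policy_after_partial_setup() {
    demo_options.verbose = 1;
    demo_options.phys_domains_limit = 1;
    // openmp_policy was never assigned, so it still holds zero.
    return demo_options.openmp_policy;
}

// Usage: the unassigned member reads back as the zero-valued policy.
void demo_partial_setup() {
    assert(policy_after_partial_setup() == DEMO_OPENMP_ON_DEMAND);
}
```

The same object declared as a local variable without an initializer would hold indeterminate values instead, so the silent fallback to the zero-valued policy is specific to statics and globals.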
If you set hstreams_options.openmp_policy = HSTR_OPENMP_PRE_SETUP in your initialization, you'll see that the results all oscillate around the same value (somewhere around 0.11 on my machine).
I think a better way would be to call hStreams_GetCurrentOptions(&hstreams_options, sizeof(hstreams_options)); before assigning any fields, so that you start from all of hetero-streams' default options.
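The "read the defaults first, then override" pattern can be sketched without the library as follows. Everything here is hypothetical: `demo_get_current_options` only imitates the shape of the hStreams_GetCurrentOptions call quoted above, `demo_set_options` is an assumed counterpart, and the default values are invented, since the real defaults are not listed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-ins for illustration only; NOT the real hStreams API.
enum DemoOpenMPPolicy { DEMO_OPENMP_ON_DEMAND = 0, DEMO_OPENMP_PRE_SETUP = 1 };

struct DemoOptions {
    int verbose;
    DemoOpenMPPolicy openmp_policy;
    unsigned time_out_ms;
};

// Hypothetical library-side defaults (values invented for this sketch).
static DemoOptions library_options = {0, DEMO_OPENMP_PRE_SETUP, 4000u};

// Hypothetical getter shaped like the call quoted in the post:
// copies the library's current options into the caller's buffer.
bool demo_get_current_options(DemoOptions* out, std::size_t size) {
    if (out == nullptr || size < sizeof(DemoOptions)) return false;
    std::memcpy(out, &library_options, sizeof(DemoOptions));
    return true;
}

// Hypothetical setter that installs the caller's options.
void demo_set_options(const DemoOptions& in) { library_options = in; }

// Read the defaults first, then override only what the benchmark needs.
DemoOptions configure_benchmark() {
    DemoOptions opts;
    bool ok = demo_get_current_options(&opts, sizeof(opts));
    assert(ok);
    (void)ok;          // avoids an unused-variable warning when NDEBUG is set
    opts.verbose = 1;  // the only field this benchmark customizes
    demo_set_options(opts);
    return opts;
}
```

Because every field starts from the library's value, any member the benchmark does not touch keeps its default rather than falling back to zero.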
Also, note that this benchmark will not be using all of the Xeon Phi if you set blocks = 2 and partitions = 8 since there aren't enough "divisions of work" to feed all the partitions.
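As a rough back-of-the-envelope check of that point, the sketch below assumes that each block is one indivisible unit of work that occupies exactly one partition at a time. That mapping is an assumption made for illustration, not something taken from the benchmark source, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Upper bound on the partitions that can be busy at once, ASSUMING each
// block is one indivisible unit of work that occupies one partition.
unsigned busy_partitions(unsigned blocks, unsigned partitions) {
    return std::min(blocks, partitions);
}

// Fraction of partitions that can be kept busy under the same assumption.
double partition_utilization(unsigned blocks, unsigned partitions) {
    assert(partitions > 0);
    return static_cast<double>(busy_partitions(blocks, partitions)) / partitions;
}
```

Under that assumption, blocks = 2 and partitions = 8 gives at most 2 busy partitions, so three quarters of the card would sit idle regardless of how the options are set.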
Not directly related to your issue, but please also note that the signatures of the APP API functions changed between the releases of hetero-streams that accompanied MPSS 3.5 and MPSS 3.6 (the order of the arguments was unified).
I hope this helps!
In reply to WOJCIECH W. (Intel):
Thanks very much for the help! It solves the problem.
