
unexpected hStreams performance

Jianbin_F_
Beginner

Hey guys, 

I am writing microbenchmarks with hStreams, but I am seeing some unexpected performance numbers. The observation is that when I increase the number of partitions (i.e., the number of places per domain), the execution time doubles. Theoretically, I would expect the performance to stay roughly the same. Could you help me with this issue?

I have attached the source code and the results. BTW, I am using hStreams v3.5.2.

Thanks,

Jianbin

2 Replies
Wojciech_W_Intel
Employee

I believe I have identified the root cause of your problem.

Your benchmark code sets custom hStreams options, but unfortunately it assigns only some of the fields of the struct. This instance of the C HSTR_OPTIONS struct is declared static (in hStreams_helper.h) with no initializer, which means that, by the rules of C++, all of its members are zero-initialized (broadly speaking). In particular, its openmp_policy member is zero, which corresponds to the value HSTR_OPENMP_ON_DEMAND, meaning hetero-streams performs no OpenMP initialization of its own. So, in your example, all the streams end up clobbering the same cores and, bottom line, the actions execute sequentially.
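For illustration only (your attached source isn't reproduced here, so the helper name and the assumption that the options are applied via hStreams_SetOptions are mine), the problematic pattern looks roughly like this:

#include <hStreams_source.h>  // assumed source-side header

// File-scope HSTR_OPTIONS with no initializer: every member is zero-initialized.
static HSTR_OPTIONS hstreams_options;

void setup_hstreams_options()   // hypothetical helper name
{
    // Only a few fields get assigned here...
    // hstreams_options.<some_field> = <some_value>;
    // ...but openmp_policy is never touched, so it stays 0, i.e.
    // HSTR_OPENMP_ON_DEMAND, and hetero-streams does not set up the
    // per-stream OpenMP thread pools up front.
    hStreams_SetOptions(&hstreams_options);   // assuming options are applied this way
}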

If you do hstreams_options.openmp_policy = HSTR_OPENMP_PRE_SETUP in your initialization, you'll see that the results all oscillate around the same value (somewhere around 0.11 on my machine).

I think a better way would be to call hStreams_GetCurrentOptions(&hstreams_options, sizeof(hstreams_options)); before you do the assigning, so that you start from all of the default hetero-streams options.
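Concretely, a minimal sketch of that initialization (again assuming the options are applied via hStreams_SetOptions and that hStreams_source.h is the header in use):

#include <hStreams_source.h>

static HSTR_OPTIONS hstreams_options;

void setup_hstreams_options()
{
    // Start from the current (default) hetero-streams options so that any
    // field you don't override keeps a sensible value instead of zero.
    hStreams_GetCurrentOptions(&hstreams_options, sizeof(hstreams_options));

    // Have hetero-streams set up the OpenMP thread pools for each stream up
    // front, so the streams don't all end up clobbering the same cores.
    hstreams_options.openmp_policy = HSTR_OPENMP_PRE_SETUP;

    hStreams_SetOptions(&hstreams_options);
}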

Also, note that this benchmark will not use all of the Xeon Phi if you set blocks = 2 and partitions = 8, since there aren't enough "divisions of work" to feed all the partitions.

Not directly related to your issue, but please also note that the signatures of the APP API functions changed between the releases of hetero-streams that accompanied MPSS 3.5 and MPSS 3.6 (the order of the arguments was unified).

I hope this helps!

Jianbin_F_
Beginner

Thanks very much for the help! That solved the problem.
