CPI = 0.80759
Parallelization_ratio = 0.91599
Modified_data_sharing_ratio = 0.00087244
L2_cache_miss = 324000
Branch_misprediction_ratio = 0.0077442
Bus_utilization_ratio = 0.18217

CPI = 0.78496
Parallelization_ratio = 0.99238
Modified_data_sharing_ratio = 0.00085696
L2_cache_miss = 1089000
Branch_misprediction_ratio = 0.0073767
Bus_utilization_ratio = 0.2105

Let's try to clarify how iterations are assigned under guided scheduling:
P1 = C*K/(num_threads - 1);
P2 = C*(K-P1)/(num_threads - 1);
P3 = C*(K-P1-P2)/(num_threads - 1);
...
Pn = C*(K-P1-...-P(n-1))/(num_threads - 1);
... and so on, while Pn > chunksize
P(n+1) = chunksize
...
P(n+m-1) = chunksize
P(n+m) = remaining iterations (<= chunksize)
where C <= 1
With the default chunksize=1:
for K=1200, C=0.5 we have about 81 schedules, 14.6 iterations on average
for K=2000, C=0.5 we have about 89 schedules, 22.5 iterations on average. That is more work per thread, which explains the better ratio for K=2000.
With chunksize=20:
for K=1200, C=0.5 we have about 34 schedules, 35.3 iterations on average
for K=2000, C=0.5 we have about 41 schedules, 48.8 iterations on average
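
If you want to play with the rule yourself, here is a rough Python sketch that walks through the recurrence and counts the schedules. It is only a model of the formulas in this post, not the actual runtime code. The function name guided_chunks is made up for illustration, rounding each Pn down to a whole iteration is my assumption, and num_threads=8 is an arbitrary pick since the thread count is not stated here, so the counts it prints will not necessarily match the figures quoted above.

```python
def guided_chunks(total_iters, num_threads, c=0.5, chunksize=1):
    """Model of the rule above: return the list of chunk sizes P1, P2, ...

    Hypothetical helper for illustration only, not the OpenMP runtime.
    """
    if num_threads < 2:
        raise ValueError("the rule divides by (num_threads - 1); need num_threads >= 2")
    if chunksize < 1 or not 0 < c <= 1:
        raise ValueError("need chunksize >= 1 and 0 < C <= 1")

    chunks = []
    remaining = total_iters
    while remaining > 0:
        # Proportional phase: Pn = C * (iterations left) / (num_threads - 1).
        # Rounding down to a whole iteration is an assumption of this sketch.
        size = int(c * remaining / (num_threads - 1))
        # Once Pn is no longer above chunksize, hand out fixed-size chunks.
        if size <= chunksize:
            size = chunksize
        # The very last chunk is whatever is left (<= chunksize).
        size = min(size, remaining)
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    # num_threads=8 is an assumed value; change it to match your machine.
    for k in (1200, 2000):
        for chunksize in (1, 20):
            chunks = guided_chunks(k, num_threads=8, c=0.5, chunksize=chunksize)
            avg = k / len(chunks)
            print(f"K={k} chunksize={chunksize}: "
                  f"{len(chunks)} schedules, {avg:.1f} iterations on average")
```

The thing to look at is how the average chunk size changes with K and chunksize, since that is the amount of work a thread gets per trip to the scheduler.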