The following code is a bare-bones simplification of a much larger project. It calculates the mean value of a large vector expression.
When the mean_calculator class contains the vector expression itself (instead of a reference to it), which is more expensive because of the copy constructor,
the first time it is run the result is pretty random, while subsequent calls give the same number (as far as I can tell).
When, however, it contains a (const) reference to the vector expression, anything goes as far as the result is concerned.
The mean_calculator only operates on a part of the vector and, more importantly, only reads from it.
Any idea why this behavior occurs?
Thank you very much in advance,
PS: I know that MKL has Summary Statistics, but that is not what I am after. This is only an example
that I need to understand.
jimdempseyatthecove wrote:Could that really make a difference when adding fewer than, say, ten final values (as an outlandishly high limit)?
No - final reduction is from lower magnitude to larger magnitude.
Petros Mamales wrote:Hmm, don't take chances: prevent or verify. There are different issues: integral aggregates can overflow, and floating-point values can be lost if they're too small to matter against the current aggregate value. Not sorting might help with the first (assuming signed values), but only small-to-large sorting can help with the second.
This is a bit counter-intuitive to me. I would expect that if the data are unsorted, then the chances of overflow are smaller. If, on the other hand, the data are sorted, then I am almost guaranteed to overflow.
Petros Mamales wrote:Jim's? :-)
Jim's suggestion to use, say, a long double for the partial aggregates seems very well suited for my needs.
Petros Mamales wrote:Yes, although I can't tell by how much: you'll have to tune the grainsize carefully to get close, and there's no Body reuse (might matter for fancy coding with multiple bins).
a) Is the deterministic version of reduction slower, as a rule?
Petros Mamales wrote:Have you observed such race issues (not the same as not allowing for Body reuse, which seems to be the case here)? Normally you don't copy the current aggregate value during the splitting constructor (initialising to a neutral value instead), so you don't have to do anything special there (a counterexample in the documentation might be useful). Using a local variable often helps the optimiser make dramatic improvements, but that's a different issue, and you don't need to start with a copy of the current value as long as you include it at the end.
b) When I write the operator body, if I do not copy the member destination to a local variable, there are race issues, having to do with the splitting constructor and the body of the function running concurrently. If, on the other hand, I do the local copy and at the end assign back to the class member destination, things seem to work OK. In principle I see no reason why this should be the case; am I missing something? Is it simply much less likely that the data race will occur, or is something else happening? The TBB documentation does the same "trick", btw.