What are the performance implications of casting floats to doubles? We have a tight inner loop where most of our work is done, and in this loop we accumulate results. Right now our accumulation variable is of type float, but out of fear that we might overflow or lose too much precision, we decided to switch the accumulation variable to double and cast the floating-point results to double before adding them to it. To my amazement, the execution time tripled. Is there a faster way to do this? Thanks in advance.

9 Replies

Assume the tight loop is accumulating results from 1,000,000 iterations.

Assume your input data is of format n.nnn0000

After 1,000 iterations your accumulator might contain n,nnn.nnn (7 digits of data)

If you continue to add into this accumulator you will, through round-off, be dropping the last significant digit(s) in the source data. However, if you were to:

on each multiple of 1,000 iterations (and after the last iteration)

{

convert accumulator to double and sum into grand total accumulator

then zero out low precision accumulator and resume accumulation

}

Then an almost equivalent double-precision result will be produced at very low additional overhead.

The frequency of the grand total would depend on the values in the input data.
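
The blocked scheme above can be sketched in C roughly as follows. This is only an illustration under assumptions: the function name `blocked_sum` and the `block` parameter are hypothetical, and a real inner loop would do its own work where this one merely reads an array.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n floats. Inside each block the accumulation stays in float;
   once per block the float partial sum is folded into a double
   grand total and the float accumulator starts again from zero. */
double blocked_sum(const float *data, size_t n, size_t block)
{
    assert(block > 0);
    double grand_total = 0.0;
    size_t i = 0;
    while (i < n) {
        /* End of the current block, clamped to n for the last one. */
        size_t end = (n - i < block) ? n : i + block;
        float partial = 0.0f;            /* low-precision accumulator */
        for (; i < end; ++i)
            partial += data[i];
        grand_total += (double)partial;  /* one conversion per block */
    }
    return grand_total;
}
```

A call such as `blocked_sum(results, count, 1000)` performs one float-to-double conversion per 1,000 additions; a smaller `block` buys precision at the cost of more conversions.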

Jim Dempsey

That's a very clever trick, I hadn't thought of that! Thank you, I'll try it.

icc's vectorized float accumulation already uses 4 independent partial sums, so there is a little more protection than with a single float sum. An OpenMP sum reduction would introduce additional independent partial sums.
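
To illustrate what independent partial sums mean, here is a scalar C sketch that mimics the shape of a 4-wide vectorized reduction. It is not the code icc actually generates; the function name `sum4` is hypothetical. Each partial sum only grows to roughly a quarter of the total, which is where the extra protection comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate into four independent float partial sums, the way a
   4-wide vectorized loop would, and combine them once at the end. */
float sum4(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }
    for (; i < n; ++i)      /* leftover elements */
        s0 += data[i];
    return (s0 + s1) + (s2 + s3);
}
```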

Double accumulation of results from float operations might be more efficient in x87 code than in scalar SSE code, but x87 rules out vectorization. Various tactics might be used to get x87 code, possibly including defining the sum as long double, casting operands to long double, and setting corresponding compile options. I don't recommend this, if Jim's idea works.

This holds true when the sum-total produced is a float.

When the sum-total is to be a double (from floats), the trick is to reduce the number of float-to-double conversions (while keeping SSE on floats), eventually producing a double as the sum-total while keeping precision. The technique I outlined earlier trades off performance against precision in a tunable manner (tunable by how often the grand totaling is performed).

A vector sum of floats into doubles might be a useful extension to AVX.

IOW, a small vector of floats from memory accumulated into 2 xmm registers as doubles.

How useful this would be, I am in no position to tell. Although RAM prices are falling and you could change your stored data from float to double with little cost ($) you still have the memory bandwidth issue in that the floats will fetch twice as fast as the doubles. Speed matters.
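
Short of a dedicated instruction, accumulating a small vector of floats into two xmm registers as doubles can be approximated today with SSE2 conversions. The sketch below is an assumption-laden illustration, not a tuned kernel: the function name `sum_floats_as_doubles` is hypothetical, and it falls back to a plain scalar loop when `__SSE2__` is not defined.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sum floats into double accumulators. The data stays float in
   memory; each group of 4 floats is widened to 2 + 2 doubles. */
double sum_floats_as_doubles(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    double total = 0.0;
    size_t i = 0;
#if defined(__SSE2__)
    __m128d acc_lo = _mm_setzero_pd();  /* sums of elements 0,1 of each group */
    __m128d acc_hi = _mm_setzero_pd();  /* sums of elements 2,3 of each group */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);
        /* _mm_cvtps_pd widens the two low floats to doubles. */
        acc_lo = _mm_add_pd(acc_lo, _mm_cvtps_pd(v));
        /* Move the two high floats down, then widen them as well. */
        acc_hi = _mm_add_pd(acc_hi, _mm_cvtps_pd(_mm_movehl_ps(v, v)));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, _mm_add_pd(acc_lo, acc_hi));
    total = lanes[0] + lanes[1];
#endif
    for (; i < n; ++i)                  /* tail, or everything without SSE2 */
        total += (double)data[i];
    return total;
}
```

Whether the two conversions per group cost less than scalar conversion in a given loop would have to be measured.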

Jim Dempsey

In place of his double-precision grand accumulator, use a poor-man's integer counter for float overflow.

Use a float threshold value, say 1e7.

After a number of iterations of your algorithm, say, every 100 iterations, check if your float accumulator contains a value greater than THRESHOLD. If so, increment the poor-man's counter, and subtract THRESHOLD from the float accumulator.

At the end, find the grand accumulator value as (poor-man's counter contents) x THRESHOLD + float accumulator contents.
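
A minimal C sketch of this counter scheme, assuming the 1e7 threshold and the 100-iteration check interval suggested above; the names `THRESHOLD`, `CHECK_INTERVAL` and `sum_with_overflow_counter` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define THRESHOLD      1e7f   /* exactly representable as a float */
#define CHECK_INTERVAL 100    /* iterations between threshold checks */

/* Float accumulation with an integer counter that records how many
   times THRESHOLD has been removed from the accumulator. */
double sum_with_overflow_counter(const float *data, size_t n)
{
    assert(data != NULL || n == 0);
    float acc = 0.0f;
    long counter = 0;                   /* the poor-man's counter */
    for (size_t i = 0; i < n; ++i) {
        acc += data[i];
        if ((i + 1) % CHECK_INTERVAL == 0 && acc > THRESHOLD) {
            ++counter;
            acc -= THRESHOLD;
        }
    }
    /* Grand total = counter x THRESHOLD + what is left in the float. */
    return (double)counter * (double)THRESHOLD + (double)acc;
}
```

Note that only one THRESHOLD is removed per check, so the interval has to be short enough that the accumulator cannot run far past the threshold between checks.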

There might be something the compiler is doing badly when you cast the float to double originally. Is it possible for you to send me a test case or code snippet? (Use a private message if you prefer.)

I'm just hoping that we could improve the compiler so it might benefit all.

thanks,

Jennifer

Why do the check?

It is faster (and just as accurate) to extract the integer portion of the float accumulator, add it to the higher-precision accumulator, and subtract it from the running accumulator. This assumes the end result is float, or at least that the end result has fewer than 7/8 digits following the ".".

IOW, adding 0.0 is faster than testing for and branching around.

This should be code-able all in SSE using all 4 packed floats.

You would have better precision in the lsb, but the end results are still SP, or DP if using

double result = (double)high + (double)low;

This will give you up to 7 digits to left of "." and 6/7 digits to right of "."
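
A scalar C sketch of the high/low split, under stated assumptions: the function name `split_sum` and the `interval` parameter are hypothetical, and the cast pair `(float)(int)` stands in for a packed float-to-int32-and-back conversion, so the running accumulator must stay within `int` range.

```c
#include <assert.h>
#include <stddef.h>

/* Keep whole numbers in `high` and the fraction in `low`. Every
   `interval` additions the integer portion of `low` is moved into
   `high` unconditionally; when that portion is zero this simply
   adds and subtracts 0.0, so no data-dependent branch is needed. */
double split_sum(const float *data, size_t n, size_t interval)
{
    assert(interval > 0);
    float high = 0.0f;   /* whole-number part of the total */
    float low = 0.0f;    /* running accumulator */
    for (size_t i = 0; i < n; ++i) {
        low += data[i];
        if ((i + 1) % interval == 0) {
            float whole = (float)(int)low;  /* assumes low fits in int */
            high += whole;
            low -= whole;
        }
    }
    return (double)high + (double)low;
}
```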

I am not an expert on SSE (or intrinsics). I did not see an instruction that converts 4 SP floats to 4 SP float integers in one step (it may be there). You can convert 4 SP floats to 4 int32s, but the document I have does not show a conversion the other way (because they do not know how to handle potential overflow).

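An alternate way to get a 4-wide int function for SP floating point (result: 4 floats with no fraction)

(assuming all positive numbers below 2^23):

add a 4-wide literal (kept in a register) with each element containing 2^23,

then immediately subtract the same literal.

The first add will flush out any fraction bits (rounding to nearest rather than truncating); the subtract will remove the 2^23. With 2^24 the spacing between adjacent floats is already 2, so the units bit would be lost as well.

(Use 2^52 for the DP hack.)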

*** caution ***

The above will work provided the internal architecture of SSE does not change to maintain residual round-off bits (similar to the FPU instructions).

If that happens, you may need to add an additional move (register to register). I do not foresee this change happening.
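
A scalar C sketch of the add-then-subtract trick, for illustration only. It uses 2^23, the magnitude at which adjacent single-precision values are exactly 1 apart (at 2^24 they are 2 apart). The function name `round_to_whole` is hypothetical, and the `volatile` store is an assumption-driven safeguard that forces the intermediate down to float precision and keeps an optimizer from cancelling the add against the subtract.

```c
#include <assert.h>

/* Round a float in [0, 2^23) to a whole number without any
   float-to-int conversion. The result follows the current rounding
   mode (round to nearest by default), so it is not a truncation. */
float round_to_whole(float x)
{
    assert(x >= 0.0f && x < 8388608.0f);
    volatile float t = x + 8388608.0f;  /* 2^23: fraction bits fall off */
    return t - 8388608.0f;              /* remove the 2^23 again */
}
```

The packed version would apply the same add and subtract to all 4 floats of an xmm register at once.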

Jim Dempsey

Jim Dempsey
