How is the performance with the modified code?
The compiler might be doing something poorly when you originally cast the float to double. Would it be possible for you to send me a test case or code snippet? (Use a private message if you prefer.)
I'm just hoping that we can improve the compiler so that it might benefit everyone.