OK, I spoke too soon: my grid still doesn't match perfectly.
I've spent hours on this, I tried pretty much every imaginable combination of these:
-Shift+X*Zoom, all in double float (& even extended precision)
-Shift+X*Zoom, result in single float
-Shift as single float, X*Zoom as single float, added separately
-all of these as single floats
-truncation & rounding, using FPU
-truncation & rounding, using ippsConvert_32f32s_Sfs/ippsConvert_64f32s_Sfs
-any of the above for top-left coordinate, accumulator as single precision & double precision
-64bit fixed-point accumulator
But something tells me that it can't be done, because the stretching function probably isn't made to be that precise. If I were writing a stretching function, I'd certainly use a fixed-point accumulator in the loop (though the last time I wrote one was a long time ago, when there was no SSE & the FPU was slow), and floats to compute my first coordinates, without caring much about precision. So if that's what the stretching function is doing, it's no surprise that I can't find a combination that matches.
So I'm now writing my own non-interpolated stretcher, because I've spent a lot more time fiddling with this than it would have taken to write one.
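For anyone curious, here is a rough sketch of the kind of non-interpolated stretcher I mean: the first source coordinate is computed in floating point, then a fixed-point accumulator steps through the row. Everything in it is my own assumption (the name stretch_row_nn, the 32.32 format, 8-bit single-channel pixels, clamping at the edges, and the convention that source pixel X lands at Shift+X*Zoom); it is not what the IPP function does internally.

```c
#include <stdint.h>

/* Hypothetical helper, not an IPP function.
 * Nearest-neighbour stretch of one 8-bit row.
 * Assumed convention: source pixel X lands at destination Shift + X*Zoom,
 * so destination pixel x reads source pixel floor((x - shift) / zoom).
 * Returns 0 on success, -1 on invalid arguments. */
static int stretch_row_nn(const uint8_t *src, int src_w,
                          uint8_t *dst, int dst_w,
                          double shift, double zoom)
{
    const double one = 4294967296.0;   /* 2^32: scale of the 32.32 format */
    int64_t acc, inc;
    int x;

    if (!src || !dst || src_w <= 0 || dst_w < 0 || zoom <= 0.0)
        return -1;

    /* First source coordinate, computed once in floating point.
     * The cast truncates toward zero; negative values are clamped
     * below anyway, so this does not change the result. */
    acc = (int64_t)((-shift / zoom) * one);

    /* Per-pixel step in source space, rounded to nearest 32.32 value. */
    inc = (int64_t)((1.0 / zoom) * one + 0.5);

    for (x = 0; x < dst_w; ++x) {
        int64_t sx;

        /* Clamp before shifting so a negative value is never shifted. */
        if (acc < 0)
            sx = 0;
        else
            sx = acc >> 32;             /* integer part of 32.32 */

        if (sx > src_w - 1)
            sx = src_w - 1;             /* clamp at the right edge */

        dst[x] = src[sx];
        acc += inc;                     /* integer add only: no drift from float rounding */
    }
    return 0;
}
```

With a 4-pixel row {10, 20, 30, 40}, shift 0 and zoom 2, this fills an 8-pixel row with each source pixel repeated twice. The point of the fixed-point accumulator is that the loop only ever does integer adds, so the grid it produces is fully determined by the two values computed before the loop.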
(I replied to the wrong post/edited my previous post by mistake, though what I wrote below about ippiResizeSqrPixel being much better is still true for other reasons.)