mul_hi bug report

George_W_ · 07-03-2014

On Windows 8.1 64-bit with an Intel HD 4600 (both the latest release and beta drivers), the following snippet from an OpenCL kernel produces an incorrect result:

    if (k_delta == 38443432 && jj==4620)
       printf((__constant char *)"cl_barrett32_87_gs:  jj=%x kdelta=%x  mulhi=%x\n",
         (uint)jj, (uint)k_delta, (uint)mul_hi((uint)jj,(uint)k_delta));

The output is:

    cl_barrett32_87_gs: jj=120c kdelta=24a99a8 mulhi=0

As I understand it, mul_hi returns the upper 32 bits of the 64-bit product, so it should not produce zero here.
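As a sanity check on the expected value, the same computation can be done on the host in plain C. This is only a sketch and not part of the kernel. The helper names `ref_mul_hi` and `check_reported_case` are made up for illustration, and `uint32_t` stands in for OpenCL's `uint`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side reference: the upper 32 bits of the
   full 64-bit product of two 32-bit unsigned values. */
static uint32_t ref_mul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}

/* Checks the inputs from the report: jj = 4620 (0x120c) and
   k_delta = 38443432 (0x24a99a8). */
static void check_reported_case(void)
{
    uint32_t jj = 4620u;
    uint32_t k_delta = 38443432u;

    /* 4620 * 38443432 = 177608655840, which is 41 * 2^32 + 1514996704,
       so the high word is 41 = 0x29, not 0. */
    assert((uint64_t)jj * k_delta == 177608655840ULL);
    assert(ref_mul_hi(jj, k_delta) == 0x29u);
}
```

Calling `check_reported_case()` from any host program passes silently when the reference value is 0x29.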

I also have a (likely) related multiplication bug:

    facdist = (ulong) (2 * NUM_CLASSES) * (ulong) exponent;

This fails: the upper 32 bits of the result are zero. NUM_CLASSES is a #define for 4620, and exponent is a value around 50 million.

George_W_ · 07-03-2014

OK, now it gets weird. If I add one line that seemingly does nothing, the code snippet works (mul_hi returns 0x29). That line is:

    jj = jj % k_delta;

Update: This line actually does something. In the original code snippet, a smart optimizing compiler can determine that jj is the constant 4620. Adding the line above forces the compiler to place the jj value in a register or memory, which suggests the wrong result comes from how the compiler handles the constant operand.

Raghupathi_M_Intel · 07-10-2014

Hi George,

Is it possible to add the full reproducer?

Thanks,
Raghu

George_W_ · 07-15-2014

Hi Raghu,

I could not create a tiny reproducible case, so I removed a ton of extraneous code from my program and zipped it up for you. The zip includes all the source, MSVC makefiles, and a prebuilt executable.

The buggy code is in src/gpusieve.cl, in the function CalcModularInverses, above the "if (prime == 13)" printfs. It reproduces both the mul_hi bug and the ulong multiplication bug. It also shows the correct result when the constant 4620 is assigned to a variable.

Let me know if you need more.

Regards,

George

George_W_ · 07-23-2014

Hi,

Were you able to reproduce the bug with the code in my last post? Is there anything else I can provide to help?

Regards,

George