ICC 16.0.2: MPX pass creates too-narrow bounds in SSE-heavy code

dmitrii_k_ · 10-24-2016

Hello,

I use icc (ICC) version 16.0.2 (20160204). I found a bug in the way its MPX transformation pass creates bounds for SSE-heavy (and heavily-optimized) code. My computer has an Intel Skylake CPU.

Here is a minimal test case that reproduces the problem (adapted from the Vips program, where the bug was triggered):

#define SCALE (1<<6)
float ar[SCALE + 1][SCALE + 1][4];

void __attribute__ ((noinline)) foo() {
    int x, y;
    for( x = 0; x < SCALE + 1; x++ )
        for( y = 0; y < SCALE + 1; y++ ) {
            double X, Y, Xd, Yd;
            double c1, c2, c3, c4;

            X = (double) x / SCALE;
            Y = (double) y / SCALE;
            Xd = 1.0 - X;
            Yd = 1.0 - Y;

            c1 = Xd * Yd;
            c2 = X * Yd;
            c3 = Xd * Y;
            c4 = X * Y;

            ar[x][y][0] = c1;
            ar[x][y][1] = c2;
            ar[x][y][2] = c3;
            ar[x][y][3] = c4;
        }
}

int main() {
    foo();
    return ar[0][0][0];
}

The code raises a #BR (bound range) exception when built with -O2 and -no-check-pointers-narrowing (only this exact combination triggers it on my computer):

>>> icc -O2 -ggdb -check-pointers-mpx=rw -no-check-pointers-narrowing -lmpx -lmpxwrappers vipstest.c
>>> ./a.out
Saw a #BR! status 1 at 0x400c26
Saw a #BR! status 1 at 0x400c2e
...

# now with -O1: works correctly
>>> icc -O1 -ggdb -check-pointers-mpx=rw -no-check-pointers-narrowing -lmpx -lmpxwrappers vipstest.c
>>> ./a.out
[ no output ]

# now with -O2 but without -no-check-pointers-narrowing: works correctly
>>> icc -O2 -ggdb -check-pointers-mpx=rw -lmpx -lmpxwrappers vipstest.c
>>> ./a.out
[ no output ]

The offending asm snippet looks like this:

  bndmk  0x13(%rdx),%bnd1  # INCORRECT BOUND: TRIGGERS BR
  bndmk  0x1080f(%rdx),%bnd0  # CORRECT BUT UNUSED BOUND
  ...
  bndcl  0x603904(%rdi),%bnd1
  bndcl  0x603908(%rdi),%bnd1
  bndcl  0x60390c(%rdi),%bnd1
  bndcu  0x603917(%rdi),%bnd1  # TRIGGERS BR
  bndcu  0x60391b(%rdi),%bnd1
  bndcu  0x60391f(%rdi),%bnd1
  ...

Note that when compiled with -O1 (or without -no-check-pointers-narrowing), the asm checks against the correct bounds in BND0. It looks like some autovectorization (SSE) optimization pass clashes with the MPX instrumentation.

Yuan_C_Intel · 10-28-2016

Hi, Dmitri

Have you tried your test with Intel Compiler 17.0? I cannot reproduce the error with 17.0 on Windows.

I need to find a Linux Skylake system to verify your issue. I will let you know when I have an update on this.

Thanks.

dmitrii_k_ · 10-28-2016

I did not try ICC 17.0. From my side, I will try to update to ICC 17.0 and report the results on my Linux Skylake machine.

UPDATE: I installed ICC 17.0, and the bug disappeared. Great! (I guess this was expected since ICC 17.0 has better autovectorization support.) So I guess this bug report can be closed.

Yuan_C_Intel · 11-06-2016

Hi, Dmitrii

I'm glad to hear this. That's great. Thank you for letting me know this.

I'm closing this as the issue is fixed in 17.0.

Thanks.

dmitrii_k_ · 11-19-2016

I believe I see the manifestations of this bug in my other programs now (even after updating to ICC 17).

I see these bugs in SPEC 2006: vips, h264ref, and milc. I will try to come up with another test case that can be reproduced in ICC 17. In the meantime, have you tried running MPX instrumentation on SPEC 2006 under Ubuntu 16.04 + Intel Skylake?

Yuan_C_Intel · 12-06-2016

Hi, Dmitri

Thank you for the update.

Have you created a new test case that reproduces the issue with 17.0? We are interested in reproducing the issue and submitting it for resolution.

Thanks.