Software Archive
Read-only legacy content

gsl library optimization error

JJK
New Contributor III
826 Views

Hi all,

As a follow-up to my previous post about the gsl library core-dumping on the Xeon Phi, there is good news and bad news.

The good news: with icc v15 the core dumps are gone.

The bad news: there are other vectorization errors that seem to occur only when -mmic is used.

Consider the following program (distilled from the gsl-1.16 source code):

#include <stdio.h>
#include <stdlib.h>   /* malloc, calloc, size_t */

struct gsl_block_char_struct
{
  size_t size;
  char *data;
};

typedef struct gsl_block_char_struct gsl_block_char;


typedef struct
{
  size_t size1;
  size_t size2;
  size_t tda;
  char * data;
  gsl_block_char * block;
  int owner;
} gsl_matrix_char;

gsl_block_char *
gsl_block_char_alloc (const size_t n)
{
  gsl_block_char * b;

  b = (gsl_block_char *) malloc (sizeof (gsl_block_char));
  b->data = (char *) calloc (1, 1 * n * sizeof (char));
  b->size = n;

  return b;
}

gsl_matrix_char *
gsl_matrix_char_alloc (const size_t n1, const size_t n2)
{
  gsl_block_char * block;
  gsl_matrix_char * m;

  m = (gsl_matrix_char *) malloc (sizeof (gsl_matrix_char));

  block = gsl_block_char_alloc (n1 * n2) ;

  m->data = block->data;
  m->size1 = n1;
  m->size2 = n2;
  m->tda = n2;
  m->block = block;
  m->owner = 1;

  return m;
}

char
gsl_matrix_char_get(const gsl_matrix_char * m, const size_t i, const size_t j)
{
  return m->data[i * m->tda + j] ;
}


void
gsl_matrix_char_set(gsl_matrix_char * m, const size_t i, const size_t j, const char x)
{
  m->data[i * m->tda + j] = x ;
}

void
gsl_matrix_char_minmax (const gsl_matrix_char * m,
                               char * min_out,
                               char * max_out)
{
  const size_t M = m->size1;
  const size_t N = m->size2;
  const size_t tda = m->tda;

  char max = m->data[0 * tda + 0];
  char min = m->data[0 * tda + 0];

  size_t i, j;

//  #pragma novector
  for (i = 0; i < M; i++)
    {
//      #pragma novector
      for (j = 0; j < N; j++)
        {
          char x = m->data[i * tda + j];
          if (x < min)
           {
              min = x;
            }
          if (x > max)
            {
              max = x;
            }
        }
    }

  *min_out = min;
  *max_out = max;
}


void
test_char_func (const size_t M, const size_t N)
{
  size_t i, j;
  size_t k = 0;
  char min, max;

  gsl_matrix_char * m = gsl_matrix_char_alloc (M, N);

  for (i = 0; i < M; i++)
  {
    for (j = 0; j < N; j++)
    {
      k++;
      gsl_matrix_char_set (m, i, j, (char) k);
    }
  }

  char exp_max = gsl_matrix_char_get (m, 0, 0);
  char exp_min = gsl_matrix_char_get (m, 0, 0); 
  for (i = 0; i < M; i++)
  {
    for (j = 0; j < N; j++)
    {   
      char k = gsl_matrix_char_get (m, i, j); 
      if (k > exp_max) {
        exp_max =  gsl_matrix_char_get (m, i, j); 
      }   
      if (k < exp_min) {
        exp_min =  gsl_matrix_char_get (m, i, j); 
      }   
    }   
  }

  gsl_matrix_char_minmax (m, &min, &max);

  printf("exp_max = %02X max = %02X\n", exp_max, max);
  printf("exp_min = %02X min = %02X\n", exp_min, min);

  if (max != exp_max) fprintf(stderr, "gsl_matrix_char_minmax returns incorrect maximum value\n");
  if (min != exp_min) fprintf(stderr, "gsl_matrix_char_minmax returns incorrect minimum value\n");
}


int
main (void)
{
  size_t M = 53; 
  size_t N = 107;

  test_char_func (M, N); 
}

 

This code compiles without a warning using

icc -O2 -mmic -Wall -Wuninitialized -g -o mytest mytest.c

yet when run on the coprocessor it produces incorrect results:

$ ssh mic0 $PWD/mytest
exp_max = 7F max = 27
exp_min = FFFFFF80 min = 00
gsl_matrix_char_minmax returns incorrect maximum value
gsl_matrix_char_minmax returns incorrect minimum value

Remove the '-mmic' flag and rerun on the host CPU, and the code runs just fine.

If I uncomment the #pragma novector lines the code also runs fine on the Xeon Phi.

 

 

5 Replies
Kevin_D_Intel
Employee

Hi Jan,

Thank you for the convenient test case! I reproduced the behavior you described using both the 14.0 and 15.0 compilers. The expected results occur at -O1, or at -O2 only when the novector directive is active for the inner loop. The other compiler options are not a factor.

I reported this to Development (see Internal tracking id below) for some further analysis and will keep you updated on their findings.

(Internal tracking id: DPD200360578)

(Resolution Update on 11/17/2014): This defect is fixed in the Intel® Parallel Studio XE 2015 Update 1 release (2015.0.133 - Linux)

Kevin_D_Intel
Employee

Development confirmed this is a defect: an incorrect conversion was generated using an unsigned data type instead of a signed data type. As you found, the novector directive works around it; Development thought using unsigned char or int instead of signed char might also work. I tried those and int seemed to work, but novector is probably the best/easiest workaround.

I will keep you updated on the availability of a fix in a future IPS XE 2015 (15.0 compiler) release.

JJK
New Contributor III

Hi Kevin,

I can live with that without any issues. I think I'll create a patch for the gsl library so that it compiles and tests cleanly on the Xeon Phi; that patch would include the #pragma novector line plus some minor changes to the gsl libtool code so that the tests are actually run on the Xeon Phi.

As far as I'm concerned, this ticket can be closed.

Kevin_D_Intel
Employee

Ok, sounds good Jan.
 

Kevin_D_Intel
Employee

This defect is fixed in the Intel® Parallel Studio XE 2015 Update 1 release (Version 15.0.1.133 Build 20141023 - Linux) now available from our Intel® Registration Center.
