Hi all,
As a follow-up to my previous post about the gsl library core-dumping on the Xeon Phi, there is good news and bad news.
The good news: with icc v15 the core dumps are gone.
The bad news: there are other vectorization errors that seem to occur only when -mmic is used.
Consider the following program (distilled from the gsl-1.16 source code):
#include <stdio.h>
#include <stdlib.h>

struct gsl_block_char_struct
{
  size_t size;
  char *data;
};

typedef struct gsl_block_char_struct gsl_block_char;

typedef struct
{
  size_t size1;
  size_t size2;
  size_t tda;
  char * data;
  gsl_block_char * block;
  int owner;
} gsl_matrix_char;

gsl_block_char *
gsl_block_char_alloc (const size_t n)
{
  gsl_block_char * b;
  b = (gsl_block_char *) malloc (sizeof (gsl_block_char));
  b->data = (char *) calloc (1, 1 * n * sizeof (char));
  b->size = n;
  return b;
}

gsl_matrix_char *
gsl_matrix_char_alloc (const size_t n1, const size_t n2)
{
  gsl_block_char * block;
  gsl_matrix_char * m;
  m = (gsl_matrix_char *) malloc (sizeof (gsl_matrix_char));
  block = gsl_block_char_alloc (n1 * n2);
  m->data = block->data;
  m->size1 = n1;
  m->size2 = n2;
  m->tda = n2;
  m->block = block;
  m->owner = 1;
  return m;
}

char
gsl_matrix_char_get (const gsl_matrix_char * m, const size_t i, const size_t j)
{
  return m->data[i * m->tda + j];
}

void
gsl_matrix_char_set (gsl_matrix_char * m, const size_t i, const size_t j, const char x)
{
  m->data[i * m->tda + j] = x;
}

void
gsl_matrix_char_minmax (const gsl_matrix_char * m,
                        char * min_out,
                        char * max_out)
{
  const size_t M = m->size1;
  const size_t N = m->size2;
  const size_t tda = m->tda;
  char max = m->data[0 * tda + 0];
  char min = m->data[0 * tda + 0];
  size_t i, j;
  // #pragma novector
  for (i = 0; i < M; i++)
    {
      // #pragma novector
      for (j = 0; j < N; j++)
        {
          char x = m->data[i * tda + j];
          if (x < min)
            {
              min = x;
            }
          if (x > max)
            {
              max = x;
            }
        }
    }
  *min_out = min;
  *max_out = max;
}

void
test_char_func (const size_t M, const size_t N)
{
  size_t i, j;
  size_t k = 0;
  char min, max;
  gsl_matrix_char * m = gsl_matrix_char_alloc (M, N);
  for (i = 0; i < M; i++)
    {
      for (j = 0; j < N; j++)
        {
          k++;
          gsl_matrix_char_set (m, i, j, (char) k);
        }
    }
  char exp_max = gsl_matrix_char_get (m, 0, 0);
  char exp_min = gsl_matrix_char_get (m, 0, 0);
  for (i = 0; i < M; i++)
    {
      for (j = 0; j < N; j++)
        {
          char k = gsl_matrix_char_get (m, i, j);
          if (k > exp_max) {
            exp_max = gsl_matrix_char_get (m, i, j);
          }
          if (k < exp_min) {
            exp_min = gsl_matrix_char_get (m, i, j);
          }
        }
    }
  gsl_matrix_char_minmax (m, &min, &max);
  printf ("exp_max = %02X max = %02X\n", exp_max, max);
  printf ("exp_min = %02X min = %02X\n", exp_min, min);
  if (max != exp_max) fprintf (stderr, "gsl_matrix_char_minmax returns incorrect maximum value\n");
  if (min != exp_min) fprintf (stderr, "gsl_matrix_char_minmax returns incorrect minimum value\n");
}

int
main (void)
{
  size_t M = 53;
  size_t N = 107;
  test_char_func (M, N);
}
This code compiles without a warning using:
icc -O2 -mmic -Wall -Wuninitialized -g -o mytest mytest.c
yet when it runs it produces incorrect results:
$ ssh mic0 $PWD/mytest
exp_max = 7F max = 27
exp_min = FFFFFF80 min = 00
gsl_matrix_char_minmax returns incorrect maximum value
gsl_matrix_char_minmax returns incorrect minimum value
Remove the '-mmic', rerun on the host CPU, and the code runs just fine.
If I uncomment the '#pragma novector' lines, the code also runs fine on the Xeon Phi.
Hi Jan,
Thank you for the convenient test case! I reproduced the behavior described using both the 14.0 and 15.0 compilers. The expected results occur at -O1, or at -O2 only when the novector directive is active for the inner loop. The other compiler options are not a factor.
I reported this to Development (see Internal tracking id below) for some further analysis and will keep you updated on their findings.
(Internal tracking id: DPD200360578)
(Resolution Update on 11/17/2014): This defect is fixed in the Intel® Parallel Studio XE 2015 Update 1 release (2015.0.133 - Linux)
Development confirmed this is a defect: the vectorizer generates an incorrect conversion using an unsigned data type where a signed one is required. As a workaround (as you found), novector can be used, although they thought using unsigned char or int instead of signed char might also work. I tried those; int did work, but novector is probably the best/easiest option.
I will keep you updated on the availability of a fix in a future IPS XE 2015 (15.0 compiler) release.
Hi Kevin,
I can live with that without any issues. I think I'll create a patch for the gsl library to compile and test it for the Xeon Phi; that patch would include the #pragma line plus some minor changes to the gsl libtool code so that the tests are actually run on the Xeon Phi.
As far as I'm concerned, this ticket can be closed.
Ok, sounds good, Jan.
This defect is fixed in the Intel® Parallel Studio XE 2015 Update 1 release (Version 15.0.1.133 Build 20141023 - Linux) now available from our Intel® Registration Center.