Mrutunjayya_W_ · 04-14-2015

I am new to Xeon Phi coding. I am getting "offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)" when my array size is greater than 500000. The following is the code I am trying to run.

If I change my array size to 600000 it throws the error.

#include <iostream>
#include <omp.h>
#include <stdlib.h>
#include <sys/time.h>
#include <stdio.h>

using namespace std;

#define MAXIMUM_VALUE 1000000

int main()
{
    const int array_size = 500000;
    struct timeval start, end;
    double runtime;

    //double *array1 = (double *) malloc( array_size * sizeof( double ) );
    //double *array2 = (double *) malloc( array_size * sizeof( double ) );
    //double *array3 = (double *) malloc( array_size * sizeof( double ) );

    double array1[array_size]; //__attribute__((aligned(64)));
    double array2[array_size]; //__attribute__((aligned(64)));
    double array3[array_size]; //__attribute__((aligned(64)));

    int seed = 343;
    srand( seed );
    for( int i = 0; i < array_size; i++ )
    {
        array1[i] = ( (double) rand() / (double) RAND_MAX ) * MAXIMUM_VALUE;
        array2[i] = ( (double) rand() / (double) RAND_MAX ) * MAXIMUM_VALUE;
    }

    #pragma offload target(mic) in(array1,array2) inout(array3)
    {
        //#pragma omp parallel for
        //for(int i=0; i<array_size; i++)
            //array1[i] = array2[i] + array3[i];
    }

    return 0;
}

Sunny_G_Intel · 04-14-2015

Hi Mrutunjayya,

My first impression is that this error has nothing to do with offload or Intel Xeon Phi coding. You should get the same error even if you comment out the offload section. The likely cause is the stack size: the three arrays are local variables, so they are allocated on the stack. If you switch to the dynamic memory allocation code that you have commented out, your code should run successfully with or without offload.
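To make the stack-versus-heap point concrete, here is a minimal sketch. The names footprint_bytes and add_arrays are made up for illustration, and std::vector stands in for the commented-out malloc calls; it assumes 8-byte doubles, which holds on common platforms. The note about offload clauses is from memory of the Intel compiler documentation, so treat it as an assumption and check it against your compiler version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes needed by `count` arrays of `n` doubles each.
// With n = 600000 and count = 3 this is about 14.4 MB (assuming 8-byte
// doubles), which is a lot to place on the stack as local arrays.
constexpr std::size_t footprint_bytes(std::size_t n, std::size_t count) {
    return n * count * sizeof(double);
}

// Element-wise sum using heap-backed storage: std::vector keeps its
// elements on the heap, so the stack limit no longer matters here.
// Assumption: to offload such data, the raw pointers (a.data(), ...)
// would be passed with explicit element counts, e.g. in(p : length(n)).
std::vector<double> add_arrays(const std::vector<double>& a,
                               const std::vector<double>& b) {
    assert(a.size() == b.size());  // inputs must have the same length
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

Calling add_arrays on two vectors of 600000 elements allocates everything on the heap, so it should not depend on the "ulimit -s" setting.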

Thank you.

Mrutunjayya_W_ · 04-14-2015

Hi Sunny,

I tried commenting out the offload pragma, and there was no error.

I also tried setting the stack limit with "ulimit -s unlimited" on both the host and the device, but I still see the same issue. Are there any solutions?

Regards,

Mrutunjayya

Frances_R_Intel · 04-15-2015

A couple of points to keep in mind:

Increasing the stack size on the host does not increase the stack size on the coprocessor. If you want to increase the stack size on the coprocessor for your offload programs, you will probably want to set the increased stack size as the default on the coprocessor. That way, when your program offloads (as user micuser, by default), it will pick up the larger value.

There is a difference between stack size and thread stack size. You can use the environment variable KMP_STACKSIZE to change the thread stack size for OpenMP. In order to make sure that environment variable finds its way onto the coprocessor, you will need to set the MIC_ENV_PREFIX environment variable and prepend that prefix to any environment variables you want passed to the coprocessor.

There is an article, Best Known Methods for Using OpenMP* on Intel® Many Integrated Core (Intel® MIC) Architecture, that may be helpful. It has good advice and will walk you through things like setting the environment variable.
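As a rough illustration of the prefix rule described above, here is a small host-side check. The helper names prefixed_name and forwarded_value are hypothetical, and the sketch assumes that the prefix and the variable name are joined with an underscore (so that with MIC_ENV_PREFIX=MIC you would export MIC_KMP_STACKSIZE). The value 16M in the comment is only an example, not a recommendation.

```cpp
#include <cstdlib>
#include <string>

// Host-side name under which `var` would have to be exported so that it
// is passed to the coprocessor when MIC_ENV_PREFIX is set to `prefix`.
// Assumption: prefix and variable name are joined with an underscore.
std::string prefixed_name(const std::string& prefix, const std::string& var) {
    return prefix + "_" + var;
}

// Value that the host environment currently holds for the prefixed form
// of `var`, or an empty string if MIC_ENV_PREFIX or the prefixed variable
// is not set. Example shell setup (values are illustrative):
//   export MIC_ENV_PREFIX=MIC
//   export MIC_KMP_STACKSIZE=16M
std::string forwarded_value(const std::string& var) {
    const char* prefix = std::getenv("MIC_ENV_PREFIX");
    if (prefix == nullptr) {
        return "";
    }
    const char* value = std::getenv(prefixed_name(prefix, var).c_str());
    return value != nullptr ? value : "";
}
```

Calling forwarded_value("KMP_STACKSIZE") before the first offload is one way to confirm that the shell running the program has both variables set.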

Sunny_G_Intel · 04-15-2015

Hello Mrutunjayya,

It is strange that you are seeing different behavior. Below is a snapshot of the results I am getting. I added a couple of print statements to your code to track whether the execution was successful.

// cat stack_size.c 
#include <iostream>
#include <omp.h>
#include <stdlib.h>
#include <sys/time.h>
#include <stdio.h>
 
using namespace std;
 
#define  MAXIMUM_VALUE   1000000
 
int main()
{
    const int array_size = 600000;
    struct timeval start, end;
    double runtime;
/*
    double *array1 = (double *) malloc( array_size * sizeof( double ) );
    double *array2 = (double *) malloc( array_size * sizeof( double ) );
    double *array3 = (double *) malloc( array_size * sizeof( double ) );
  */
    double array1[array_size]; //__attribute__((aligned(64)));;
    double array2[array_size]; //__attribute__((aligned(64)));;
    double array3[array_size]; //__attribute__((aligned(64)));;
    
    int seed = 343;
    srand( seed );
    for( int i = 0; i < array_size; i++ )
    {
        array1[i] = ( (double) rand() / (double) RAND_MAX ) * MAXIMUM_VALUE;
        array2[i] = ( (double) rand() / (double) RAND_MAX ) * MAXIMUM_VALUE;
    }
 
    printf("Hello From Host\n");
    fflush(stdout); 
  
     
    #pragma offload target(mic) in(array1,array2) out(array3)
    {
         printf("Hello from MIC\n");
        //#pragma omp parallel for
        //for(int i=0; i<array_size; i++)
            //array1[i] = array2[i] + array3[i];
    }    
   
    return 0;
}

This is how I compile and run your code, first with the default stack size and then with the stack size increased.

[user@host test]$ icpc -o stack_size stack_size.c 
[user@host test]$ ./stack_size 
Segmentation fault (core dumped)
[user@host test]$ ulimit -s unlimited
[user@host test]$ ./stack_size 
Hello From Host
Hello from MIC
[user@host test]$
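To see which stack limit a process actually runs with, a small check along these lines can help. It uses the POSIX getrlimit call; the helper names describe_limit and current_stack_limit are made up for this sketch.

```cpp
#include <sys/resource.h>
#include <string>

// Human-readable form of a resource limit given in bytes.
std::string describe_limit(rlim_t bytes) {
    if (bytes == RLIM_INFINITY) {
        return "unlimited";
    }
    return std::to_string(bytes / 1024) + " KB";
}

// Soft stack limit of the calling process, i.e. the value that
// "ulimit -s" changes for programs started from the same shell.
std::string current_stack_limit() {
    rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0) {
        return "unknown";
    }
    return describe_limit(lim.rlim_cur);
}
```

Printing current_stack_limit() from main before and after "ulimit -s unlimited" should show the change on the host. The same check built to run on the coprocessor would report the limit there, which can differ from the host value.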

Let me know if you get different results.

Thank you