Manuel_D_1
Beginner

Offloading an array of pointers, SIGSEGV


I want to offload a function call inside a for loop that I run in parallel with OpenMP. The problem is that I can either run this loop in parallel or offload the function call, but not both at the same time. Every function call fills two arrays, which are the only out clauses of the offload.

To run the loop in parallel, I have two arrays of pointers to (int) output arrays, so that every parallel offload can save its output in its own array, following this page: https://software.intel.com/en-us/articles/xeon-phi-coprocessor-data-transfer-array-of-pointers-using...

To give a better overview, here is the important part of the code, which runs fine if I delete the offload pragma and run bar on the host:

foo() {
int *outa[threadnum], *outb[threadnum];
... // calc arrsize
   // here I want to insert #pragma omp for schedule...
   for () {
    outa[x] = (int *) malloc (arrsize * sizeof(int)); // x always between 0 and threadnum-1
    outb[x] = (int *) malloc (arrsize * sizeof(int));
    #pragma offload target(mic) \
    in(...) \
    out( outa[x] : length(arrsize) ) \
    out( outb[x] : length(arrsize) )
    bar (...,outa[x],outb[x],...);
    ...
    free(outa[x]);
    free(outb[x]);
   }
}

Is there an obvious problem that I have missed and that leads to a segmentation fault (it happens in the very first offload, with x=0)? For comparison, here is the same part of the code when I run the for loop sequentially and offload to the Phi (working fine):

foo() {
int *outa, *outb;

   for () {
    outa = (int *) malloc(arrsize * sizeof(int));
    outb = (int *) malloc(arrsize * sizeof(int));
    #pragma offload target(mic) \
    in(...) \
    out(outa : length(arrsize)) \
    out(outb : length(arrsize))
    bar(...,outa,outb,...);
    ...
    free(outa);
    free(outb);
   }
}

I appreciate every comment. If necessary, I can try to create a code snippet that reproduces the error.

 

 


Accepted Solutions
Gregg_S_Intel
Employee

You may have more luck expressing this as follows.  But even better would be to do the threading on the card.

foo() {
#pragma omp for
for () {
  // pointers declared inside the loop body are private to each thread
  int *outa = (int *) malloc (arrsize * sizeof(int));
  int *outb = (int *) malloc (arrsize * sizeof(int));
  #pragma offload target(mic) \
  in(...) \
  out( outa : length(arrsize) ) \
  out( outb : length(arrsize) )
  bar (...,outa,outb,...);
  ...
}
}
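
For illustration only, a self-contained sketch of the pattern above: every iteration owns its two output pointers, so each offload names a plain local pointer rather than an element of a pointer array. The body of bar, the checksum array, and the OFFLOAD_FN macro are hypothetical additions; the sketch assumes the Intel compiler predefines __INTEL_OFFLOAD when offload is enabled, and other compilers simply ignore the offload pragma and run everything on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper macro: mark functions for the coprocessor only when
   the Intel offload compiler is in use (assumes __INTEL_OFFLOAD is defined). */
#ifdef __INTEL_OFFLOAD
#define OFFLOAD_FN __attribute__((target(mic)))
#else
#define OFFLOAD_FN
#endif

/* Hypothetical stand-in for bar(): fills both output arrays. */
OFFLOAD_FN void bar(int seed, int *outa, int *outb, int arrsize)
{
    for (int i = 0; i < arrsize; ++i) {
        outa[i] = seed + i;
        outb[i] = seed * i;
    }
}

/* Runs `iterations` independent offloads. checksums[it] receives the sum of
   both output arrays of iteration `it`, or -1 if an allocation failed. */
void foo(int iterations, int arrsize, long *checksums)
{
    assert(iterations > 0 && arrsize > 0);

    #pragma omp parallel for
    for (int it = 0; it < iterations; ++it) {
        /* Declared inside the loop body, so each thread has its own pointers. */
        int *outa = (int *) malloc(arrsize * sizeof(int));
        int *outb = (int *) malloc(arrsize * sizeof(int));

        if (outa && outb) {
            #pragma offload target(mic) \
                out(outa : length(arrsize)) \
                out(outb : length(arrsize))
            bar(it, outa, outb, arrsize);

            long sum = 0;
            for (int i = 0; i < arrsize; ++i)
                sum += outa[i] + outb[i];
            checksums[it] = sum;
        } else {
            checksums[it] = -1;
        }

        free(outa);
        free(outb);
    }
}
```

Each iteration writes only to its own slot of checksums, so the host threads need no further synchronization.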


 

7 Replies
Kevin_D_Intel
Employee

I'm not sure what might be the issue. Maybe alignment. I can inquire with our Developers.

Am I understanding correctly that the offload with the array of pointers does not yet have OpenMP enabled (based on the code comment about where you want to add the omp pragma)? If OpenMP is active with the array-of-pointers offload, can it be run with a single thread?

Do you have multiple phi cards?

Kevin_D_Intel
Employee

It would help to have a reproducer to investigate, and to know your compiler version (icc -V). Thank you.

Ravi_N_Intel
Employee

You cannot have a parallel loop enclosing a pragma offload that allocates/transfers the same variable without any synchronization using signal/wait.
If the first thread in the parallel loop is still allocating the memory and transferring the data for the offload pragma, the second thread might assume the data is ready and start executing the offload.
One way to avoid this is to allocate/transfer the data before the parallel loop and transfer/deallocate it after the parallel loop.

e.g.:
#pragma offload_transfer target(mic:0) nocopy(outa : length(arrsize) alloc_if(1) free_if(0))
  #pragma omp parallel for
       #pragma offload target(mic:0) ...

#pragma offload_transfer target(mic:0) out(outa : length(arrsize) alloc_if(0) free_if(1))
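
A rough, self-contained sketch of this allocate-once approach, for illustration only. One buffer is allocated on the card before the parallel loop, every iteration writes only its own slice of it, and the data is copied back once after the loop. The function foo_persistent, the single-output bar, and the OFFLOAD_FN macro are hypothetical; the sketch assumes __INTEL_OFFLOAD is predefined by the Intel compiler when offload is enabled. Other compilers ignore the offload pragmas and run the same code on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper macro (assumes __INTEL_OFFLOAD is defined by icc). */
#ifdef __INTEL_OFFLOAD
#define OFFLOAD_FN __attribute__((target(mic)))
#else
#define OFFLOAD_FN
#endif

/* Hypothetical stand-in for bar(): fills one slice of the output. */
OFFLOAD_FN void bar(int seed, int *out, int arrsize)
{
    for (int i = 0; i < arrsize; ++i)
        out[i] = seed * 100 + i;
}

/* Returns a host buffer of iterations * arrsize ints (the caller frees it),
   or NULL if the host allocation fails. */
int *foo_persistent(int iterations, int arrsize)
{
    assert(iterations > 0 && arrsize > 0);
    int total = iterations * arrsize;
    int *outa = (int *) malloc(total * sizeof(int));
    if (!outa)
        return NULL;

    /* Allocate the card-side buffer once, without copying any data. */
    #pragma offload_transfer target(mic:0) \
        nocopy(outa : length(total) alloc_if(1) free_if(0))

    #pragma omp parallel for
    for (int it = 0; it < iterations; ++it) {
        /* Reuse the existing card buffer; each iteration touches only
           its own slice, so the offloads do not overlap. */
        #pragma offload target(mic:0) \
            nocopy(outa : length(total) alloc_if(0) free_if(0))
        bar(it, outa + it * arrsize, arrsize);
    }

    /* Copy the results back once and release the card-side buffer. */
    #pragma offload_transfer target(mic:0) \
        out(outa : length(total) alloc_if(0) free_if(1))

    return outa;
}
```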

 

 

Gregg_S_Intel
Employee

Ravi, wouldn't the threaded loop be doing multiple offloads, each with its own local outa/outb arrays? (Not that I think it is a good idea...)
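
For comparison, a minimal sketch of doing the threading on the card instead: a single offload moves the whole loop to the coprocessor, and OpenMP runs inside the offloaded function. The names bar_all and foo_on_card, the body of bar, and the OFFLOAD_FN macro are hypothetical, and __INTEL_OFFLOAD is assumed to be predefined by the Intel compiler when offload is enabled. Other compilers ignore the offload pragma and run the loop on the host.

```c
#include <assert.h>

/* Hypothetical helper macro (assumes __INTEL_OFFLOAD is defined by icc). */
#ifdef __INTEL_OFFLOAD
#define OFFLOAD_FN __attribute__((target(mic)))
#else
#define OFFLOAD_FN
#endif

/* Hypothetical stand-in for bar(): fills both output arrays. */
OFFLOAD_FN void bar(int seed, int *outa, int *outb, int arrsize)
{
    for (int i = 0; i < arrsize; ++i) {
        outa[i] = seed + i;
        outb[i] = seed * i;
    }
}

/* Runs every iteration in parallel; iteration `it` writes to the slice
   [it * arrsize, (it + 1) * arrsize) of each output array. */
OFFLOAD_FN void bar_all(int iterations, int arrsize, int *outa, int *outb)
{
    #pragma omp parallel for
    for (int it = 0; it < iterations; ++it)
        bar(it, outa + it * arrsize, outb + it * arrsize, arrsize);
}

/* One offload for the whole loop; outa and outb must each hold
   iterations * arrsize ints. */
void foo_on_card(int iterations, int arrsize, int *outa, int *outb)
{
    assert(iterations > 0 && arrsize > 0);
    int total = iterations * arrsize;

    #pragma offload target(mic) \
        out(outa : length(total)) \
        out(outb : length(total))
    bar_all(iterations, arrsize, outa, outb);
}
```

This trades many small transfers for one large one, at the cost of holding all results in two contiguous buffers.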

Ravi_N_Intel
Employee

You are right.  I missed that each thread got its own copy.

Manuel_D_1
Beginner

First of all, sorry for the late reply.

@Kevin D:
Yes, we have multiple cards, but using only one by setting the target to mic:0 does not help, if that is what you were thinking of.
The first code only runs fine (even in parallel) if I delete the offload pragma; as soon as I try to offload the code it crashes, with and without the omp parallel pragma.

icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.090 Build 20140723
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

Anyhow, thanks (a lot) to Gregg S.'s example, the code now runs fine. If I have some free time after finishing the optimization, I will look at this part again to see whether I can find another way to get it to work, and post it here.
