
Using a jagged array on host and coprocessor

TK
Beginner

I have the following code:

// test2.c

#pragma offload_attribute (target(mic))
int phi()
{
 return data[0][0]+data[0][1];
}

int main()
{
 int i,j;
 int **data;

 data=(int**)malloc(3*sizeof(int*));
 data[0]=(int*)malloc(2*sizeof(int));
 data[1]=(int*)malloc(4*sizeof(int));
 data[2]=(int*)malloc(8*sizeof(int));

 for(i=0; i<2; i++) data[0][i]=10;
...
// done filling data

 #pragma offload target(mic)
 j=phi();

 printf("j=%d\n",j);
}

How do I get "data" to coexist on the MIC and the host after the point "done filling data", so that I do not have to use "#pragma offload target(mic) in(data)"? Thanks.

10 Replies
Frances_R_Intel
Employee

It sounds like this is a job for _Cilk_offload.

[cpp]

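#include <stdio.h>

// Shared pointer: same virtual address on host and coprocessor.
int ** _Cilk_shared data;

int _Cilk_shared phi()
{
    return data[0][0]+data[1][1]+data[2][2];
}

int main()
{
    int i,j;

    // Allocate the row-pointer table and each row in shared memory.
    data=(int**) _Offload_shared_malloc(3*sizeof(int*));

    data[0]=(int*)_Offload_shared_malloc(2*sizeof(int));
    data[1]=(int*)_Offload_shared_malloc(4*sizeof(int));
    data[2]=(int*)_Offload_shared_malloc(8*sizeof(int));

    // Fill each row element by element.
    for(i=0; i<2; i++) data[0][i]=i;
    for(i=0; i<4; i++) data[1][i]=i*10;
    for(i=0; i<8; i++) data[2][i]=i*100;

    // Run phi() on the coprocessor; shared data is synced on entry and exit.
    j= _Cilk_offload phi();

    printf("j=%d\n",j);

    return 0;
}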

[/cpp]

Frances_R_Intel
Employee

I probably should have explained a little.

Using _Cilk_offload instead of the #pragma offload, I can use the virtual shared memory model. 

int ** _Cilk_shared data; creates an int** variable named data at the same virtual memory address on both the host and the coprocessor. 

_Offload_shared_malloc malloc's space on both the host and coprocessor at the same virtual memory address.

Whenever you enter or leave an offload region, the data in that virtual shared memory is synced between the host and coprocessor.

You can't use this programming model in Fortran, and you can't use it when you need more control over what data gets transferred when. But if you want to be able to hand pointers back and forth between host and coprocessor, then this is what you use.

Is this the best solution to your problem? Maybe, maybe not. The alternatives are basically to either malloc all of the space at one time in a single array or pass each of the array lines individually. Either of these requires more changes to your code but could give you more control over performance.

TK
Beginner

Thanks. Suppose "phi" is called inside a loop like

for(i=0; i<100; i++)
{
 array[i]=phi(i);
}

where "phi" is declared as "int _Cilk_shared phi(int i)". Can I put an OpenMP parallel for around that loop, or does Cilk already take care of that? Thanks.

TK
Beginner

What I wanted to do is something like the following:

[cpp]

int _Cilk_shared phi(int a)
{...}

int main()
{
...

#pragma offload target(mic)
#pragma omp parallel for
for(i=0; i<100; i++)
{
 array[i]= _Cilk_offload phi(i);
}

...
}

[/cpp]

Thanks.

Ravi_N_Intel
Employee

You have two options:

1. Outline the #pragma omp parallel for loop into a separate routine, delete the #pragma offload, and call that routine using _Cilk_offload. You don't need _Cilk_offload phi(i) in this case; just call phi(i) directly.

2. Replace #pragma offload and #pragma omp with the following:
_Cilk_offload _Cilk_for(i=0; i<100; i++)
Again, you don't need _Cilk_offload phi(i); just call phi(i) directly.

TK
Beginner

Thanks. Suppose I did not use Cilk: how would I then offload the "int **data;" above to the coprocessor? Thanks!

Frances_R_Intel
Employee

If you didn't want to use Cilk?

You can malloc all the space in one contiguous block, pass the whole block to the offload region, then set up the int** data array inside the offload region with data[0], data[1] etc pointing to the start of each row inside that contiguous block. (If you need to use the int** data array in both the host code and the offload code, you should either give the array a different name inside the offloaded code or specify int** data array as nocopy on the offload statement, so that the addresses on the host and on the coprocessor don't clobber each other.)

Or, you can malloc each row individually and pass the individual rows to the offload code. (The same warning about not clobbering the contents of the int** data array applies here.)

Personally I like the first solution more because I am lazy. But the second solution has the advantage that you can use offload_transfer to send each line over asynchronously as it is ready. This will make entering the offload region faster.

TK
Beginner

Thanks. If it's OK, could you show an example of the first one? Thanks!!!

Frances_R_Intel
Employee

At the risk of having people look at my version of your code and shake their heads sadly:

[cpp]

#include <stdio.h>
#include <stdlib.h>

// Globals and phi() are compiled for both the host and the coprocessor.
#pragma offload_attribute(push,target(mic))
int** data;
int* data_array;

int phi()
{
    return data[0][0]+data[1][1]+data[2][2];
}
#pragma offload_attribute(pop)

int main()
{
    int i,j;

    data=(int**) malloc(3*sizeof(int*));
    int row_length[]={2,4,8};
    int total_length = row_length[0]+row_length[1]+row_length[2];
    // One contiguous block holds all three rows back to back.
    data_array = (int*)malloc(total_length*sizeof(int));

    // Host-side row pointers into the block.
    data[0]=&(data_array[0]);
    data[1]=&(data_array[row_length[0]]);
    data[2]=&(data_array[row_length[0]+row_length[1]]);
    printf("addresses %p,%p,%p\n", (void*)data[0],(void*)data[1],(void*)data[2]);

    for(i=0; i<2; i++) data[0][i]=i;
    for(i=0; i<4; i++) data[1][i]=i*10;
    for(i=0; i<8; i++) data[2][i]=i*100;

    // nocopy stops the host's row pointers from overwriting the coprocessor's;
    // only the contiguous block is transferred.
    #pragma offload target(mic) nocopy(data:length(3) alloc_if(1)) inout(data_array:length(total_length))
    {
        // Rebuild the row pointers from coprocessor-side addresses.
        data[0]=&(data_array[0]);
        data[1]=&(data_array[row_length[0]]);
        data[2]=&(data_array[row_length[0]+row_length[1]]);

        printf("addresses %p,%p,%p\n", (void*)data[0],(void*)data[1],(void*)data[2]);
        j=phi();
    }

    printf("j=%d\n",j);
}

[/cpp]

I printed out the addresses of each row to show how the values change from the host to the coprocessor.

TK
Beginner

Thanks a lot!
