I want to offload a function call inside a for loop that I run in parallel with OpenMP. The problem is that I can either run this loop in parallel or offload the function call, but not both at the same time. Every function call fills two arrays, which are the only out clauses of the offload.
To run the loop in parallel, I have two arrays of pointers to (int) output arrays, so that every parallel offload can save its output in its own array, following this page: https://software.intel.com/en-us/articles/xeon-phi-coprocessor-data-transfer-array-of-pointers-using-language-extensions-for-offload
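A minimal host-only sketch of the array-of-pointers approach follows. It assumes the slot index x is the OpenMP thread number (the post only says x is between 0 and threadnum-1): each thread allocates, fills and frees its own outa[x]/outb[x], so no two threads write through the same pointer. bar_stub and run_with_thread_slots are hypothetical names, and the offload pragma is left out because it needs the Intel compiler with coprocessor support. Without -fopenmp the OpenMP pragmas are ignored and the loop simply runs serially.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

/* Hypothetical stand-in for bar(): fills both output arrays for iteration i. */
static void bar_stub(int i, int *a, int *b, int n) {
    for (int k = 0; k < n; k++) {
        a[k] = i + k;
        b[k] = i - k;
    }
}

/* Each thread uses only its own slot outa[x]/outb[x].
   Returns 0 on success, -1 if any allocation failed. */
static int run_with_thread_slots(int n_iter, int arrsize, long *result) {
    int nthreads = omp_get_max_threads();
    int **outa = calloc((size_t)nthreads, sizeof *outa);
    int **outb = calloc((size_t)nthreads, sizeof *outb);
    if (!outa || !outb) { free(outa); free(outb); return -1; }
    int failed = 0;

    #pragma omp parallel for schedule(static) reduction(|:failed)
    for (int i = 0; i < n_iter; i++) {
        int x = omp_get_thread_num();            /* x in [0, nthreads-1] */
        outa[x] = malloc((size_t)arrsize * sizeof(int));
        outb[x] = malloc((size_t)arrsize * sizeof(int));
        if (!outa[x] || !outb[x]) {
            failed = 1;
            free(outa[x]);
            free(outb[x]);
            continue;
        }
        /* In the original code the offload pragma would sit here. */
        bar_stub(i, outa[x], outb[x], arrsize);
        long s = 0;
        for (int k = 0; k < arrsize; k++) s += outa[x][k] + outb[x][k];
        result[i] = s;
        free(outa[x]);
        free(outb[x]);
    }

    free(outa);
    free(outb);
    return failed ? -1 : 0;
}
```

Because every iteration writes only through the pointer in its own thread's slot, the slots need no locking; the shared part is just the pointer table itself, whose entries are never touched by two threads.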
To give a better overview, here is the important part of the code, which runs fine if I delete the offload pragma and run bar on the host:
foo() {
    int *outa[threadnum], *outb[threadnum];
    ...
    // calc arrsize
    // here I want to insert #pragma omp for schedule...
    for () {
        outa[x] = (int *) malloc(arrsize * sizeof(int)); // x always between 0 and threadnum-1
        outb[x] = (int *) malloc(arrsize * sizeof(int));
        #pragma offload target(mic) \
            in(...) out(outa[x] : length(arrsize)) out(outb[x] : length(arrsize))
        bar(..., outa[x], outb[x], ...);
        ...
        free(outa[x]);
        free(outb[x]);
    }
}
Is there any obvious problem that I have missed which leads to a segmentation fault (it happens in the very first offload, with x=0)? For comparison, here is the part of the code where the for loop does not run in parallel and offloads to the Phis (this works fine):
foo() {
    int *outa, *outb;
    for () {
        outa = (int *) malloc(arrsize * sizeof(int));
        outb = (int *) malloc(arrsize * sizeof(int));
        #pragma offload target(mic) \
            in(...) \
            out(outa : length(arrsize)) out(outb : length(arrsize))
        bar(..., outa, outb, ...);
        ...
        free(outa);
        free(outb);
    }
}
I appreciate any comments. If necessary, I can try to create a code snippet that reproduces the error.
You may have more luck expressing this as follows. But even better would be to do the threading on the card.
foo() {
    #pragma omp for
    for () {
        int *outa = (int *) malloc(arrsize * sizeof(int));
        int *outb = (int *) malloc(arrsize * sizeof(int));
        #pragma offload target(mic) \
            in(...) out(outa : length(arrsize)) out(outb : length(arrsize))
        bar(..., outa, outb, ...);
        ...
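The "threading on the card" suggestion can be sketched as follows, under these assumptions: the whole loop moves into one function that would be offloaded once, and iteration i writes only its own slice of two contiguous output buffers of n_iter * arrsize ints. bar_slice and bar_all are hypothetical names. The offload pragma appears only in a comment, and on a real coprocessor build bar_all (and everything it calls) would also have to be marked for the target, e.g. with __attribute__((target(mic))).

```c
#include <stdlib.h>

/* Hypothetical stand-in for bar(): fills one iteration's slice of each output. */
static void bar_slice(int i, int *a, int *b, int n) {
    for (int k = 0; k < n; k++) {
        a[k] = i * n + k;
        b[k] = -(i * n + k);
    }
}

/* Everything in this function is what would run on the coprocessor.
   The host would issue a single offload with one out buffer per array,
   roughly (Intel offload syntax, not compiled in this sketch):
     #pragma offload target(mic:0) out(outa : length(total)) out(outb : length(total))
     bar_all(n_iter, arrsize, outa, outb);
   where total = n_iter * arrsize. */
static void bar_all(int n_iter, int arrsize, int *outa, int *outb) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_iter; i++) {
        /* Iteration i owns the disjoint slice [i*arrsize, (i+1)*arrsize). */
        bar_slice(i, outa + (size_t)i * arrsize, outb + (size_t)i * arrsize, arrsize);
    }
}
```

One transfer of n_iter * arrsize ints per array replaces n_iter separate offloads, and because the slices never overlap, the threads on the card need no synchronization.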
I'm not sure what the issue might be. Maybe alignment. I can inquire with our developers.
Am I understanding correctly that the offload with the array of pointers does not yet have OpenMP enabled (based on the code comment about where you want to add the omp pragma)? If OpenMP is active with the array-of-pointers offload, can it be run with a single thread?
Do you have multiple Phi cards?
It would help to have a reproducer to investigate, and to know your compiler version (icc -V). Thank you.
You cannot have a parallel loop enclosing a pragma offload that allocates/transfers the same variable without synchronizing through signal/wait.
If the 1st thread in the parallel loop is still allocating the memory and transferring the data for the offload pragma, the 2nd thread might assume the data is ready and start executing the offload.
One way to avoid this is to allocate/transfer the data before the parallel loop and transfer/deallocate it after the parallel loop, e.g.:
#pragma offload_transfer target(mic:0) nocopy(outa : length(arrsize) alloc_if(1) free_if(0))
#pragma omp parallel for
    #pragma offload target(mic:0) ...   // reuse the existing buffer: alloc_if(0) free_if(0)
#pragma offload_transfer target(mic:0) out(outa : length(arrsize) alloc_if(0) free_if(1))
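The allocate-once / reuse / free-once lifetime can be illustrated on host memory. This is only an analogy to the alloc_if/free_if clauses, not the offload API itself, and it swaps the single outa above for one slice per thread so that concurrent iterations do not overwrite each other's data. run_persistent is a hypothetical name; the thread index comes from omp_get_thread_num(), with a serial fallback when OpenMP is off.

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

/* Allocate one slice per thread before the loop (cf. alloc_if(1) free_if(0)),
   reuse it in every iteration (cf. alloc_if(0) free_if(0)),
   and release it once after the loop (cf. alloc_if(0) free_if(1)). */
static int run_persistent(int n_iter, int arrsize, long *result) {
    int nthreads = omp_get_max_threads();
    int *buf = malloc((size_t)nthreads * arrsize * sizeof(int));
    if (!buf) return -1;

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_iter; i++) {
        /* Private slice for the calling thread; no per-iteration malloc/free. */
        int *mine = buf + (size_t)omp_get_thread_num() * arrsize;
        long s = 0;
        for (int k = 0; k < arrsize; k++) {
            mine[k] = i + k;   /* stand-in for what bar() would write */
            s += mine[k];
        }
        result[i] = s;
    }

    free(buf);   /* single deallocation after the loop */
    return 0;
}
```

The design choice is to pay the allocation cost once per thread instead of once per iteration, which is the same motivation as keeping the coprocessor buffer alive between offloads.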
Ravi, wouldn't the threaded loop be doing multiple offloads, each with its own local outa/outb arrays? (Not that I think it is a good idea...)
You are right. I missed that each thread got its own copy.
First of all, sorry for the late reply.
@Kevin D:
Yes, we have multiple cards, but using only one by setting the target to mic:0 does not help, if that is what you were thinking of.
The first code only runs fine (even in parallel) if I delete the offload pragma; as soon as I offload the call, it crashes, both with and without the omp parallel pragma.
icc -V
Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.090 Build 20140723
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
Anyhow, thanks (a lot) to Gregg S.'s example, the code now runs fine. If I have some free time after finishing the optimization, I will look at this part again to see whether I can find another way to make it work, and post it here.