I am porting a large C++ application to MIC, and I would like to use _Cilk_shared to transport the data between host and Xeon Phi. Of course, I would like to avoid rewriting the whole code, so this is what I do in the .cpp file that implements myClass:
[cpp]#pragma offload_attribute(push, _Cilk_shared)
#include <myClass.h>
#pragma offload_attribute(pop, _Cilk_shared)
...
void myClass::myMethod(){
std::vector<bool> localData;
....[/cpp]
myClass.h contains this:
[cpp]#include <vector>
class myClass {
....
void myMethod();
std::vector<float> m_floatData;
};[/cpp]
Now I have a dilemma:
- for the class definition, I need std::vector to be _Cilk_shared, which is just what happens when I include as shown.
- for the method local variables, _Cilk_shared is forbidden, see http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/GUID-8074C7BB-EBC4-46A5-9B6E-DC76E0DF10F9.htm
Is there any simple solution to this?
Georg
Hi,
I am curious as to why you would want localData to be _Cilk_shared? Could you please share how you intend to use the localData?
-Sumedh
Hi Georg,
I had a similar problem in a code that I was porting recently, only my code was using valarrays rather than vectors to hold the data. It was my understanding that it is impossible to make a class _Cilk_shared if some of its members are non-_Cilk_shared objects, like std::vectors. However, even if you made this happen (e.g., in a native application), there are practical disadvantages of using vectors or valarrays in Xeon Phi code:
a) With data in vectors or valarrays, you have no control over the alignment of data in memory. This can result in moderate to severe performance penalties on Xeon Phi coprocessors.
b) When you do seemingly harmless operations with vectors, such as creating a temporary vector on a function stack or using push_back, you occasionally trigger dynamic memory allocation in the vector class. This operation is inherently sequential and may have a really bad effect on performance on the coprocessor (it was quite severe in my application).
So, the way that I see it, there are three possible ways to port this code:
1) Implement your own class MyVector analogous to std::vector, and ensure that it allocates data on a 64-byte boundary, and that it does not use _mm_malloc() when you don't need it.
2) Create a derived class myClassPort : public myClass. The constructor of myClassPort should copy all data from the vectors of myClass into arrays of float. This gives you control over alignment and, at the same time, eliminates the overhead of abstraction in the performance-critical part. After that, you can use the explicit offload model (with "#pragma offload") to launch calculations on the coprocessor. This is the method that I chose for my code, because I wanted the best control over data allocation and transport.
3) Of course, you can also compile a native application without any code changes, but I don't know if this is a good option for your application. It was not for mine.
Andrey
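As a rough illustration of option 1, the alignment half of the idea can be had without writing a whole container: std::vector accepts a custom allocator. The sketch below is not from this thread. `AlignedAllocator` and `is_aligned` are hypothetical names, and it assumes a POSIX system (for `posix_memalign`) and a C++11 compiler. It only addresses the 64-byte alignment point; it does nothing about the cost of dynamic allocation itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX)
#include <vector>

// Hypothetical allocator (an assumption, not an Intel API): returns storage
// whose start address is a multiple of Alignment bytes.
template <typename T, std::size_t Alignment = 64>
struct AlignedAllocator {
    typedef T value_type;

    AlignedAllocator() noexcept {}
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment>&) noexcept {}

    // Explicit rebind is required because of the non-type template parameter.
    template <typename U>
    struct rebind { typedef AlignedAllocator<U, Alignment> other; };

    T* allocate(std::size_t n) {
        void* p = nullptr;
        // Alignment must be a power of two and a multiple of sizeof(void*).
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) { return false; }

// Helper: true if p sits on an `alignment`-byte boundary.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With this in place, `std::vector<float, AlignedAllocator<float> > v(1000);` keeps its data on a 64-byte boundary, including after reallocation by push_back. Calling reserve() up front is the usual way to keep those reallocations out of hot loops.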
Sumedh Naik (Intel) wrote:
I am curious as to why you would want localData to be _Cilk_shared? Could you please share how you intend to use the localData?
I don't want localData to be _Cilk_shared. But because the member m_floatData is part of a _Cilk_shared class, std::vector also needs to be _Cilk_shared, and therefore I cannot avoid localData being _Cilk_shared. Or am I missing something here?
Georg
Hi Georg,
I now understand the issue. In this case, instead of marking the entire class as shared, you can use a shared allocator (defined in offload.h) to create a shared vector object. Here is an example:
#include <vector>
#include <offload.h>
_Cilk_shared class myClass {
    ....
    void myMethod();
    // The allocator's value type must match the vector's element type (float).
    _Cilk_shared std::vector<float, __offload::shared_allocator<float>> _Cilk_shared m_floatData;
};
Here is another example I found, which instantiates and manipulates shared versions of C++ STL vectors.
#include <vector>
#include <offload.h>
#include <stdio.h>
using namespace std;
// Vector type whose storage comes from the shared allocator
typedef vector<int, __offload::shared_allocator<int> > shared_vec_int;
_Cilk_shared shared_vec_int * _Cilk_shared v;
// Runs on the coprocessor: checks the values written by the host
_Cilk_shared int test_result() {
    int result = 1;
    for (int i = 0; i < 5; i++) {
        if ((*v)[i] != i) {
            result = 0;
        }
    }
    return result;
}
int main() {
    int result;
    // Placement-new the vector object itself into shared memory
    v = new (_Offload_shared_malloc(sizeof(vector<int>))) _Cilk_shared vector<int,__offload::shared_allocator<int>>(5);
    for (int i = 0; i < 5; i++) {
        (*v)[i] = i;
    }
    result = _Cilk_offload test_result();
    if (result != 1)
        printf("Failed\n");
    else
        printf("Passed\n");
    return 0;
}
I hope this helps.
-Sumedh
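To make the point about Georg's dilemma concrete: the allocator is part of the vector's type, so a member declared with a special allocator does not affect local vectors that use the default one. The sketch below uses only standard C++ so that it compiles anywhere. `TrackingAllocator` and `g_tracked_bytes` are hypothetical stand-ins for `__offload::shared_allocator`, which needs the Intel compiler. Whether the Intel compiler accepts the equivalent layout with `_Cilk_shared` is not verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Bytes handed out by the stand-in allocator; lets us see which vectors use it.
static std::size_t g_tracked_bytes = 0;

// Hypothetical stand-in for __offload::shared_allocator (standard C++ only).
template <typename T>
struct TrackingAllocator {
    typedef T value_type;

    TrackingAllocator() noexcept {}
    template <typename U>
    TrackingAllocator(const TrackingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
        g_tracked_bytes += n * sizeof(T);
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

template <typename T, typename U>
bool operator==(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>&, const TrackingAllocator<U>&) { return false; }

class myClass {
public:
    myClass() : m_floatData(16, 0.0f) {}

    std::size_t myMethod() {
        // Local vector: default std::allocator, untouched by the member's type.
        std::vector<bool> localData(8, true);
        return localData.size() + m_floatData.size();
    }

    // Member vector: storage comes from the special allocator.
    std::vector<float, TrackingAllocator<float> > m_floatData;
};
```

Only m_floatData draws from the special allocator; localData inside myMethod() is an ordinary std::vector<bool>.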
Sumedh Naik (Intel) wrote:
...I now understand the issue. In this case, instead of marking the entire class as shared, you can use a shared allocator (defined in offload.h) to create a shared vector object. Here is an example:
....
Okay, I think I understand. Let me do a few tests...
Georg
Where is _Cilk_shared defined? What header file do I need to include to use it? I can see other Cilk constructs when I begin to type in Visual Studio (IntelliSense), but not _Cilk_shared.
There is no header needed for the keywords. You might include <offload.h> to use other aspects of the shared offload model; however, I believe we may be lacking defines in <cilk/cilk.h> to enable IntelliSense. I'm checking with others about this.
Our IDE integration developer clarified the keyword highlighting and IntelliSense support.
Currently, _Cilk_for, _Cilk_spawn, _Cilk_sync are highlighted as compiler keywords in the C++ editor and that is the extent of the support that we can provide for Intel C++-specific keywords in the Visual Studio editor. There is no auto-completion or any other IntelliSense support for _Cilk_for, _Cilk_spawn, _Cilk_sync because the Visual C++ IntelliSense is not extensible, unfortunately. The contents of <cilk/cilk.h> are not relevant to IntelliSense either; however, this header triggers auto-completion for cilk_spawn, cilk_sync, cilk_for when it is included.
Further, we do not currently highlight the keywords from the offload Virtual shared model (_Cilk_shared, _Cilk_offload, _Cilk_offload_to) so I submitted a feature enhancement (see internal tracking id noted below) to have those highlighted similar to the other _Cilk keywords.
(Internal tracking id: DPD200255317)
Hi. I tried Sumedh Naik's method of allocating a std::vector, and I am getting a link error.
Here is the compile command; it is similar for the other .cpp files:
icpc -openmp -std=c++11 -xhost -Wno-unknown-pragmas -no-offload -opt-report-phase=offload -vec-report3 -O3 -g -o haar.o -c haar.cpp
Here is my linker command:
icpc -o vj main.o haar.o image.o stdio-wrapper.o rectangles.o -lm -mkl
Here is the error I get:
main.o: In function `main':
/work/02645/pan19/project/xeon_offload/main.cpp:49: undefined reference to `__intel_new_feature_proc_init'
haar.o: In function `detectObjects':
/work/02645/pan19/project/xeon_offload/haar.cpp:135: undefined reference to `_Offload_shared_malloc'
haar.o: In function `__offload::shared_allocator<MyRect>::allocate(unsigned long, void const*)':
/opt/apps/intel/13/composer_xe_2013.2.146/compiler/include/offload.h:417: undefined reference to `_Offload_shared_malloc'
haar.o: In function `__offload::shared_allocator<MyRect>::deallocate(MyRect*, unsigned long)':
/opt/apps/intel/13/composer_xe_2013.2.146/compiler/include/offload.h:426: undefined reference to `_Offload_shared_free'
/opt/apps/intel/13/composer_xe_2013.2.146/compiler/include/offload.h:426: undefined reference to `_Offload_shared_free'
rectangles.o: In function `MyRect* std::__copy_move<true, true, std::random_access_iterator_tag>::__copy_m<MyRect>(MyRect const*, MyRect const*, MyRect*)':
/usr/include/c++/4.4.7/bits/stl_algobase.h:378: undefined reference to `__intel_ssse3_rep_memmove'
rectangles.o: In function `MyRect* std::__uninitialized_move_a<MyRect*, MyRect*, std::allocator<MyRect> >(MyRect*, MyRect*, MyRect*, std::allocator<MyRect>&)':
/usr/include/c++/4.4.7/bits/stl_algobase.h:378: undefined reference to `__intel_ssse3_rep_memmove'
rectangles.o: In function `int* std::__copy_move<true, true, std::random_access_iterator_tag>::__copy_m<int>(int const*, int const*, int*)':
/usr/include/c++/4.4.7/bits/stl_algobase.h:378: undefined reference to `__intel_ssse3_rep_memmove'
/usr/include/c++/4.4.7/bits/stl_algobase.h:378: undefined reference to `__intel_ssse3_rep_memmove'
/usr/include/c++/4.4.7/bits/stl_algobase.h:378: undefined reference to `__intel_ssse3_rep_memmove'
/usr/bin/ld: link errors found, deleting executable `vj'
make: *** [vj] Error 1
I'm wondering if anyone can help me. I am using the TACC Stampede supercomputer.
Following up on my last comment:
My icc version: Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.0.146 Build 20130121
I tried 14.0.1 as well, with the same error.
I saw in some posts that people are using #include "offload.h" instead of #include <offload.h>. I am wondering where they get that offload.h from?
