Recently I have been porting a complex CPU-based program to MIC. Because the program uses complex structs, I use _Cilk_shared to manage the pointers to them, and I got it running successfully on the MIC. However, it uses only one core, so its performance is poor. I tried using OpenMP (#pragma omp parallel for) to parallelize the for loop, but the performance did not improve, and the information I print shows that the program still runs on only one core, even after I annotated some functions. The test code is shown below:
    _Cilk_shared void offloadfunction(worker_t w, bwt_t *bwtmic, uint8_t *pacmic, int n)
    {
        int i = 0;
        // some complex data transport
        w.bwt = bwtmic;
        w.pac = pacmic;
        struct timeval tv1, tv2;
        struct timezone tz;
        gettimeofday(&tv1, &tz);
        #pragma omp parallel for num_threads(200)
        for (i = 0; i < 1000000000; i++) {
            int j = 10;
            j = j * 10;
            if (i % 100000 == 0)
                printf("\n %d %d %d \n", omp_get_num_procs(), omp_get_num_threads(), omp_get_thread_num());
        }
        gettimeofday(&tv2, &tz);
        // elapsed seconds; divide by 1000000.0 so the microsecond part is not truncated to 0
        float t = (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
    }
The output is:
236 1 0
236 1 0
236 1 0
236 1 0
236 1 0
236 1 0
This offloaded program uses just one core on the MIC. What can I do about it?
I wouldn't expend a lot of effort on Cilk.
https://software.intel.com/en-us/forums/intel-cilk-plus/topic/745556
Rob J. wrote:
I wouldn't expend a lot of effort on Cilk.
https://software.intel.com/en-us/forums/intel-cilk-plus/topic/745556
Thank you very much. I found my mistake today.
Because the code is in a C file and I had only added -qopenmp to the Intel C++ compiler settings in Eclipse, the C file was built without OpenMP. After I added -qopenmp to the Intel C compiler settings as well, it works.
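A quick way to catch this kind of build-settings mistake is to check the standard `_OPENMP` macro, which a compiler defines only when OpenMP is enabled for that translation unit. The following is a minimal sketch, not the code from the original program; the function names `openmp_enabled`, `count_parallel_threads`, and `report_openmp_status` are hypothetical and chosen only for illustration.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only needed when OpenMP is actually enabled */
#endif

/* Returns 1 if this file was compiled with OpenMP enabled, else 0. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Counts how many threads really execute a parallel region.
 * Without an OpenMP flag the pragmas are ignored and this returns 1. */
int count_parallel_threads(void)
{
    int count = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        count++;
    }
    return count;
}

/* Prints a short diagnostic (hypothetical helper). */
void report_openmp_status(void)
{
    printf("OpenMP enabled at compile time: %s\n",
           openmp_enabled() ? "yes" : "no");
    printf("threads in parallel region: %d\n", count_parallel_threads());
}
```

Calling `report_openmp_status()` once at startup from each language's build target (C and C++) would show whether the OpenMP flag reached that compiler. If it reports "no" and a single thread, the pragma in the loop is most likely being ignored.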
In general, if you do only a moderate amount of work in the OpenMP section, you will see a KMP_BLOCKTIME delay before OpenMP releases threads back to Cilk. This can be mitigated under hyperthreading (and on MIC) by limiting both OpenMP and Cilk to 1 thread or worker per core. Intel never recommended mixing Cilk Plus with OpenMP; as others mentioned, Cilk support has gone away. The inefficiency of Cilk on MIC might have contributed to the decision to drop support.
printf() should be serialized anyway, but a critical section is reasonable, recognizing that the region will take a significant amount of time.
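As a rough illustration of the point about printing, the sketch below has each thread report once inside a critical section after the loop, instead of calling printf() from inside the loop body. It is an assumption about how the test could be restructured, not the original poster's code; `sum_with_reports` and its parameters are hypothetical names.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums (i % 10) for i in [0, n) in parallel and lets every thread print
 * one line inside a critical section. *reports receives the number of
 * lines printed (one per thread in the team). */
long sum_with_reports(long n, int *reports)
{
    long total = 0;
    int printed = 0;

    #pragma omp parallel
    {
        int tid = 0;
        int nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif

        /* The loop iterations are shared among the team. */
        #pragma omp for reduction(+:total)
        for (long i = 0; i < n; i++)
            total += i % 10;

        /* One thread at a time: keeps output lines from interleaving
         * and protects the shared counter. */
        #pragma omp critical
        {
            printf("thread %d of %d finished\n", tid, nthreads);
            printed++;
        }
    }

    *reports = printed;
    return total;
}
```

Printing once per thread keeps the serialized region short relative to the loop, so the critical section costs little compared with printing every 100000 iterations.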