
Using openmp in _Cilk_offload

I recently set out to port a complex CPU-based program to MIC. Because the program uses complex structs, I use _Cilk_shared to manage the pointers to them, and the program now runs successfully on the MIC. However, it uses only one core, so its performance is poor. I tried to parallelize the for loop with OpenMP (#pragma omp parallel for), but performance did not improve, and the information I print shows that the program still runs on only one core, even after I annotated some functions. The test code is shown below:

_Cilk_shared void offloadfunction(worker_t w, bwt_t *bwtmic, uint8_t *pacmic, int n)
{
    int i = 0;
    // some complex data transport
    w.bwt = bwtmic;
    w.pac = pacmic;
    struct timeval tv1, tv2;
    struct timezone tz;
    gettimeofday(&tv1, &tz);

#pragma omp parallel for num_threads(200)
    for (i = 0; i < 1000000000; i++) {
        int j = 10;
        j = j * 10;
        // every 100000 iterations, print processor count, team size, and thread id
        if (i % 100000 == 0)
            printf("\n %d %d %d \n", omp_get_num_procs(), omp_get_num_threads(), omp_get_thread_num());
    }

    gettimeofday(&tv2, &tz);
    // elapsed seconds; dividing by 1000000.0 keeps the microsecond part from being truncated
    float t = (tv2.tv_sec - tv1.tv_sec) + (tv2.tv_usec - tv1.tv_usec) / 1000000.0;
}

The output is:

 236 1 0 

 236 1 0 

 236 1 0 

 236 1 0 

 236 1 0 

 236 1 0 

The offloaded program uses only one core on the MIC. What can I do to fix this?

3 Replies

I wouldn't expend a lot of effort on Cilk.

 

https://software.intel.com/en-us/forums/intel-cilk-plus/topic/745556

 


Rob J. wrote:

I wouldn't expend a lot of effort on Cilk.

 

https://software.intel.com/en-us/forums/intel-cilk-plus/topic/745556

 

 

Thank you very much. I found my mistake today.

The code is in a C file, but I had only added -qopenmp to the Intel C++ Compiler settings in Eclipse, so the OpenMP pragma in the C file was not being honored. After I also added -qopenmp to the Intel C Compiler settings, it works.
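
For anyone who hits the same symptom, one way to check whether a particular source file was really compiled with OpenMP is to test the standard `_OPENMP` macro and count the threads that execute a parallel region. This is only a minimal sketch, not the code from this thread; the helper names `openmp_enabled`, `count_team_threads`, and `report_openmp_status` are made up for illustration. If the OpenMP flag is missing for that file, the pragma is ignored and the count stays at 1, which would be consistent with the `236 1 0` lines above.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* only pulled in when the compiler has OpenMP enabled */
#endif

/* Hypothetical helper: returns 1 if this file was compiled with OpenMP enabled. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: counts how many threads actually run a parallel region.
 * Without OpenMP the pragma is ignored, the block runs once, and the result is 1. */
int count_team_threads(void)
{
    int count = 0;
#pragma omp parallel reduction(+:count)
    {
        count += 1;   /* each thread in the team contributes exactly one */
    }
    return count;
}

/* Hypothetical helper: prints a one-line diagnostic. */
void report_openmp_status(void)
{
    printf("OpenMP enabled: %d, threads in parallel region: %d\n",
           openmp_enabled(), count_team_threads());
}
```

Calling `report_openmp_status()` at the top of the offloaded function should show right away whether the flag reached that file: `OpenMP enabled: 0` means the compiler option is missing for it.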


In general, if you do only a moderate amount of work in the OpenMP section, you will see a KMP_BLOCKTIME delay before OpenMP releases its threads back to Cilk. The effect can be moderated under hyperthreading (and on MIC) by limiting both OpenMP and Cilk to one thread or worker per core. Intel never recommended mixing Cilk Plus with OpenMP; as others mentioned, Cilk support has gone away. The inefficiency of Cilk on MIC might have contributed to the decision to drop support.
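
As a rough illustration of the "one thread per core" limit, the thread count can be derived from the number of logical processors. This is a sketch under assumptions: the helper names are hypothetical, and the number of hardware threads per core (4 in the usage note below) has to be supplied by you for your hardware.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: thread count that gives one thread per core.
 * Falls back to 1 for inputs that make no sense. */
int threads_for_one_per_core(int logical_procs, int hw_threads_per_core)
{
    if (logical_procs < 1 || hw_threads_per_core < 1)
        return 1;
    int n = logical_procs / hw_threads_per_core;
    return n > 0 ? n : 1;
}

/* Hypothetical helper: asks OpenMP to use one thread per core for later
 * parallel regions and returns the count it chose. Without OpenMP it
 * only computes the value for a single processor. */
int limit_openmp_to_one_per_core(int hw_threads_per_core)
{
    int procs = 1;
#ifdef _OPENMP
    procs = omp_get_num_procs();
#endif
    int n = threads_for_one_per_core(procs, hw_threads_per_core);
#ifdef _OPENMP
    omp_set_num_threads(n);
#endif
    return n;
}
```

With the 236 logical processors reported in the output above and an assumed 4 hardware threads per core, `threads_for_one_per_core(236, 4)` gives 59. The Cilk side would need the same limit applied separately, for example through the CILK_NWORKERS environment variable, and KMP_BLOCKTIME can be lowered through the environment as well; how those settings reach the coprocessor depends on your offload setup.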

printf() should be serialized anyway, but a critical section around it is reasonable, as long as you recognize that the region will take a significant amount of time.
