
for loop works faster than cilk_for

Zvi_Vered
Beginner

Hello,

I wrote the attached code and built it using MSDEV 2008.

My PC has a Core 2 Duo (E8400) and runs Windows 7 Pro, 32-bit.

For some reason the for loop runs faster (0.049946 sec) than the cilk_for loop (0.067103 sec).

Can I be sure that both cores are executing the for loop?

Thanks,

Zvika

ARCH_R_Intel
Employee

The example goes through 240,000,000 bytes of data (10,000,000 doubles * 8 bytes/double * 3 arrays). That's much larger than the outer-level cache. The benchmark has a high ratio of memory accesses to flops (three memory accesses for each floating-point operation), so it is really measuring how fast the memory system can feed the processors. A single core is likely capable of using the full memory bandwidth for this benchmark.

The Cilk code may be slower because the Cilk run-time takes some time to get started the first time Cilk is invoked. (After that, the Cilk threads are parked so that they can be woken up instead of created from scratch.) One way to see if the initial startup is part of the issue is to repeat the two benchmark loops several times and see if the Cilk times improve the second time around.
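Here is a rough sketch of the repeat-and-compare idea in standard C++, since the attached code is not reproduced in the thread. The names `working_set_bytes`, `add_serial`, and `time_trials` are made up for illustration, and the kernel `c[i] = a[i] + b[i]` is only an assumption about what the benchmark does (one flop, three memory accesses). The cilk_for version of the loop would be passed to `time_trials` the same way as the serial one.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Bytes touched by one pass over `arrays` arrays of `n` doubles each.
std::size_t working_set_bytes(std::size_t n, std::size_t arrays) {
    return n * sizeof(double) * arrays;
}

// Assumed stand-in for the benchmark body: one add, two loads, one store.
void add_serial(const std::vector<double>& a,
                const std::vector<double>& b,
                std::vector<double>& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    for (std::size_t i = 0; i < c.size(); ++i) {
        c[i] = a[i] + b[i];
    }
}

// Runs `kernel` `trials` times and returns the elapsed seconds of each run,
// so a slow first run (e.g. run-time startup) can be told apart from the rest.
template <typename Kernel>
std::vector<double> time_trials(Kernel kernel, int trials) {
    std::vector<double> seconds;
    for (int t = 0; t < trials; ++t) {
        auto start = std::chrono::steady_clock::now();
        kernel();
        auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}

// Prints the working-set size and the per-trial times for the serial loop.
void report(std::size_t n, int trials) {
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);
    std::printf("working set: %zu bytes\n", working_set_bytes(n, 3));
    std::vector<double> secs = time_trials([&] { add_serial(a, b, c); }, trials);
    for (std::size_t t = 0; t < secs.size(); ++t) {
        std::printf("trial %zu: %f sec\n", t, secs[t]);
    }
}
```

Calling `report(10000000, 5)` from `main` prints a working set of 240000000 bytes followed by five timings. If the first timing of the Cilk variant is much larger than the later ones, startup cost is a likely part of the gap.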

Zvi_Vered
Beginner

Dear Mr. Robison,

You are right!

On the second iteration, with 1000 elements (smaller than the outer cache), cilk_for was faster.

Best regards,

Zvika
