when to spawn

newport_j · ‎03-14-2012

I am unsure why or when one would cilk_spawn a function. It is clear that if a section of code in a program is taking a lot of the program's time, then you must look at why.

In my case that is usually because there are some very long-running serial for loops, so you give them the cilk_for treatment. But when do you cilk_spawn a function? If you have already treated the serial for loops that are long running (with respect to time), why do anything else? I am unsure as to the advantage of spawning a function. It seems that parallelizing the loops (provided their iterations are independent) is all that you need to do.

My second question is how one runs a program once on each core of a multicore processor; in my case, a four-core Xeon processor.

Any help appreciated. Thanks in advance.

Newport_j

Barry_T_Intel · ‎03-14-2012

As you noted, cilk_for is great for long running serial loops.

cilk_spawn is great for use in recursive algorithms such as the classic Fibonacci number calculation we use as an example. Other examples are the implementations of quicksort we ship as samples.
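
To make the recursive case concrete, here is a minimal sketch of the Fibonacci pattern. It assumes the compiler defines `__cilk` when Cilk support is enabled; when it does not, the keywords are defined away so the code runs as its serial elision. The function name `fib` is only illustrative and is not taken from the shipped samples.

```c
#include <assert.h>

/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise the
   keywords expand to nothing and the code runs serially. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Naive recursive Fibonacci: the two recursive calls are independent,
   so one of them can be spawned while the caller computes the other. */
long fib(int n)
{
    assert(n >= 0);
    if (n < 2)
        return n;
    long x = cilk_spawn fib(n - 1); /* may run in parallel with the next line */
    long y = fib(n - 2);
    cilk_sync;                      /* wait for the spawned call before using x */
    return x + y;
}
```

The spawn only gives the runtime permission to run `fib(n - 1)` in parallel; the result `x` must not be read before the `cilk_sync`.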

Use whichever is appropriate to your application. We helped one customer who parallelized his entire application with a few well-placed cilk_for's. The key is to know your application and how it works.

As to how to run a program once on each core, I'm not sure what you're asking for. Are you asking for the ability to pin a Cilk worker to each core? That's not currently possible, and we're not sure that it's desirable.

If you want to run the same executable once on each core, I'd simply start 4 instances and let the OS figure it out.

- Barry

newport_j · ‎03-14-2012

I just want to run the same executable once on each core. How does one do that?

Newport_j

Georg_Z_Intel · ‎03-14-2012

Hello,

Starting the same application multiple times and assigning each instance to a different CPU core is outside the scope of Intel Cilk Plus programming. Here, it's easiest to configure OS thread/process scheduling directly, e.g.:

For Linux:

$ taskset 0x1 ./my_app & taskset 0x2 ./my_app & taskset 0x4 ./my_app ...

For Windows:

c:\> start /AFFINITY 0x1 my_app.exe
c:\> start /AFFINITY 0x2 my_app.exe
c:\> start /AFFINITY 0x4 my_app.exe
...

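Both examples start an application (my_app[.exe]) three times and assign the instances to CPU core #1, #2 & #3 respectively. Each value is an affinity bit mask in which every set bit selects one logical CPU, so 0x1, 0x2 and 0x4 each select exactly one CPU.

Keep in mind that Hyper-Threading doubles the number of visible CPU cores, and both hyper-threads on one core compete for that core's resources.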

Best regards,

Georg Zitzlsberger

newport_j · ‎03-14-2012

So if the program that you are working on has no recursive calls (functions that call themselves), then there is no reason to have any cilk_spawns in the code, anywhere?

Is this correct?

Thanks in advance.

Newport_j

Barry_T_Intel · ‎03-14-2012

Recursive algorithms work naturally with cilk_spawn. You might have some other reason to use cilk_spawn in a non-recursive situation.

But if you've used a cilk_for in your main loop, then there's no requirement to include cilk_spawns elsewhere in your code. The cilk_for should provide all of the parallelism you need.
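
As a rough illustration of that case, the sketch below parallelizes a single loop whose iterations are independent. The function `scale` and its parameters are hypothetical, and the `__cilk` check is an assumption about how the compiler signals Cilk support; without it, `cilk_for` falls back to a plain `for`.

```c
#include <assert.h>

/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise
   cilk_for becomes an ordinary serial for loop. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

/* Each iteration touches only data[i], so the iterations are
   independent and the loop is safe to run in parallel. */
void scale(double *data, int n, double factor)
{
    assert(n >= 0);
    cilk_for (int i = 0; i < n; ++i)
        data[i] *= factor;
}
```

A loop that accumulates into a shared variable would not be independent in this sense and would need different handling.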

- Barry

jimdempseyatthecove · ‎03-15-2012

Assume your program has (uses) a (long) series of relatively short for loops.
Assume further that you can partition the long series of these for loops such that each partition can run independently from other partitions.

In the above scenario you could use cilk_spawn to run each separate partition (and cilk_sync to wait for all partitions to complete).

Selection of the partition size and workloads may be critical to good efficiency. For example, on a system with 2 cores (threads) you might want two partitions of approximately equal work.
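
A minimal sketch of the two-partition case follows. The partition functions and their loop bodies are made up for illustration; the only point is that each partition is a series of short loops touching its own data. It assumes `__cilk` is defined by a Cilk-enabled compiler and otherwise runs serially.

```c
#include <assert.h>

/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise the
   keywords expand to nothing and the partitions run one after another. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Hypothetical partition A: short loops that only touch a[]. */
static void partition_a(int *a, int n)
{
    for (int i = 0; i < n; ++i) a[i] = i;
    for (int i = 0; i < n; ++i) a[i] *= 2;
}

/* Hypothetical partition B: short loops that only touch b[]. */
static void partition_b(int *b, int n)
{
    for (int i = 0; i < n; ++i) b[i] = n - i;
    for (int i = 0; i < n; ++i) b[i] += 1;
}

void run_partitions(int *a, int *b, int n)
{
    assert(n >= 0);
    cilk_spawn partition_a(a, n); /* may run on another worker */
    partition_b(b, n);            /* runs on the current worker */
    cilk_sync;                    /* wait for both partitions */
}
```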

Jim Dempsey

newport_j · ‎03-16-2012

Jim Dempsey:

Can you give me a small example of what you are discussing?

Any help appreciated. Thanks in advance.

Newport_j

jimdempseyatthecove · ‎03-19-2012

Assume you have a filtering process where you have a large number of unknowns coming in, and you wish to classify (filter) the data in multiple ways. The filters contain relatively small for loops.

One technique is for each thread to apply all filters (e.g. cilk_for on the input object list). Depending on the filters, this may not necessarily be cache friendly.

An alternate approach is to spawn multiple tasks, each run by one thread. Each task uses a serial for loop over all input objects but restricts its activity to a subset of the filters. This can keep the filter data enclosed within the core's L1 cache.
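
The alternate approach could be sketched roughly as below. Everything here is hypothetical: the inputs are plain ints, the four filters are trivial predicates, and the split is two tasks with two filters each. It assumes `__cilk` is defined by a Cilk-enabled compiler and otherwise runs serially.

```c
#include <assert.h>

/* Assumption: a Cilk-enabled compiler defines __cilk. Otherwise the
   keywords expand to nothing and the tasks run one after another. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

enum { NUM_FILTERS = 4 };
typedef int (*filter_fn)(int);

/* Hypothetical filters; real ones would carry their own data. */
static int is_even(int v)     { return v % 2 == 0; }
static int is_negative(int v) { return v < 0; }
static int is_large(int v)    { return v > 100; }
static int is_zero(int v)     { return v == 0; }

static const filter_fn filters[NUM_FILTERS] = {
    is_even, is_negative, is_large, is_zero
};

/* One task: a serial loop over all inputs, restricted to the
   filters in [first, last). It writes only counts[first..last-1]. */
static void apply_filter_subset(const int *in, int n,
                                int first, int last, int *counts)
{
    for (int f = first; f < last; ++f)
        counts[f] = 0;
    for (int i = 0; i < n; ++i)
        for (int f = first; f < last; ++f)
            if (filters[f](in[i]))
                ++counts[f];
}

/* Spawn one task per filter subset; each task scans every input. */
void classify(const int *in, int n, int *counts)
{
    assert(n >= 0);
    cilk_spawn apply_filter_subset(in, n, 0, NUM_FILTERS / 2, counts);
    apply_filter_subset(in, n, NUM_FILTERS / 2, NUM_FILTERS, counts);
    cilk_sync;
}
```

The two tasks write to disjoint slots of `counts`, so there is no race, although in a real program the per-task results might be kept further apart in memory to avoid sharing a cache line.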

Jim Dempsey

tzannes · ‎03-19-2012

I don't know much about pinning threads (or workers) to cores, but there's an interesting tutorial presented at PPoPP11 in favor of having such an option.
http://blogs.fau.de/hager/tutorials/ppopp11/

Barry_T_Intel · ‎03-19-2012

RE: pinning threads in Reply 9

I've moved this to its own topic: Pinning Cilk workers to Cores. Please discuss pinning there.

- Barry