I am unsure why or when one would cilk_spawn a function. Clearly, if a section of code is taking a lot of the program's time, you must look at why.
In my case that is usually because there are some very long-running serial for loops, so you give them the cilk_for treatment. But when do you cilk_spawn a function? If you have already treated the long-running serial for loops, why do anything else? I am unsure of the advantage of spawning a function. It seems that handling the loops (provided they are independent) is all that you need to do.
My second question is how one runs a program once on each core of a multicore processor. In my case, that is a four-core Xeon processor.
Any help appreciated. Thanks in advance.
Newport_j
10 Replies
As you noted, cilk_for is great for long-running serial loops.
cilk_spawn is great for recursive algorithms, such as the classic Fibonacci number calculation we use as an example. Other examples are the implementations of quicksort we ship as samples.
Use whichever is appropriate to your application. We helped one customer who parallelized his entire application with a few well-placed cilk_for's. The key is to know your application and how it works.
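To make the recursive use of cilk_spawn concrete, here is a minimal sketch of the Fibonacci pattern. It is an illustration, not the shipped sample. The `#if defined(__cilk)` fallback is an assumption added for portability: without a Cilk-enabled compiler the keywords expand to nothing and the code runs serially.

```cpp
// Assumption: Cilk Plus compilers define __cilk. Otherwise fall back to the
// serial elision, where the keywords simply disappear.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

#include <cassert>

// Naive recursive Fibonacci; deliberately inefficient, it only demonstrates
// the spawn/sync pattern.
long fib(int n) {
    assert(n >= 0);
    if (n < 2) return n;
    long x = cilk_spawn fib(n - 1);  // may run in parallel with the call below
    long y = fib(n - 2);             // the caller continues with this call
    cilk_sync;                       // wait for the spawned call before reading x
    return x + y;
}
```

Calling `fib(30)` from main exercises the recursion. Each level spawns one branch and runs the other itself, so the amount of available parallelism grows with the depth of the recursion.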
As to how to run a program once on each core, I'm not sure what you're asking for. Are you asking for the ability to pin a Cilk worker to each core? That's not currently possible, and we're not sure that it's desirable.
If you want to run the same executable once on each core, I'd simply start 4 instances and let the OS figure it out.
- Barry
I just want to run the same executable once on each core. How does one do that?
Newport_j
Hello,
Starting the same application multiple times and assigning each instance to a different CPU core is outside the scope of Intel Cilk Plus programming. It's easiest to configure OS thread/process scheduling directly, e.g.:
For Linux:
$ taskset 0x1 ./my_app & taskset 0x2 ./my_app & taskset 0x4 ./my_app ...
For Windows:
c:\> start /AFFINITY 0x1 my_app.exe
c:\> start /AFFINITY 0x2 my_app.exe
c:\> start /AFFINITY 0x4 my_app.exe
...
Both examples start an application (my_app[.exe]) three times and assign the instances to CPU cores #1, #2 and #3, respectively. Each bit of the hexadecimal mask selects one logical CPU, so a fourth instance would use 0x8.
Keep in mind that Hyper-Threading doubles the number of visible logical CPUs, and both hyper-threads on one core compete for that core's resources.
Best regards,
Georg Zitzlsberger
So if the program that you are working on has no recursive calls (functions that call themselves), then there is no reason to have any cilk_spawns in the code, anywhere?
Is this correct?
Thanks in advance.
Newport_j
Recursive algorithms work naturally with cilk_spawn. You might have some other reason to use cilk_spawn in a non-recursive situation.
But if you've used a cilk_for in your main loop, then there's no requirement to include cilk_spawns elsewhere in your code. The cilk_for should provide all of the parallelism you need.
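As a rough illustration of the cilk_for case, here is a minimal sketch of a main loop whose iterations are independent. The function name `scale_all` and the serial fallback macro are assumptions made for this example, not part of any shipped sample.

```cpp
// Assumption: Cilk Plus compilers define __cilk. Otherwise cilk_for becomes
// a plain serial for loop.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

#include <cstddef>
#include <vector>

// Hypothetical helper: scales every element in place. Each iteration touches
// only data[i], so the iterations are independent and may run in any order.
void scale_all(std::vector<double>& data, double factor) {
    const std::size_t n = data.size();
    cilk_for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

If the iterations shared state, for example by accumulating into one variable, the loop would no longer be safe to convert as written.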
- Barry
Assume your program has (uses) a (long) series of relatively short for loops.
Assume further that you can partition the long series of these for loops such that each partition can run independently from other partitions.
In the above scenario you could use cilk_spawn to run each separate partition (and cilk_sync to wait for all partitions to complete).
Selection of the partition size and workloads may be critical to good efficiency. For example, on a system with 2 cores (threads) you might want two partitions of approximately equal work.
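The partitioning idea above can be sketched as follows. This is a minimal sketch: `partition_a`, `partition_b` and `run_partitions` are hypothetical names, the loop bodies are placeholders, and the serial fallback macros are an assumption for compilers without Cilk support.

```cpp
// Assumption: Cilk Plus compilers define __cilk. Otherwise the keywords
// expand to nothing and the partitions run one after the other.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

#include <cstddef>
#include <vector>

// Hypothetical partition A: a series of short loops that touch only `a`.
void partition_a(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) a[i] += 1;
    for (std::size_t i = 0; i < a.size(); ++i) a[i] *= 2;
}

// Hypothetical partition B: a series of short loops that touch only `b`.
void partition_b(std::vector<int>& b) {
    for (std::size_t i = 0; i < b.size(); ++i) b[i] -= 1;
    for (std::size_t i = 0; i < b.size(); ++i) b[i] *= 3;
}

// The partitions share no data, so one can be spawned while the caller
// runs the other.
void run_partitions(std::vector<int>& a, std::vector<int>& b) {
    cilk_spawn partition_a(a);  // may be executed by another worker
    partition_b(b);             // runs in the caller in the meantime
    cilk_sync;                  // wait until both partitions are complete
}
```

The sketch only works because the two partitions touch disjoint data. If they did not, they could not be run independently as described.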
Jim Dempsey
Jim Dempsey:
Can you give me a small example of what you are discussing?
Any help appreciated. Thanks in advance.
Newport_j
Assume you have a filtering process where a large number of unknowns are coming in, and you wish to classify (filter) the data in multiple ways, with each filter containing relatively small for loops.
One technique is for each thread to apply all filters (e.g. cilk_for over the input object list). Depending on the filters, this may not be cache friendly.
An alternative approach is to spawn multiple tasks, each run by one thread and each using a serial for loop over all input objects, but restricted to a subset of the filters. This can keep the filter data within the core's L1 cache.
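The second approach might look roughly like the sketch below. Everything in it is hypothetical: the `RangeFilter` type, the two-way split and the function names are assumptions chosen to keep the example small, and the fallback macros let it compile serially without Cilk support.

```cpp
// Assumption: Cilk Plus compilers define __cilk. Otherwise the keywords
// expand to nothing and the tasks run serially.
#if defined(__cilk)
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

#include <cstddef>
#include <vector>

// Hypothetical filter: counts the inputs that fall in the range [lo, hi).
struct RangeFilter {
    int lo;
    int hi;
    long matches;
};

// One task: a serial loop over all inputs, restricted to the filters in
// [first, last). Only those filters are read and written by this task.
void apply_filter_subset(const std::vector<int>& inputs,
                         std::vector<RangeFilter>& filters,
                         std::size_t first, std::size_t last) {
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        for (std::size_t f = first; f < last; ++f) {
            if (inputs[i] >= filters[f].lo && inputs[i] < filters[f].hi) {
                ++filters[f].matches;
            }
        }
    }
}

// Splits the filters into two halves and gives each half to its own task.
void classify(const std::vector<int>& inputs, std::vector<RangeFilter>& filters) {
    const std::size_t mid = filters.size() / 2;
    cilk_spawn apply_filter_subset(inputs, filters, 0, mid);
    apply_filter_subset(inputs, filters, mid, filters.size());
    cilk_sync;  // both halves are done after this point
}
```

Each task writes only to its own filters, so the two tasks never update the same element. Whether the subset really stays in L1 cache depends on the size of the filter data, which this sketch does not model.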
Jim Dempsey
I don't know much about pinning threads (or workers) to cores, but there's an interesting tutorial presented at PPoPP11 in favor of having such an option.
http://blogs.fau.de/hager/tutorials/ppopp11/
RE: pinning threads in Reply 9
I've moved this to its own topic: Pinning Cilk workers to Cores. Please discuss pinning there.