<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I assume that &amp;quot;process the in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060738#M53979</link>
    <description>&lt;P&gt;I assume that "process the tree starting at the leaf nodes" means that you plan to implement a queue or something of things to work on, and then implement your own scheduler. &amp;nbsp;I recommend against that path for the reasons you have mentioned (you would have to introduce locks), but also because your scheduler is unlikely to be as good as the Cilk scheduler.&lt;/P&gt;

&lt;P&gt;It sounds like you may not have much parallelism (you have indicated that your tree is very deep). Have you run cilkview to find out what the parallelism is?&lt;/P&gt;

&lt;P&gt;-Bradley&lt;/P&gt;</description>
    <pubDate>Fri, 30 Jan 2015 12:42:09 GMT</pubDate>
    <dc:creator>Bradley_K_</dc:creator>
    <dc:date>2015-01-30T12:42:09Z</dc:date>
    <item>
      <title>_cilkrts_hyperobject_dealloc is expensive</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060734#M53975</link>
      <description>&lt;P&gt;Intel VTune shows that __cilkrts_hyperobject_dealloc is a very expensive operation. Is it because I am spawning too much?&lt;/P&gt;

&lt;P&gt;I have fairly deep recursion so the stack grows quite large. Maybe that is the issue.&lt;/P&gt;

</description>
      <pubDate>Wed, 28 Jan 2015 13:08:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060734#M53975</guid>
      <dc:creator>erling_andersen</dc:creator>
      <dc:date>2015-01-28T13:08:46Z</dc:date>
    </item>
    <item>
      <title>Spawning doesn't cause</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060735#M53976</link>
      <description>&lt;P&gt;Spawning doesn't cause excessive calls to __cilkrts_hyperobject_dealloc; stealing does. Excessive stealing occurs when you spawn units of work that are too small, especially if the spawns occur one at a time rather than in a parallel divide-and-conquer pattern. I would need to understand the pattern of your spawns to see if that could be the issue.&lt;/P&gt;

&lt;P&gt;The other thing that causes __cilkrts_hyperobject_dealloc to be called frequently is the use of a large number of reducers, such as, perhaps, an array of reducers. It is usually better to use other mechanisms, such as a (custom) reducer of an array rather than an array of reducers.&lt;/P&gt;

&lt;P&gt;Finally, there may be nothing wrong with your code. You might simply have run into one of those areas of the Cilk runtime that have not been well optimized. __cilkrts_hyperobject_dealloc is simply a call to free(). Your pattern might be one where we could stand to improve performance of the runtime library by using a custom allocation scheme for hyperobjects.&lt;/P&gt;

&lt;P&gt;If there is any chance that you could share a sanitized version of your code (keep it small and no proprietary information, please), I might be able to help you further.&lt;/P&gt;
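
&lt;P&gt;To make the two spawn patterns concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: do_item, GRAIN and the two function names are hypothetical, and the macro fallback simply compiles the Cilk keywords away when the compiler does not define __cilk, so the sketch also builds as plain serial C.&lt;/P&gt;

<![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Serial fallback (assumption): without a Cilk compiler the keywords
   expand to nothing and the program runs as ordinary serial C. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Hypothetical unit of work: square one array element in place. */
static void do_item(long *a, size_t i) { a[i] = a[i] * a[i]; }

/* One-at-a-time pattern: every iteration spawns a single tiny task,
   which tends to cause many steals relative to the work done. */
static void spawn_one_at_a_time(long *a, size_t n) {
    for (size_t i = 0; i < n; ++i)
        cilk_spawn do_item(a, i);
    cilk_sync;
}

/* Hypothetical grain size: ranges this small are handled serially. */
static const size_t GRAIN = 4;

/* Divide-and-conquer pattern: halve the range until it is at most GRAIN
   items long, so the spawned halves near the top carry large chunks. */
static void spawn_divide_and_conquer(long *a, size_t lo, size_t hi) {
    if (hi - lo <= GRAIN) {
        for (size_t i = lo; i < hi; ++i)
            do_item(a, i);
        return;
    }
    size_t mid = lo + (hi - lo) / 2;
    cilk_spawn spawn_divide_and_conquer(a, lo, mid);
    spawn_divide_and_conquer(a, mid, hi);
    cilk_sync;
}
```
]]>

&lt;P&gt;Both functions compute the same result; only the shape of the spawn tree differs, and that shape is what influences how often stealing happens.&lt;/P&gt;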

&lt;P&gt;Regards,&lt;BR /&gt;
	Pablo&lt;/P&gt;</description>
      <pubDate>Thu, 29 Jan 2015 19:17:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060735#M53976</guid>
      <dc:creator>Pablo_H_Intel</dc:creator>
      <dc:date>2015-01-29T19:17:43Z</dc:date>
    </item>
    <item>
      <title>FYI I am new to Cilk and</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060736#M53977</link>
      <description>&lt;P&gt;FYI I am new to Cilk and learning how it works. Initially I just tried to create many small tasks so there was a lot of parallelism. It seems I overdid it.&lt;/P&gt;

&lt;P&gt;It is very simple to describe what I am doing. I am processing a tree recursively, starting from the root, as follows.&lt;/P&gt;

&lt;PRE&gt;processnode(root of tree)&lt;/PRE&gt;

&lt;P&gt;where&lt;/P&gt;

&lt;PRE&gt;func processnode(x)
begin
    for y in children of x
        cilk_spawn processnode(y)
    cilk_sync
    do some work on x (usually small towards the bottom)
end&lt;/PRE&gt;

&lt;P&gt;In fact what I did was to stop the recursion at some point, so that instead of processing individual nodes I process whole subtrees. That reduced the time spent in allocation, but it was still significant. Further ideas along this line are obvious.&lt;/P&gt;
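
&lt;P&gt;The idea of stopping the recursion and handling whole subtrees serially can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual code: the node layout, the precomputed size field, the CUTOFF value, and the work and build helpers are all hypothetical, and the Cilk keywords fall back to nothing when the compiler does not define __cilk.&lt;/P&gt;

<![CDATA[
```c
#include <assert.h>
#include <stdlib.h>

/* Serial fallback (assumption): without a Cilk compiler the keywords
   expand to nothing and the program runs as ordinary serial C. */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Hypothetical node type: a binary tree keeps the sketch short. */
typedef struct node {
    struct node *child[2]; /* NULL means no child */
    long size;             /* nodes in this subtree, set by build() */
    long result;           /* stand-in for the real per-node result */
} node;

/* Hypothetical cutoff: subtrees this small are processed without spawns. */
static const long CUTOFF = 32;

/* Stand-in for "do some work on x": count the nodes in x's subtree,
   using the results already stored in its children. */
static void work(node *x) {
    x->result = 1;
    for (int i = 0; i < 2; ++i)
        if (x->child[i])
            x->result += x->child[i]->result;
}

/* Process a whole subtree serially, children before parent. */
static void process_serial(node *x) {
    for (int i = 0; i < 2; ++i)
        if (x->child[i])
            process_serial(x->child[i]);
    work(x);
}

/* Spawn only while the subtree is larger than CUTOFF. */
static void process_node(node *x) {
    if (x->size <= CUTOFF) {
        process_serial(x);
        return;
    }
    for (int i = 0; i < 2; ++i)
        if (x->child[i])
            cilk_spawn process_node(x->child[i]);
    cilk_sync;
    work(x);
}

/* Build a complete binary tree; depth 0 is a single leaf. */
static node *build(int depth) {
    node *x = malloc(sizeof *x);
    assert(x != NULL);
    x->size = 1;
    x->result = 0;
    for (int i = 0; i < 2; ++i) {
        x->child[i] = depth > 0 ? build(depth - 1) : NULL;
        if (x->child[i])
            x->size += x->child[i]->size;
    }
    return x;
}
```
]]>

&lt;P&gt;For example, build(9) gives a tree of 1023 nodes, and after process_node(root) the root's result field holds that count. A suitable cutoff depends on how expensive the per-node work is, so the value here is a placeholder.&lt;/P&gt;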

&lt;P&gt;You have already helped me quite a bit. If needed, I think I can create a simple example that simulates what I do and does not reveal anything significant, which you could have. But first let me work more with Cilk.&lt;/P&gt;

&lt;P&gt;Btw I have no reducers. I still have not figured out whether they are useful for me and how I can use them from pure C code. All your examples seem to be C++ based.&lt;/P&gt;

&lt;P&gt;Btw my company MOSEK (mosek.com) does mathematical optimization, and we are looking into whether Cilk can be used to improve the parallelization of our code, i.e. give better scalability. We have used OpenMP earlier and now native threads, but Cilk seems nicer.&lt;/P&gt;

</description>
      <pubDate>Fri, 30 Jan 2015 08:01:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060736#M53977</guid>
      <dc:creator>erling_andersen</dc:creator>
      <dc:date>2015-01-30T08:01:08Z</dc:date>
    </item>
    <item>
      <title>Btw now I will try the</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060737#M53978</link>
      <description>&lt;P&gt;Btw now I will try the reverse, i.e. process the tree starting at the leaf nodes. This has the drawback that I have to keep track of whether all the children of a node have been processed, because only then can I start processing the node. Also I have to have a stack of nodes ready to process. This means I have to introduce locks when working on that info. Now locks do not seem to be a part of Cilk, which is a pain because Linux and Windows do not provide a unified lock.&lt;/P&gt;

&lt;P&gt;Well, we have made a wrapper for locks but it is a pain anyway.&lt;/P&gt;

</description>
      <pubDate>Fri, 30 Jan 2015 11:31:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060737#M53978</guid>
      <dc:creator>erling_andersen</dc:creator>
      <dc:date>2015-01-30T11:31:41Z</dc:date>
    </item>
    <item>
      <title>I assume that "process the</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060738#M53979</link>
      <description>&lt;P&gt;I assume that "process the tree starting at the leaf nodes" means that you plan to implement a queue or something of things to work on, and then implement your own scheduler. &amp;nbsp;I recommend against that path for the reasons you have mentioned (you would have to introduce locks), but also because your scheduler is unlikely to be as good as the Cilk scheduler.&lt;/P&gt;

&lt;P&gt;It sounds like you may not have much parallelism (you have indicated that your tree is very deep). Have you run cilkview to find out what the parallelism is?&lt;/P&gt;

&lt;P&gt;-Bradley&lt;/P&gt;</description>
      <pubDate>Fri, 30 Jan 2015 12:42:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060738#M53979</guid>
      <dc:creator>Bradley_K_</dc:creator>
      <dc:date>2015-01-30T12:42:09Z</dc:date>
    </item>
    <item>
      <title>Jim Sukha wrote an excellent</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060739#M53980</link>
      <description>&lt;P&gt;Jim Sukha wrote an excellent article on how to determine why your program isn't speeding up: &lt;A href="https://software.intel.com/en-us/articles/why-is-my-cilk-plus-program-not-showing-speedup-part-1"&gt;Why is Cilk Plus not speeding up my program?&lt;/A&gt; I recommend it highly.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; - Barry&lt;/P&gt;</description>
      <pubDate>Fri, 30 Jan 2015 15:16:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060739#M53980</guid>
      <dc:creator>Barry_T_Intel</dc:creator>
      <dc:date>2015-01-30T15:16:52Z</dc:date>
    </item>
    <item>
      <title>I reorganized the</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060740#M53981</link>
      <description>&lt;P&gt;I reorganized the computations, but VTune still shows things like:&lt;/P&gt;

&lt;P&gt;Top hotspots with 4 threads&lt;BR /&gt;
	Function&amp;nbsp;&amp;nbsp; &amp;nbsp;CPU Time&lt;BR /&gt;
	func@0x78ea15fa&amp;nbsp;&amp;nbsp; &amp;nbsp;0.523s&lt;BR /&gt;
	_cilkrts_hyperobject_dealloc&amp;nbsp;&amp;nbsp; &amp;nbsp;0.337s&lt;/P&gt;

&lt;P&gt;Top hotspots with 2 threads&lt;BR /&gt;
	Function&amp;nbsp;&amp;nbsp; &amp;nbsp;CPU Time&lt;BR /&gt;
	run &amp;nbsp; &amp;nbsp;0.265s&lt;BR /&gt;
	do_something &amp;nbsp; 0.223s&lt;BR /&gt;
	func@0x78ea15fa&amp;nbsp;&amp;nbsp; &amp;nbsp;0.172s&lt;/P&gt;

&lt;P&gt;I have no idea what the anonymous function with @ is. Since I do not use reducers at all, it must be excessive stealing that is the issue. Note that dealloc does not show up for 2 workers but does for 4 workers.&lt;/P&gt;

&lt;P&gt;However, I have cut down the number of spawns a lot, e.g. to something like 2*NWORKERS spawns. I hoped I would then not see the dealloc function, but I do for NWORKERS&amp;gt;1 and in particular for 4. It is a quad core CPU.&lt;/P&gt;

&lt;P&gt;My conclusion is that load balancing is the issue. Does that sound reasonable?&lt;/P&gt;

&lt;P&gt;My original hope with Cilk was that load balancing issues would be less severe, but maybe that is not the case.&lt;/P&gt;

</description>
      <pubDate>Mon, 02 Feb 2015 13:47:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060740#M53981</guid>
      <dc:creator>erling_andersen</dc:creator>
      <dc:date>2015-02-02T13:47:41Z</dc:date>
    </item>
    <item>
      <title>To Pablo:</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060741#M53982</link>
      <description>&lt;P&gt;To Pablo:&lt;/P&gt;

&lt;P&gt;I am close to trying something other than Cilk, because even if I reduce the number of spawns to very few, VTune tells me that some anonymous function and the dealloc function mentioned in my original post use a lot of time. Maybe that is nothing to worry about and is an artifact of how Cilk or VTune works, but it makes me feel that there is an inefficiency somewhere that I cannot get at. In other words, Cilk has not provided much benefit over our existing native-threads-based code.&lt;/P&gt;

&lt;P&gt;I can provide an *.exe linked to Cilk and some example data if you want to profile it.&lt;/P&gt;

&lt;P&gt;Thanks for the help.&lt;/P&gt;

</description>
      <pubDate>Tue, 03 Feb 2015 19:51:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060741#M53982</guid>
      <dc:creator>erling_andersen</dc:creator>
      <dc:date>2015-02-03T19:51:10Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060742#M53983</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;A few questions/comments based on my quick reading of the post. Something seems a little bit fishy with the report from VTune + Cilk Plus; if your program does not use reducers, then I'm not sure why it would spend a lot of time in __cilkrts_hyperobject_dealloc. I wonder whether the symbols in the Cilk Plus runtime aren't being read properly, so that the time is showing up in the wrong functions.&lt;/P&gt;

&lt;P&gt;Some other wild guesses on my part:&lt;/P&gt;

&lt;P&gt;1. Do you know how deep the nesting of spawned functions is? In the current implementation, there is a limit of ~1000 on the nesting depth of cilk_spawn. If you are nesting deeper than that, or perhaps overflowing the 1MB stacks that Cilk Plus uses, then some data structure could be corrupted or something unexpected could be happening. But I would have expected the program to crash if something were being overflowed, and it sounds like you changed that already...&lt;/P&gt;

&lt;P&gt;2. Do you know which model CPU you are running on: is it 4 full cores, or 2 cores with SMT/hyperthreading turned on? Running Cilk Plus worker threads on the SMT threads doesn't always work well. In that case, I've seen behaviors where idle worker threads interfere with the other workers.&lt;/P&gt;

&lt;P&gt;Having a binary to test and profile seems like it would be helpful for tracking down potential runtime issues. Are you compiling / running on Windows (or Linux)?&lt;/P&gt;

&lt;P&gt;Cheers,&lt;BR /&gt;
	&lt;BR /&gt;
	Jim&lt;/P&gt;</description>
      <pubDate>Tue, 03 Feb 2015 21:06:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060742#M53983</guid>
      <dc:creator>Jim_S_Intel</dc:creator>
      <dc:date>2015-02-03T21:06:34Z</dc:date>
    </item>
    <item>
      <title>I agree that the Vtune</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060743#M53984</link>
      <description>&lt;P&gt;I agree that the VTune results are fishy and not trustworthy. Your conclusion about VTune and symbols is likely to be the case, I would say, but it makes profiling hard. Do you have any suggestions for "fixing" this, i.e. getting reliable VTune results? Is there any information about using VTune on Cilk applications?&lt;/P&gt;

&lt;P&gt;I run on Windows and have disabled hyperthreading because my code is very floating-point intensive. Actually it makes a lot of calls to sequential MKL. The CPU is an E3-1270 v2, which to my understanding has 4 cores.&lt;/P&gt;

&lt;P&gt;I could potentially get very deep nesting for certain large problems, but my current test examples do not have it. But it is really NICE to know that this could be an issue that is better avoided.&lt;/P&gt;

&lt;P&gt;I have not tried that on Linux, but the code is built to work on Linux.&lt;/P&gt;

</description>
      <pubDate>Wed, 04 Feb 2015 09:07:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060743#M53984</guid>
      <dc:creator>erling_andersen</dc:creator>
      <dc:date>2015-02-04T09:07:01Z</dc:date>
    </item>
    <item>
      <title>I have wished for a way to</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060744#M53985</link>
      <description>I have wished for a way to restrict workers to 1 per core, particularly on the recent platforms which lack a BIOS set-up option.</description>
      <pubDate>Wed, 04 Feb 2015 12:37:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060744#M53985</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2015-02-04T12:37:21Z</dc:date>
    </item>
    <item>
      <title>The PDB that is shipped with</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060745#M53986</link>
      <description>&lt;P&gt;The PDB that is shipped with cilkrts20.dll is stripped - it only has exported symbols. This usually allows you some clue about why you're spending time in the Cilk runtime, but it can get confused. If you're seeing a large offset from the start of the routine, the symbol resolution is almost always confused.&lt;/P&gt;

&lt;P&gt;We'll need to get the program in-house to figure out where you're spending your time.&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; - Barry&lt;/P&gt;</description>
      <pubDate>Wed, 04 Feb 2015 13:58:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060745#M53986</guid>
      <dc:creator>Barry_T_Intel</dc:creator>
      <dc:date>2015-02-04T13:58:07Z</dc:date>
    </item>
    <item>
      <title>Thanks. Let me work some more</title>
      <link>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060746#M53987</link>
      <description>&lt;P&gt;Thanks. Let me work some more, and take my vacation next week, before I give you a binary if needed.&lt;/P&gt;

</description>
      <pubDate>Thu, 05 Feb 2015 07:43:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/cilkrts-hyperobject-dealloc-is-expensive/m-p/1060746#M53987</guid>
      <dc:creator>erling_andersen</dc:creator>
      <dc:date>2015-02-05T07:43:46Z</dc:date>
    </item>
  </channel>
</rss>

