<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Quote:Eric O. wrote: in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044872#M47478</link>
    <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Eric O. wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Thanks for clarifying.&amp;nbsp; If I understand correctly, variable length arrays defined like&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;int n=atoi(argv[1]);
double x[n];
&lt;/PRE&gt;

&lt;P&gt;are always allocated on the stack in regular C, while in Cilkplus it appears generic/cilk-abi-vla.c always allocates such arrays on the heap and the extra code in x86/cilk-abi-vla.c is designed to allocate variable length arrays on the stack if possible and only uses the heap if necessary.&amp;nbsp; Is this correct?&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 17.7381820678711px;"&gt;The first version of the parallel recursive FFT that I wrote actually did allocate variable-length temporary arrays in the recursively cilk_spawn'ed subroutine and appeared to work as expected on the Raspberry Pi 2B.&amp;nbsp; Note, however, for performance and memory use reasons the parallel recursive FFT test that I posted on the Raspberry Pi forum does not allocate variable length arrays.&amp;nbsp; This gives me ideas what more should be tested in the current ARMv7 build.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 17.7381820678711px;"&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;DIV&gt;
	&lt;P&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 17.7381820678711px;"&gt;I did not work on that section of code, but I believe you are correct. &amp;nbsp;Allocating the array on the stack is likely faster than putting in on the heap, but it requires additional work to figure out what to do. &amp;nbsp;&amp;nbsp;Note also, that the tricky case occurs when the variable-length array is declared in a continuation (i.e., after a cilk_spawn, but before the cilk_sync), because the continuation may execute on a different stack. &amp;nbsp; A variable-length array that is declared at the beginning of the function might "just work" if it is pushed onto the stack before the first cilk_spawn. &amp;nbsp;&amp;nbsp;But you shouldn't quote me on that, since it probably depends on implementation details that I am not familiar with.&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P&gt;As far as testing on non-x86 architectures goes, the places where I would expect the most potential issues are the work-stealing / synchronization sections of the code (e.g., the THE protocol, __cilkrts_leave_frame, etc.), since those are places where a difference in memory model might introduce bugs. &amp;nbsp; Stress tests on steals are more likely to reveal those kinds of issues if they exist.&lt;/P&gt;
&lt;/DIV&gt;

&lt;DIV&gt;Cheers,&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;Jim&lt;/DIV&gt;</description>
    <pubDate>Wed, 29 Apr 2015 00:57:21 GMT</pubDate>
    <dc:creator>Jim_S_Intel</dc:creator>
    <dc:date>2015-04-29T00:57:21Z</dc:date>
    <item>
      <title>Cilkplus port to Raspberry Pi 2B</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044867#M47473</link>
      <description>&lt;P&gt;I compiled the new release of gcc-5.1 with the Cilkplus parallel processing extensions and runtime library for ARMv7 architecture on the Raspberry Pi 2B single board computer.&amp;nbsp; Two changes were needed.&lt;/P&gt;

&lt;P&gt;The first change corrects a typo in generic/cilk-abi-vla.c by changing the second-to-last line of the file from&lt;/P&gt;

&lt;P&gt;vla_internal_heap_free(t, full_size);&lt;/P&gt;

&lt;P&gt;to&lt;/P&gt;

&lt;P&gt;vla_internal_heap_free(p, full_size);&lt;/P&gt;

&lt;P&gt;The second change is ARM-specific and applies to generic/os-fence.c. Comment out the line&lt;/P&gt;

&lt;P&gt;COMMON_SYSDEP void __cilkrts_fence(void); ///&amp;lt; MFENCE instruction&lt;/P&gt;

&lt;P&gt;as&lt;/P&gt;

&lt;P&gt;// COMMON_SYSDEP void __cilkrts_fence(void); ///&amp;lt; MFENCE instruction&lt;/P&gt;

&lt;P&gt;and then add the define&lt;/P&gt;

&lt;P&gt;#define __cilkrts_fence() __asm__ volatile ("DSB")&lt;/P&gt;

&lt;P&gt;right above it.&amp;nbsp; I've been testing the results and getting reasonable parallel speedup using 4 cores on a number of algorithms.&amp;nbsp; My results are posted on the Raspberry Pi forum under the topic "Programming C/C++" in the thread "Cilkplus on RPi2B."&lt;/P&gt;

&lt;P&gt;It appears that cilk_spawn, cilk_sync and cilk_for are running without errors; however, I've not optimized the stack swapping code in generic as has been done for Intel architecture CPUs.&lt;/P&gt;

&lt;P&gt;Is anyone working on this?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Apr 2015 22:47:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044867#M47473</guid>
      <dc:creator>Eric_O_</dc:creator>
      <dc:date>2015-04-27T22:47:12Z</dc:date>
    </item>
    <item>
      <title>Just fyi, the "cilk-abi-vla.c</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044868#M47474</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Just fyi, the "&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px; line-height: 16.3636360168457px;"&gt;cilk-abi-vla.c" file has to do with supporting variable-length arrays (VLAs) within a Cilk Plus (spawning) function, and not the generic stack switching done by the runtime. &amp;nbsp; If your code does not use VLAs in a spawning function (which most don't), then those functions should not be called. &amp;nbsp;&amp;nbsp;Moreover, I believe VLAs require compiler support, so I think you'd have to double-check whether the compiler is generating code for VLAs or not.&lt;/SPAN&gt;&lt;BR /&gt;
	&lt;BR /&gt;
	&lt;SPAN style="font-size: 12px; line-height: 16.3636360168457px;"&gt;Last I recall, there were no architecture-specific optimizations in the runtime for switching stacks. &amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;But Cilk Plus runtime development&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px; line-height: 16.3636360168457px;"&gt;is no longer my primary role, so&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px; line-height: 16.3636360168457px;"&gt;&amp;nbsp;o&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px; line-height: 16.3636360168457px;"&gt;thers may or may not have more up-to-date information. &amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px; line-height: 16.3636360168457px;"&gt;The original theory and design behind Cilk was to make the spawns cheap, at the cost of more expensive steals, since steals are supposed to be rare. &amp;nbsp; Thus, I don't think optimizing the stack switching for a particular architecture is necessarily going to provide a large payoff.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Cheers&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Jim&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Apr 2015 23:59:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044868#M47474</guid>
      <dc:creator>Jim_S_Intel</dc:creator>
      <dc:date>2015-04-27T23:59:57Z</dc:date>
    </item>
    <item>
      <title>Hi Eric,</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044869#M47475</link>
      <description>&lt;P&gt;Hi Eric,&lt;/P&gt;

&lt;P&gt;Thank you for sharing this information!&lt;/P&gt;

&lt;P&gt;Could you submit your contribution to&amp;nbsp;https://www.cilkplus.org/submit-cilk-contribution, so that it is included in the cilkplus source package (then, in GCC)?&lt;/P&gt;</description>
      <pubDate>Tue, 28 Apr 2015 16:02:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044869#M47475</guid>
      <dc:creator>Hansang_B_Intel</dc:creator>
      <dc:date>2015-04-28T16:02:10Z</dc:date>
    </item>
    <item>
      <title>Quote:Jim Sukha (Intel) wrote</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044870#M47476</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Jim Sukha (Intel) wrote:&lt;BR /&gt;Just fyi, the "cilk-abi-vla.c" file has to do with supporting variable-length arrays (VLAs) within a Cilk Plus (spawning) function, and not the generic stack switching done by the runtime.&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Thanks for clarifying.&amp;nbsp; If I understand correctly, variable length arrays defined like&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;int n=atoi(argv[1]);
double x[n];
&lt;/PRE&gt;

&lt;P&gt;are always allocated on the stack in regular C, while in Cilkplus it appears generic/cilk-abi-vla.c always allocates such arrays on the heap and the extra code in x86/cilk-abi-vla.c is designed to allocate variable length arrays on the stack if possible and only uses the heap if necessary.&amp;nbsp; Is this correct?&lt;/P&gt;

&lt;P&gt;The first version of the parallel recursive FFT that I wrote actually did allocate variable-length temporary arrays in the recursively cilk_spawn'ed subroutine and appeared to work as expected on the Raspberry Pi 2B.&amp;nbsp; Note, however, that for performance and memory-use reasons the parallel recursive FFT test that I posted on the Raspberry Pi forum does not allocate variable-length arrays.&amp;nbsp; This gives me ideas about what else should be tested in the current ARMv7 build.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Apr 2015 17:54:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044870#M47476</guid>
      <dc:creator>Eric_O_</dc:creator>
      <dc:date>2015-04-28T17:54:48Z</dc:date>
    </item>
    <item>
      <title>Quote:HANSANG B. (Intel)</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044871#M47477</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;HANSANG B. (Intel) wrote:&lt;BR /&gt;Thank you for sharing this information!&amp;nbsp; Could you submit your contribution to&amp;nbsp;&lt;A href="https://www.cilkplus.org/submit-cilk-contribution"&gt;https://www.cilkplus.org/submit-cilk-contribution&lt;/A&gt;, so that it is included in the cilkplus source package (then, in GCC)?&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Currently my patch just changes two lines of source in libcilkrts/config/generic rather than properly creating a new architecture subdirectory such as libcilkrts/config/arm for the ARM-specific changes.&amp;nbsp; After this is polished into a proper patch, I'll send it in.&lt;/P&gt;</description>
      <pubDate>Tue, 28 Apr 2015 18:03:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044871#M47477</guid>
      <dc:creator>Eric_O_</dc:creator>
      <dc:date>2015-04-28T18:03:29Z</dc:date>
    </item>
    <item>
      <title>Quote:Eric O. wrote:</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044872#M47478</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Eric O. wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Thanks for clarifying.&amp;nbsp; If I understand correctly, variable length arrays defined like&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;int n=atoi(argv[1]);
double x[n];
&lt;/PRE&gt;

&lt;P&gt;are always allocated on the stack in regular C, while in Cilkplus it appears generic/cilk-abi-vla.c always allocates such arrays on the heap and the extra code in x86/cilk-abi-vla.c is designed to allocate variable length arrays on the stack if possible and only uses the heap if necessary.&amp;nbsp; Is this correct?&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 17.7381820678711px;"&gt;The first version of the parallel recursive FFT that I wrote actually did allocate variable-length temporary arrays in the recursively cilk_spawn'ed subroutine and appeared to work as expected on the Raspberry Pi 2B.&amp;nbsp; Note, however, for performance and memory use reasons the parallel recursive FFT test that I posted on the Raspberry Pi forum does not allocate variable length arrays.&amp;nbsp; This gives me ideas what more should be tested in the current ARMv7 build.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="font-size: 13.0080003738403px; line-height: 17.7381820678711px;"&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;DIV&gt;
	&lt;P&gt;&lt;SPAN style="font-size: 13.0080003738403px; line-height: 17.7381820678711px;"&gt;I did not work on that section of code, but I believe you are correct. &amp;nbsp;Allocating the array on the stack is likely faster than putting in on the heap, but it requires additional work to figure out what to do. &amp;nbsp;&amp;nbsp;Note also, that the tricky case occurs when the variable-length array is declared in a continuation (i.e., after a cilk_spawn, but before the cilk_sync), because the continuation may execute on a different stack. &amp;nbsp; A variable-length array that is declared at the beginning of the function might "just work" if it is pushed onto the stack before the first cilk_spawn. &amp;nbsp;&amp;nbsp;But you shouldn't quote me on that, since it probably depends on implementation details that I am not familiar with.&lt;/SPAN&gt;&lt;/P&gt;

	&lt;P&gt;As far as testing on non-x86 architectures goes, the places where I would expect the most potential issues are the work-stealing / synchronization sections of the code (e.g., the THE protocol, __cilkrts_leave_frame, etc.), since those are places where a difference in memory model might introduce bugs. &amp;nbsp; Stress tests on steals are more likely to reveal those kinds of issues if they exist.&lt;/P&gt;
&lt;/DIV&gt;

&lt;DIV&gt;Cheers,&lt;/DIV&gt;

&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV&gt;Jim&lt;/DIV&gt;</description>
      <pubDate>Wed, 29 Apr 2015 00:57:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044872#M47478</guid>
      <dc:creator>Jim_S_Intel</dc:creator>
      <dc:date>2015-04-29T00:57:21Z</dc:date>
    </item>
    <item>
      <title>We worked with the Intel</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044873#M47479</link>
      <description>&lt;P&gt;We worked with the Intel compiler developers to add support for Variable Length Arrays in spawning functions, so the Intel compiler knows to call __cilkrts_stack_alloc() and __cilkrts_stack_free() to allocate and delete a VLA. This allows the Cilk runtime to expand the stack for the VLA if possible, or allocate it on the heap if necessary.&lt;/P&gt;

&lt;P&gt;I don't believe that the GCC implementation of VLAs knows anything about spawning functions, so use of VLAs in a spawning function is currently not supported in GCC. Which is why I just threw together the generic implementations of the functions - they never get called. I figured we'd flesh them out when we added VLA support in spawning functions to GCC. I guess that time is now. :o)&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; - Barry&lt;/P&gt;</description>
      <pubDate>Wed, 29 Apr 2015 13:28:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044873#M47479</guid>
      <dc:creator>Barry_T_Intel</dc:creator>
      <dc:date>2015-04-29T13:28:00Z</dc:date>
    </item>
    <item>
      <title>This message is to indicate</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044874#M47480</link>
      <description>&lt;P&gt;This message is to indicate that I've just created a new patch for gcc-5.2 to support Cilk on Raspberry Pi.&amp;nbsp; The patch is now cleaner in the sense that it creates a new directory config/arm which contains the architecture specific files in a way similar to config/x86.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;A href="http://fractal.math.unr.edu/~ejolson/patches/gcc-5.2.0-ejo.patch" target="_blank"&gt;http://fractal.math.unr.edu/~ejolson/patches/gcc-5.2.0-ejo.patch&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Unfortunately the above patch also contains a one-line change to the cpp preprocessor to enable UTF-8 in C identifiers.&amp;nbsp; Fortunately this change for UTF-8 support appears at the end of the patch and is easy to remove.&amp;nbsp; Note also that the directory config/arm needs to be created and a few files copied before applying the patch.&amp;nbsp; Exact details on how to apply the patch and build a working compiler are provided at&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.raspberrypi.org/forums/viewtopic.php?p=802657" target="_blank"&gt;https://www.raspberrypi.org/forums/viewtopic.php?p=802657&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The patch has been tested and works with ARMv6 of the original Raspberry Pi and with ARMv7 of the new Raspberry Pi 2B.&amp;nbsp; Hopefully this is enough to get the ARM Cilkplus patch into mainline gcc for the next release.&amp;nbsp; Please let me know if anything else is required.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Aug 2015 23:14:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044874#M47480</guid>
      <dc:creator>Eric_O_</dc:creator>
      <dc:date>2015-08-14T23:14:17Z</dc:date>
    </item>
    <item>
      <title>Quote:Eric O. wrote:</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044875#M47481</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Eric O. wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;This message is to indicate that I've just created a new patch for gcc-5.2 to support Cilk on Raspberry Pi.&amp;nbsp; The patch is now cleaner in the sense that it creates a new directory config/arm which contains the architecture specific files in a way similar to config/x86.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;A href="http://fractal.math.unr.edu/~ejolson/patches/gcc-5.2.0-ejo.patch"&gt;http://fractal.math.unr.edu/~ejolson/patches/gcc-5.2.0-ejo.patch&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Unfortunately the above patch also contains a one-line change to the cpp preprocessor to enable UTF-8 in C identifiers.&amp;nbsp; Fortunately this change for UTF-8 support appears at the end of the patch and is easy to remove.&amp;nbsp; Note also that the directory config/arm needs to be created and a few files copied before applying the patch.&amp;nbsp; Exact details on how to apply the patch and build a working compiler are provided at&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.raspberrypi.org/forums/viewtopic.php?p=802657"&gt;https://www.raspberrypi.org/forums/viewtopic.php?p=802657&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The patch has been tested and works with ARMv6 of the original Raspberry Pi and with ARMv7 of the new Raspberry Pi 2B.&amp;nbsp; Hopefully this is enough to get the ARM Cilkplus patch into mainline gcc for the next release.&amp;nbsp; Please let me know if anything else is required.&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Thank you for sharing your contribution!&lt;/P&gt;

&lt;P&gt;It might take some time for your contribution to be part of GCC mainline, but it will happen eventually.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Aug 2015 13:45:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044875#M47481</guid>
      <dc:creator>Hansang_B_Intel</dc:creator>
      <dc:date>2015-08-17T13:45:54Z</dc:date>
    </item>
    <item>
      <title>This is Eric who started this</title>
      <link>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044876#M47482</link>
      <description>&lt;P&gt;This is Eric who started this thread. &amp;nbsp;It's been some time since I tried to post and somehow I could not login or reactivate my old account. &amp;nbsp;I recently compiled gcc-7.1 on ARM and am testing on an 8-core SBC based on the Samsung/Nexell S5P6818. &amp;nbsp;First off, I'm happy to see that ARM architecture is now recognized and no patching or fiddling with configuration files is necessary. &amp;nbsp;At the same time, there appears to be a cilkplus performance regression of about 40 percent slower since gcc-5.2. &amp;nbsp;Note that this regression doesn't affect the non-cilkplus version of the code, which still runs the same speed, nor does it affect gcc-7.1 cilkplus running on 64-bit Intel. &amp;nbsp;I'm posting this quick message to check whether 40 percent poorer performance of cilkplus on ARM with gcc-7.1 versus gcc-5.2 is well known and what the cause might be. &amp;nbsp;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 07 Aug 2017 04:50:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Cilkplus-port-to-Raspberry-Pi-2B/m-p/1044876#M47482</guid>
      <dc:creator>ejol</dc:creator>
      <dc:date>2017-08-07T04:50:07Z</dc:date>
    </item>
  </channel>
</rss>

