<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Mike, in Intel® Embree Ray Tracing Kernels</title>
    <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158288#M751</link>
    <description>&lt;P&gt;I agree, a 10X drop is suspicious. Disabling ISPC optimizations should not improve performance either. I'll look into it. Can you share some more details about the slower hardware?&lt;/P&gt;</description>
    <pubDate>Thu, 09 Apr 2020 11:56:53 GMT</pubDate>
    <dc:creator>FlorianR_Intel</dc:creator>
    <dc:date>2020-04-09T11:56:53Z</dc:date>
    <item>
      <title>Performance issues on older hardware</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158287#M750</link>
      <description>&lt;P&gt;I have an application that performs a large number of rtcPointQueryV() calls over moderately sized triangle meshes. I am seeing significant differences in performance depending on the hardware it runs on. I'm using Embree 3.8.0 and ISPC 1.12.0.&lt;/P&gt;&lt;P&gt;On my i9 MacBook Pro, these jobs reliably execute in a handful of seconds or less (usually much less). Perfectly acceptable performance.&lt;/P&gt;&lt;P&gt;On less capable hardware (e.g. CPUs not reporting avx2 support), the same binary executing the same job can take well over a minute. I certainly expected some slowdown, but a &amp;gt;10X drop seems excessive (though maybe it isn't?). And it isn't consistent: some jobs are reasonably performant while others seem to get lost somewhere in the rtcPointQuery() calls (some of these jobs make 10,000+ calls).&lt;/P&gt;&lt;P&gt;In some cases, dialing the ISPC compiler optimizations down to -O0 significantly improved performance, but not in all cases.&lt;/P&gt;&lt;P&gt;I've tried changing the ISPC --target, but it didn't seem to make a significant difference. My last debugging iteration targeted avx only.&lt;/P&gt;&lt;P&gt;I'm trying to figure out if this is an ISPC problem, an Embree problem, or a _me_ problem. Any recommendations on next steps I should take? Or am I just expecting too much out of limited hardware?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Apr 2020 17:14:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158287#M750</guid>
      <dc:creator>Mike_M_6</dc:creator>
      <dc:date>2020-04-08T17:14:31Z</dc:date>
    </item>
    <item>
      <title>Hi Mike,</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158288#M751</link>
      <description>&lt;P&gt;I agree, a 10X drop is suspicious. Disabling ISPC optimizations should not improve performance either. I'll look into it. Can you share some more details about the slower hardware?&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2020 11:56:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158288#M751</guid>
      <dc:creator>FlorianR_Intel</dc:creator>
      <dc:date>2020-04-09T11:56:53Z</dc:date>
    </item>
    <item>
      <title>The jobs are run as AWS</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158289#M752</link>
      <description>&lt;P&gt;The jobs are run as AWS Lambdas, where /proc/cpuinfo reports:&lt;/P&gt;
&lt;PRE class="brush:; class-name:dark;"&gt;INFO: CPUINFO: 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) Processor @ 2.50GHz
stepping	: 4
microcode	: 0x1
cpu MHz		: 2500.012
cache size	: 33792 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms smap xsaveopt arat md_clear arch_capabilities
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips	: 5000.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) Processor @ 2.50GHz
stepping	: 4
microcode	: 0x1
cpu MHz		: 2500.012
cache size	: 33792 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cpuid_fault pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust smep erms smap xsaveopt arat md_clear arch_capabilities
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips	: 5000.02
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:&lt;/PRE&gt;
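&lt;P&gt;A minimal sketch, assuming Python is available, of how the SIMD extensions reported in that output can be checked programmatically. The helper name cpu_flags and the abbreviated SAMPLE string are illustrative only and are not part of Embree or ISPC; on a real host the text would come from reading /proc/cpuinfo.&lt;/P&gt;

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Abbreviated copy of the flags line reported above (illustrative sample input).
SAMPLE = (
    "model name\t: Intel(R) Xeon(R) Processor @ 2.50GHz\n"
    "flags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt avx f16c rdrand\n"
)

flags = cpu_flags(SAMPLE)
for isa in ("sse4_2", "avx", "avx2"):
    print(isa, "yes" if isa in flags else "no")
```

&lt;P&gt;For the flags line above this reports sse4_2 and avx as present and avx2 as absent, which matches the Embree log listing AVX but not AVX2 among the supported targets.&lt;/P&gt;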

&lt;P&gt;Embree logging reports:&lt;/P&gt;

&lt;PRE class="brush:; class-name:dark;"&gt;Embree Ray Tracing Kernels 3.8.0 ()
Compiler  : GCC 7.3.1 20180712 (Red Hat 7.3.1-6)
Build     : Release 
Platform  : Linux (64bit)
CPU       : Unknown CPU (GenuineIntel)
Threads  : 2
ISA      : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND 
Targets  : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI 
MXCSR    : FTZ=1, DAZ=1
Config
Threads : default
ISA     : XMM YMM SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 POPCNT AVX F16C RDRAND 
Targets : SSE SSE2 SSE3 SSSE3 SSE4.1 SSE4.2 AVX AVXI  (supported)
SSE2 SSE4.2 AVX AVX2  (compile time enabled)
Features: intersection_filter 
Tasking : TBB2019.9 TBB_header_interface_11009 TBB_lib_interface_11009 
general:
build threads      = 0
build user threads = 0
start_threads      = 0
affinity           = 0
frequency_level    = simd256
hugepages          = enabled
verbosity          = 3
cache_size         = 134.218 MB
max_spatial_split_replications = 1.2
triangles:
accel              = default
builder            = default
traverser          = default
motion blur triangles:
accel              = default
builder            = default
traverser          = default
quads:
accel              = default
builder            = default
traverser          = default
motion blur quads:
accel              = default
builder            = default
traverser          = default
line segments:
accel              = default
builder            = default
traverser          = default
motion blur line segments:
accel              = default
builder            = default
traverser          = default
hair:
accel              = default
builder            = default
traverser          = default
motion blur hair:
accel              = default
builder            = default
traverser          = default
subdivision surfaces:
accel              = default
grids:
accel              = default
builder            = default
motion blur grids:
accel              = default
builder            = default
object_accel:
min_leaf_size      = 1
max_leaf_size      = 1
object_accel_mb:
min_leaf_size      = 1
max_leaf_size      = 1
&lt;/PRE&gt;

&lt;P&gt;Thanks in advance...&lt;/P&gt;</description>
      <pubDate>Thu, 09 Apr 2020 13:55:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158289#M752</guid>
      <dc:creator>Mike_M_6</dc:creator>
      <dc:date>2020-04-09T13:55:13Z</dc:date>
    </item>
    <item>
      <title>After further experimentation</title>
      <link>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158290#M753</link>
      <description>&lt;P&gt;After further experimentation, it appears this blog post on proportional CPU allocation in AWS Lambda explains a lot of what I've been seeing: https://engineering.opsgenie.com/how-does-proportional-cpu-allocation-work-with-aws-lambda-41cd44da3cac&lt;/P&gt;&lt;P&gt;Thanks for the response.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Apr 2020 14:20:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Embree-Ray-Tracing-Kernels/Performance-issues-on-older-hardware/m-p/1158290#M753</guid>
      <dc:creator>Mike_M_6</dc:creator>
      <dc:date>2020-04-13T14:20:51Z</dc:date>
    </item>
  </channel>
</rss>

