*** STREAM benchmark test results for several Intel architectures ***
This thread collects STREAM benchmark test results for several Intel architectures.
Results for Ivy Bridge are posted first.
Results for KNL will be posted some time later.
27 Replies
This is how that timing function was modified:
/* A gettimeofday routine to give access to the wall clock timer on most UNIX-like systems. */
double mysecond( void )
{
    double value;

#ifdef _USE_CUSTOM_RDTSC
    value = Rdtsc();
#else
    struct timeval tp;
    struct timezone tzp;
    int i = gettimeofday( &tp, &tzp );
    value = ( ( double )tp.tv_sec + ( double )tp.tv_usec * 1.e-6 );
#endif

#ifdef _USE_CUSTOM_RDTSC
    return ( double )value * dTscSS;
#else
    return ( double )value;
#endif
}
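The Rdtsc() helper and the dTscSS scale factor used in the modified routine are not shown in the post. For readers who only need a finer-grained timer than gettimeofday, here is a minimal sketch that swaps in a different technique, POSIX clock_gettime with CLOCK_MONOTONIC. The name mysecond_monotonic is hypothetical, and this is not the RDTSC-based implementation used for the results in this thread.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict -std=c99 */
#include <assert.h>
#include <time.h>

/* Hypothetical alternative to mysecond(): seconds read from a monotonic clock.
   This is a swapped-in technique, not the Rdtsc() variant from the post. */
double mysecond_monotonic(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);   /* assumption: CLOCK_MONOTONIC is available */
    (void)rc;          /* avoid an unused-variable warning with NDEBUG */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
```

Like mysecond, it returns seconds as a double, so it could be dropped into the same call sites; only differences between two readings are meaningful.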
I also verified the results for both implementations of the mysecond function:
Unmodified:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           17475.6     0.025130     0.024001     0.036002
Scale:          11650.2     0.036292     0.036002     0.037003
Add:            13106.7     0.049422     0.048002     0.051003
Triad:          12839.2     0.049809     0.049002     0.058004
Modified:
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           17476.3     0.026742     0.024000     0.037000
Scale:          11650.9     0.036226     0.036000     0.038000
Add:            12839.8     0.049387     0.049000     0.051000
Triad:          12839.8     0.051161     0.049000     0.066000
and they are consistent.
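One way to put a number on that consistency is the largest relative difference between the two sets of Best Rate values. The sketch below is only an illustration: the function name and the array names are hypothetical, and the rates are copied from the two tables above.

```c
#include <assert.h>
#include <stddef.h>

/* Largest relative difference |b - a| / a over n paired rates. */
double max_relative_diff(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; ++i) {
        assert(a[i] > 0.0);  /* rates must be positive */
        double d = (b[i] > a[i] ? b[i] - a[i] : a[i] - b[i]) / a[i];
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Best Rate MB/s for Copy, Scale, Add, Triad, taken from the tables above. */
static const double unmodified_rates[4] = { 17475.6, 11650.2, 13106.7, 12839.2 };
static const double modified_rates[4]   = { 17476.3, 11650.9, 12839.8, 12839.8 };
```

Calling max_relative_diff(unmodified_rates, modified_rates, 4) gives about 0.02: the Add kernel differs by roughly 2%, while the other three kernels agree to within 0.01%.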
On average the copy operation reaches 17 GB/s, which is ~32% lower than the theoretical 25 GB/s for an Ivy Bridge system.
After explicitly adding the -mavx command-line option for the MinGW C++ compiler, the copy operation averages 22 GB/s, which is ~12% lower than the theoretical memory bandwidth.
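For reference, the gap to the theoretical figure is just (peak - measured) / peak. A minimal sketch of that arithmetic, using the rounded 17, 22 and 25 GB/s figures quoted above (the function name is hypothetical):

```c
#include <assert.h>

/* Percentage by which a measured bandwidth falls short of a peak figure. */
double shortfall_percent(double measured_gbs, double peak_gbs)
{
    assert(peak_gbs > 0.0);  /* peak must be positive */
    return 100.0 * (peak_gbs - measured_gbs) / peak_gbs;
}
```

With these inputs, shortfall_percent(17.0, 25.0) evaluates to 32 and shortfall_percent(22.0, 25.0) to 12.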
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 4 bytes per array element.
-------------------------------------------------------------
Array size = 134217728 (elements), Offset = 0 (elements)
Memory per array = 512.0 MiB (= 0.5 GiB).
Total memory required = 1536.0 MiB (= 1.5 GiB).
Each kernel will be executed 128 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 15599 microseconds.
Each test below will take on the order of 62399 microseconds.
   (= 4 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           22943.3     0.067436     0.046800     0.093601
Scale:          13765.9     0.098022     0.078000     0.124801
Add:            12905.6     0.132785     0.124800     0.171601
Triad:          12905.6     0.132784     0.124800     0.171601
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-006 on all three arrays
-------------------------------------------------------------
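The Best Rate column can be reproduced from the array size and the Min time column: STREAM counts two arrays of traffic for Copy and Scale, three for Add and Triad, and divides the byte count by the best time. A small sketch of that calculation, with a hypothetical function name and parameter names:

```c
#include <assert.h>
#include <stddef.h>

/* Bandwidth in MB/s (1 MB = 1e6 bytes) from the best kernel time.
   arrays_touched: 2 for Copy and Scale, 3 for Add and Triad. */
double stream_rate_mbs(size_t bytes_per_elem, size_t n_elems,
                       unsigned arrays_touched, double min_time_s)
{
    assert(min_time_s > 0.0);  /* a zero time means the timer is too coarse */
    double bytes = (double)arrays_touched * (double)bytes_per_elem
                 * (double)n_elems;
    return 1.0e-6 * bytes / min_time_s;
}
```

For the run above, stream_rate_mbs(4, 134217728, 2, 0.046800) comes out at about 22943 MB/s, matching the Copy row. Note that with a clock granularity of roughly 15.6 ms, the Copy minimum of 0.0468 s is only about three timer ticks, so the derived rate is correspondingly coarse.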
Hi Sergey,
Thanks for sharing all these results. Are you sharing the results of the following STREAM benchmark: https://www.cs.virginia.edu/stream/?
>>...Are you sharing the results of the following STREAM benchmark: https://www.cs.virginia.edu/stream/?
Yes, and the C source file was downloaded from: http://www.cs.virginia.edu/stream/FTP/Code/stream.c.
>>...
>>Scale: 13765.9 0.098022 0.078000 0.124801
>>Add: 12905.6 0.132785 0.124800 0.171601
>>Triad: 12905.6 0.132784 0.124800 0.171601
>>...
I don't think the results for the Scale, Add and Triad tests should be reported in GB/s; they should be measured in FLOPS instead.
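As a rough illustration of what that conversion would look like: STREAM counts one floating-point operation per element for Scale and Add, two for Triad, and none for Copy. The sketch below applies those counts to the array size and Min time values of the run quoted above; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* GFLOP/s from the best kernel time.
   flops_per_elem: 0 for Copy, 1 for Scale, 1 for Add, 2 for Triad. */
double stream_gflops(size_t n_elems, unsigned flops_per_elem, double min_time_s)
{
    assert(min_time_s > 0.0);  /* a zero time means the timer is too coarse */
    return 1.0e-9 * (double)flops_per_elem * (double)n_elems / min_time_s;
}
```

With 134217728 elements, this gives about 1.72 GFLOP/s for Scale (0.078 s), 1.08 GFLOP/s for Add (0.1248 s) and 2.15 GFLOP/s for Triad (0.1248 s).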
@Sergey,
Thank you for the quick response and for sharing the link with the C source files.
