STREAM benchmark test results for several Intel architectures - Page 2

SergeyKostrov · 04-27-2017

This thread collects STREAM benchmark test results for several Intel architectures. Results for Ivy Bridge are posted first. Results for KNL will be posted some time later.

SergeyKostrov · 04-27-2017

This is how that timing function was modified:

/* Wall-clock timer in seconds. With _USE_CUSTOM_RDTSC defined, the value
   returned by Rdtsc() is scaled by dTscSS; otherwise gettimeofday is used,
   which is available on most UNIX-like systems. */

double mysecond( void )
{
 #ifdef _USE_CUSTOM_RDTSC
 /* Custom timer: raw counter value multiplied by the dTscSS factor. */
 return ( double )Rdtsc() * dTscSS;
 #else
 struct timeval tp;
 /* The timezone argument is obsolete, so NULL is passed. */
 gettimeofday( &tp, NULL );
 return ( double )tp.tv_sec + ( double )tp.tv_usec * 1.e-6;
 #endif
}
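The definitions of Rdtsc() and dTscSS are not shown in this post. Assuming dTscSS is a seconds-per-tick factor (an assumption, not something the post confirms), the conversion from a raw counter value to seconds could be sketched like this. The helper names and the 3.5 GHz frequency in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: seconds represented by one counter tick,
   given the counter frequency in Hz (assumed to be constant). */
double seconds_per_tick(double tsc_hz)
{
    assert(tsc_hz > 0.0);
    return 1.0 / tsc_hz;
}

/* Hypothetical helper: convert a raw tick count to seconds by
   multiplying it with a seconds-per-tick factor. */
double ticks_to_seconds(uint64_t ticks, double sec_per_tick)
{
    return (double)ticks * sec_per_tick;
}
```

With an assumed 3.5 GHz counter, ticks_to_seconds(84000000, seconds_per_tick(3.5e9)) is about 0.024 seconds, the same order as the Min time values in the tables of this post.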

I also verified the results for both implementations of the mysecond function:

Unmodified:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           17475.6     0.025130     0.024001     0.036002
Scale:          11650.2     0.036292     0.036002     0.037003
Add:            13106.7     0.049422     0.048002     0.051003
Triad:          12839.2     0.049809     0.049002     0.058004

Modified:

Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           17476.3     0.026742     0.024000     0.037000
Scale:          11650.9     0.036226     0.036000     0.038000
Add:            12839.8     0.049387     0.049000     0.051000
Triad:          12839.8     0.051161     0.049000     0.066000

and they are consistent.

SergeyKostrov · 04-28-2017

The copy operation averages about 17 GB/s, which is ~35% lower than the theoretical 25 GB/s for an Ivy Bridge system. After I explicitly added the -mavx command line option for the MinGW C++ compiler, the copy operation averages about 22 GB/s, which is ~15% lower than the theoretical memory bandwidth.
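The gap to the theoretical bandwidth is simple arithmetic. A minimal sketch in C, where the function name is illustrative and the 25 GB/s peak is the figure quoted in this post rather than a measured value:

```c
#include <assert.h>

/* Percentage by which a measured bandwidth falls short of a peak value.
   Both arguments must use the same unit (GB/s here). */
double shortfall_percent(double measured_gbs, double peak_gbs)
{
    assert(peak_gbs > 0.0);
    return 100.0 * (peak_gbs - measured_gbs) / peak_gbs;
}
```

With the rounded figures from the post, shortfall_percent(17.0, 25.0) evaluates to 32 and shortfall_percent(22.0, 25.0) to 12, in the same range as the approximate percentages quoted above.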

SergeyKostrov · 04-28-2017

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 4 bytes per array element.
-------------------------------------------------------------
Array size = 134217728 (elements), Offset = 0 (elements)
Memory per array = 512.0 MiB (= 0.5 GiB).
Total memory required = 1536.0 MiB (= 1.5 GiB).
Each kernel will be executed 128 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 15599 microseconds.
Each test below will take on the order of 62399 microseconds.
   (= 4 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           22943.3     0.067436     0.046800     0.093601
Scale:          13765.9     0.098022     0.078000     0.124801
Add:            12905.6     0.132785     0.124800     0.171601
Triad:          12905.6     0.132784     0.124800     0.171601
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-006 on all three arrays
-------------------------------------------------------------
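The Best Rate column can be reproduced from the Min time column. stream.c counts two arrays per element for Copy and Scale and three for Add and Triad, and it treats 1 MB as 10^6 bytes. The sketch below also checks how many clock ticks one test spans. The function names are illustrative and are not taken from stream.c.

```c
#include <assert.h>
#include <stddef.h>

/* Bandwidth in MB/s (1 MB = 10^6 bytes) for a kernel that touches
   `arrays_touched` arrays of `n` elements, `elem_bytes` bytes each. */
double stream_rate_mbs(int arrays_touched, size_t elem_bytes, size_t n,
                       double min_time_s)
{
    assert(min_time_s > 0.0);
    double bytes = (double)arrays_touched * (double)elem_bytes * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}

/* Number of timer ticks covered by one test, given the test duration
   and the timer granularity in the same unit (microseconds here). */
double ticks_per_test(double test_time_us, double granularity_us)
{
    assert(granularity_us > 0.0);
    return test_time_us / granularity_us;
}
```

For the run above, stream_rate_mbs(2, 4, 134217728, 0.046800) gives about 22943.2 MB/s for Copy and stream_rate_mbs(3, 4, 134217728, 0.124800) gives about 12905.6 MB/s for Add, matching the reported rates up to the rounding of the printed times. ticks_per_test(62399.0, 15599.0) gives about 4, well under the 20 ticks the output recommends.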

gaston-hillar · 04-30-2017

Hi Sergey,

Thanks for sharing all these results. Are you sharing the results of the following STREAM benchmark: https://www.cs.virginia.edu/stream/?

SergeyKostrov · 05-01-2017

>>...Are you sharing the results of the following STREAM benchmark: https://www.cs.virginia.edu/stream/?

Yes, and the C source file was downloaded from: http://www.cs.virginia.edu/stream/FTP/Code/stream.c.

SergeyKostrov · 05-01-2017

>>...
>>Scale:          13765.9     0.098022     0.078000     0.124801
>>Add:            12905.6     0.132785     0.124800     0.171601
>>Triad:          12905.6     0.132784     0.124800     0.171601
>>...

I don't think the results for the Scale, Add and Triad tests should be reported in GB/s; they should be measured in FLOPS.
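As a rough illustration of what that conversion would look like, the sketch below turns a Min time into GFLOPS using the usual STREAM operation counts per array element (Copy 0, Scale 1, Add 1, Triad 2) and the array size from the run quoted above. The function name is illustrative, and the sketch only shows the arithmetic; it does not settle which unit is more appropriate.

```c
#include <assert.h>
#include <stddef.h>

/* Floating-point rate in GFLOPS for a kernel that performs
   `flops_per_element` operations on each of `n` array elements. */
double stream_gflops(int flops_per_element, size_t n, double min_time_s)
{
    assert(min_time_s > 0.0);
    return 1.0e-9 * (double)flops_per_element * (double)n / min_time_s;
}
```

With 134217728 elements, stream_gflops(1, 134217728, 0.078000) gives about 1.72 GFLOPS for Scale, stream_gflops(1, 134217728, 0.124800) about 1.08 GFLOPS for Add, and stream_gflops(2, 134217728, 0.124800) about 2.15 GFLOPS for Triad.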

gaston-hillar · 05-01-2017

@Sergey,

Thank you for the quick response and for sharing the link with the C source files.