Thank you for your help in advance.
Hello. Even though STREAM has been the de facto benchmark for a long time, I only recently started using and studying it. I'm not sure whether this is the right place to ask about it, but since Dr. Bandwidth and other STREAM experts answer here from time to time, please let me ask my question.
This is my desktop information:
- i7-8700 (6c, 12t), 16GB memory, DDR4-2666
- Windows 10
- Visual Studio 2019 (optimization level 2)
- OpenMP enabled
Now for my question. I got a binary package of STREAM for Windows and tested it while varying OMP_NUM_THREADS from 1 to 12. The graph below shows the result:
This is not what I learned in college. There can be a bottleneck in job distribution or data sharing, but I believe the result shouldn't look like this. Fortunately, the package includes the source code as well as the binary, so I compiled it and ran it myself, and the result is quite different.
Sorry for the mismatched x-axes. Even though the lines fluctuate, I believe this is closer to what I expected, since the curve rises up to 6 threads, which is the number of physical cores.
I know the version I'm using (5.8) is obsolete. However, as far as I could tell, 5.8 and 5.10 do not differ much in the key parts, and I followed the STREAM_ARRAY_SIZE rules. What am I missing, and what have I done wrong? I hope someone can explain why the first graph shows that downward trend.
I recommend trying the Intel Memory Latency Checker. https://software.intel.com/content/www/us/en/develop/articles/intelr-memory-latency-checker.html
The performance numbers here are consistent with having only one populated DRAM channel...
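As a rough way to check the single-channel suspicion, the theoretical peak can be worked out from the DIMM speed and compared with the measured STREAM numbers. The sketch below assumes a 64-bit (8-byte) data bus per DDR4 channel and counts 1 GB as 1e9 bytes; the function name `peak_dram_bandwidth_gbs` is only illustrative and not part of STREAM or MLC.

```c
#include <assert.h>

/* Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).
 *   mega_transfers_per_s : transfer rate, e.g. 2666 for DDR4-2666
 *   bus_bytes            : data width of one channel in bytes
 *                          (assumed 8 for a 64-bit DDR4 channel)
 *   channels             : number of populated channels
 * This is an upper bound; sustained STREAM results are always lower. */
double peak_dram_bandwidth_gbs(double mega_transfers_per_s,
                               int bus_bytes, int channels)
{
    assert(mega_transfers_per_s > 0.0 && bus_bytes > 0 && channels > 0);
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9;
}
```

Under these assumptions, `peak_dram_bandwidth_gbs(2666, 8, 1)` is about 21.3 GB/s and `peak_dram_bandwidth_gbs(2666, 8, 2)` is about 42.7 GB/s, so best-case STREAM results that plateau well below 21 GB/s would fit a single populated channel.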
Thank you for your reply!
However, what I was wondering about is the difference in results between the .exe included in the package and the .exe I built from the source code in the same package.
Are there any other recommendations, such as using a specific compiler?
The Windows executable was donated -- I have never had access to a "real" compiler on a Windows system -- so I don't have any of the details on how it was compiled. Note that the file you downloaded was from the "Obsolete" sub-directory -- I will probably nuke most of that in the next web site update....
Except for the issue of streaming stores (*), the default version of STREAM is mostly insensitive to compiler technology on modern processors. Versions of STREAM in C that use dynamic memory allocation often require either a smart compiler or additional annotation (like the "restrict" keyword) so the compiler can assume that there is no aliasing. (That is the main reason that the default version of stream.c uses static global array declarations.)
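To illustrate the aliasing point, here is a minimal sketch of a Triad-style kernel that takes pointers (as a dynamically allocated variant would) and uses the C99 `restrict` qualifier. It is not taken from stream.c; the function name `triad` and its signature are assumptions made for illustration. Some compilers spell the qualifier `__restrict` instead.

```c
#include <assert.h>
#include <stddef.h>

/* Triad-style kernel on caller-supplied buffers: a[i] = b[i] + scalar * c[i].
 * The restrict qualifiers promise the compiler that a, b, and c do not
 * overlap, so it does not have to assume a store to a[i] might change
 * later elements of b or c. Calling this with overlapping buffers is
 * undefined behavior. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```

With static global arrays the compiler can already see that the three arrays are distinct objects, which is why no such annotation is needed in that case.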
Depending on the hardware, STREAM performance can be dependent on the relative alignment of the arrays. Different compilers will often generate different alignments, causing small (~1%-2%) differences in performance. These differences usually disappear if you look at the statistics of performance across an ensemble of array sizes.
(*) Streaming stores are an implementation-dependent feature that allows full-cacheline stores that miss in the cache to be written (more-or-less) directly to memory. This bypasses the initial read of the target cache line that is required with the normal store instructions that miss in the cache, leaving more DRAM bandwidth available for the required read and write operations. The Intel compilers will generate streaming store instructions automagically when appropriate, while the GNU compilers do not generate streaming store instructions. I don't think that the Clang/LLVM combination generates streaming stores, but I have not done much work with these.
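The effect of that extra read can be put into numbers with a simple traffic model. The sketch below assumes 8-byte (double) elements and an idealized machine in which every ordinary store that misses causes exactly one additional read of the target line; the enum and function names are made up for this illustration and do not come from stream.c.

```c
#include <assert.h>

/* The four STREAM kernels. Each one writes one array. */
enum kernel { COPY, SCALE, ADD, TRIAD };

/* Number of arrays read per iteration: Copy and Scale read one,
 * Add and Triad read two. */
static int arrays_read(enum kernel k)
{
    assert(k >= COPY && k <= TRIAD);
    return (k == ADD || k == TRIAD) ? 2 : 1;
}

/* Bytes per loop iteration that STREAM counts: reads plus one write. */
int counted_bytes_per_iter(enum kernel k)
{
    return 8 * (arrays_read(k) + 1);
}

/* Bytes per loop iteration actually moved to or from DRAM in this model.
 * Without streaming stores, the stored line is read before it is written,
 * which adds one more 8-byte read per iteration. */
int dram_bytes_per_iter(enum kernel k, int streaming_stores)
{
    int extra_reads = streaming_stores ? 0 : 1;
    return 8 * (arrays_read(k) + 1 + extra_reads);
}
```

In this model Triad is counted as 24 bytes per iteration but moves 32 bytes without streaming stores, so a binary built with streaming stores could report up to roughly 4/3 of the Triad bandwidth of one built without them (3/2 for Copy and Scale), even on identical hardware.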