12/23/2016 The article "Optimizing Matrix Multiply for Intel® Processor Graphics Architecture Gen9" was published by Jeffrey McAllister.
Link is here: https://www.intel.com/content/www/us/en/developer/articles/technical/sgemm-ocl-opt.html
The sample code link at the end of the article no longer works. Is it possible to locate the sample code that accompanied this article?
Thanks a lot!
Hi,
Thanks for posting in Intel Communities.
We are working on your issue internally and will get back to you soon.
Thanks & Regards,
Hemanth
Hi,
I'm not sure whether this will help, but here is a possible source:
GitHub - ek9852/intel-gemm: General matrix-matrix multiplication in OpenCL from Intel
I don't know whether it fully applies to Gen9, since the repository has been around for a long time.
In the meantime, if you want to know how to optimize matrix multiplication in general, I can point you to some useful materials.
Best wishes,
Jinchuan
Hello Jinchuan,
Thanks for your reply. The reason I'm looking for this old example is that it uses Intel's "subgroup shuffle" feature to share data between work-items. This feature greatly improves matrix-multiply performance. However, I have searched online for several days and cannot find a ready-to-go example of "subgroup shuffle" in use, especially for matrix multiply. If you can give an example, that would be very helpful. Thanks a lot!
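For anyone with the same question, here is a minimal sketch of the data-sharing pattern, written as a plain C emulation on the CPU rather than as the original sample's kernel. It assumes a subgroup of 8 work-items, where each work-item loads one element of a row of A into a private register and then reads its neighbours' registers with a shuffle, so each A value is fetched from memory once per subgroup. In a real OpenCL kernel the helper below would correspond to `intel_sub_group_shuffle` from the `cl_intel_subgroups` extension; the function names, the 1 x 8 tile shape, and the subgroup size here are illustrative assumptions, not taken from the article.

```c
#include <assert.h>

#define SG 8 /* emulated subgroup size (assumption for illustration) */

/* Emulates a subgroup shuffle: every lane reads the private register
   held by lane src_lane. regs[] holds one value per lane. */
static float sub_group_shuffle_emulated(const float regs[SG], int src_lane) {
    return regs[src_lane];
}

/* C[M x N] = A[M x K] * B[K x N], all row-major.
   Each emulated subgroup computes a 1 x SG tile of C; lane l owns column col0 + l. */
void sgemm_shuffle_emulated(const float *A, const float *B, float *C,
                            int M, int N, int K) {
    assert(N % SG == 0 && K % SG == 0); /* sketch handles full tiles only */
    for (int row = 0; row < M; ++row) {
        for (int col0 = 0; col0 < N; col0 += SG) {
            float acc[SG] = {0}; /* one accumulator per lane */
            for (int k0 = 0; k0 < K; k0 += SG) {
                /* Each lane loads ONE element of A into its private register. */
                float a_reg[SG];
                for (int lane = 0; lane < SG; ++lane)
                    a_reg[lane] = A[row * K + k0 + lane];
                /* Each lane then obtains all SG values of A via shuffles
                   instead of re-reading them from memory. */
                for (int kk = 0; kk < SG; ++kk) {
                    for (int lane = 0; lane < SG; ++lane) {
                        float a = sub_group_shuffle_emulated(a_reg, kk);
                        acc[lane] += a * B[(k0 + kk) * N + col0 + lane];
                    }
                }
            }
            for (int lane = 0; lane < SG; ++lane)
                C[row * N + col0 + lane] = acc[lane];
        }
    }
}

/* Straightforward triple loop, used only to check the emulation. */
void sgemm_reference(const float *A, const float *B, float *C,
                     int M, int N, int K) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k)
                sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}
```

On the device, the inner `lane` loops disappear because the lanes run in parallel, and `a_reg` becomes a single private `float` per work-item. Calling both functions on the same inputs and comparing the results is a quick way to check the indexing before porting it to a kernel.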
Hi Scout,
You could try reaching the authors via LinkedIn:
Lingyi Kong is a Software Engineer at Intel’s IT Flex Services Group. He is an expert in GPU programming and optimization, and also has Graphics driver/runtime development experience on Intel® Iris and Intel® Iris Pro Graphics.
Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics with deep knowledge of Intel Graphics Hardware. He was heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they can run well on Intel architecture. Most recently he has been working on prototyping Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and wrote a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He also recorded and released two Optimizing Simple OpenCL Kernels videos and a third video on Nested Parallelism.
https://www.codeproject.com/Articles/994769/SGEMM-for-Intel-Processor-Graphics
Best wishes,
Jinchuan
Hi,
The old page now redirects to a newer page because the content of the old article is no longer relevant. Please refer to the new link:
Thanks & Regards,
Hemanth.
Hi,
As the issue is resolved we are closing this thread. Please post a new question if you need any additional assistance from Intel as this thread will no longer be monitored.
Regards,
Vidya.
