GPU Compute Software
Ask questions about Intel® Graphics Compute software technologies, such as OpenCL* GPU driver and oneAPI Level Zero

Optimizing Matrix Multiply for Intel® Processor Graphics Architecture Gen9

Scout
Beginner
2,850 Views

On 12/23/2016, the article "Optimizing Matrix Multiply for Intel® Processor Graphics Architecture Gen9" was published by Jeffrey McAllister.

Link is here: https://www.intel.com/content/www/us/en/developer/articles/technical/sgemm-ocl-opt.html

The link to the sample code at the end of the article no longer works. Is it possible to locate the sample code for this article?

 

Thanks a lot!

HemanthCH_Intel
Moderator
2,824 Views

Hi,


Thanks for posting in Intel Communities.


We are working on your issue internally and will get back to you soon.


Thanks & Regards,

Hemanth


Jinchuan_Tang
Beginner
2,556 Views

Hi, 

I don't know if this would be helpful:

GitHub - ek9852/intel-gemm: General matrix-matrix multiplication in OpenCL from Intel

I don't know whether this fully applies to Gen9, since the repository has been around for a long time.

In the meantime, if you want to know how to optimize matrix multiplication in general, I can point you to some useful materials.

 

Best wishes,

Jinchuan

Scout
Beginner
2,266 Views

Hello Jinchuan,

Thanks for your reply. The reason I'm looking for this old example is that it uses Intel's "subgroup shuffle" feature to share data between work-items. This feature greatly improves the performance of matrix multiply. However, I have searched online for several days and cannot find a ready-to-use example of "subgroup shuffle", especially for matrix multiply. If you can give an example, that would be very helpful. Thanks a lot!
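
For readers looking for the same thing, the data-sharing pattern described above can be sketched on the CPU. This is not the missing Intel sample and not OpenCL code: it is a small Python emulation, under the assumption of a sub-group of 8 work-items (lanes) in which each sub-group computes a 1 x 8 tile of C. The names `SUBGROUP_SIZE`, `sub_group_shuffle`, `matmul_with_shuffle`, and `matmul_reference` are hypothetical, chosen for illustration. On the device, the broadcast step would correspond to `intel_sub_group_shuffle` from the `cl_intel_subgroups` OpenCL extension, which lets a work-item read a value held by another work-item of the same sub-group, typically without going through shared local memory.

```python
SUBGROUP_SIZE = 8  # assumed sub-group width, for illustration only


def sub_group_shuffle(lane_values, src_lane):
    """Emulate a sub-group shuffle: every lane reads the value held by src_lane."""
    return lane_values[src_lane]


def matmul_with_shuffle(A, B):
    """Compute C = A x B; each emulated sub-group produces a 1 x SUBGROUP_SIZE tile of C."""
    M, K, N = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must match"
    assert K % SUBGROUP_SIZE == 0 and N % SUBGROUP_SIZE == 0, \
        "this sketch assumes K and N are multiples of the sub-group size"
    C = [[0.0] * N for _ in range(M)]
    for row in range(M):
        for col0 in range(0, N, SUBGROUP_SIZE):
            acc = [0.0] * SUBGROUP_SIZE  # one accumulator per lane
            for k0 in range(0, K, SUBGROUP_SIZE):
                # Each lane loads ONE element of A's row into a private value,
                # so the sub-group as a whole holds SUBGROUP_SIZE elements.
                a_private = [A[row][k0 + lane] for lane in range(SUBGROUP_SIZE)]
                for j in range(SUBGROUP_SIZE):
                    # All lanes obtain lane j's element through the shuffle
                    # instead of each lane re-loading it from memory.
                    a = sub_group_shuffle(a_private, j)
                    for lane in range(SUBGROUP_SIZE):
                        acc[lane] += a * B[k0 + j][col0 + lane]
            for lane in range(SUBGROUP_SIZE):
                C[row][col0 + lane] = acc[lane]
    return C


def matmul_reference(A, B):
    """Plain triple-loop product used only to check the emulation."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]
```

The point of the pattern is that each element of A is loaded once per sub-group and then reused by all lanes, while each lane keeps its own column of B and its own accumulator. A real kernel would also tile over rows and use block reads, which this sketch leaves out.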

Jinchuan_Tang
Beginner
2,241 Views

Hi Scout,

 

Try reaching the authors via LinkedIn:

Lingyi Kong is a Software Engineer at Intel’s IT Flex Services Group. He is an expert in GPU programming and optimization, and also has Graphics driver/runtime development experience on Intel® Iris and Intel® Iris Pro Graphics.

Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics with deep knowledge of Intel Graphics Hardware. He was heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they can run well on Intel architecture. Most recently he has been working on prototyping Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and wrote a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He also recorded and released two Optimizing Simple OpenCL Kernels videos and a third video on Nested Parallelism.

 

https://www.codeproject.com/Articles/994769/SGEMM-for-Intel-Processor-Graphics

Best wishes,

Jinchuan

HemanthCH_Intel
Moderator
2,211 Views

Hi,


The page now redirects to a newer page because the content of the old article is no longer relevant. Please refer to the new link:

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-gpu-optimization-guide/top/kernels.html


Thanks & Regards,

Hemanth.



VidyalathaB_Intel
Moderator
2,137 Views

Hi,


As the issue is resolved, we are closing this thread. Please post a new question if you need any additional assistance from Intel, as this thread will no longer be monitored.


Regards,

Vidya.

