Optimizing Matrix Multiply for Intel® Processor Graphics Architecture Gen9

Scout
Beginner
4,109 Views

The article "Optimizing Matrix Multiply for Intel® Processor Graphics Architecture Gen9" by Jeffrey McAllister was published on 12/23/2016.

The link is here: https://www.intel.com/content/www/us/en/developer/articles/technical/sgemm-ocl-opt.html

The link to the sample code at the end of the article no longer works. Is it possible to locate the sample code from this article?

 

Thanks a lot!

6 Replies
HemanthCH_Intel
Moderator
4,083 Views

Hi,


Thanks for posting in Intel Communities.


We are working on your issue internally and will get back to you soon.


Thanks & Regards,

Hemanth


Jinchuan_Tang
Beginner
3,815 Views

Hi, 

I don't know if this would be helpful:

GitHub - ek9852/intel-gemm: General matrix-matrix multiplication in OpenCL from Intel

I don't know whether it applies fully to Gen9, since the repository has been around for a long time.

In the meantime, if you want to know how to optimize matrix multiplication in general, I may be able to point you to some useful materials.

 

Best wishes,

Jinchuan

Scout
Beginner
3,525 Views

Hello Jinchuan,

Thanks for your reply. The reason I'm looking for this old example is that it uses Intel's "subgroup shuffle" feature to share data among the work-items of a subgroup. This feature greatly improves the performance of matrix multiply. However, I have searched online for several days and cannot find a ready-to-go example of "subgroup shuffle" in use, especially for matrix multiply. If you can give an example, that would be very helpful. Thanks a lot!
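
Editor's note: as a rough illustration of the data-sharing pattern described above, here is a minimal sketch in plain Python that emulates a subgroup on the CPU. It is not the sample code from the article and it is not an OpenCL kernel. The subgroup width of 8, the helper `emulated_sub_group_shuffle`, and the tiling (one lane per output column) are all assumptions made for illustration. In a real kernel the corresponding built-in would be something like `intel_sub_group_shuffle` from the `cl_intel_subgroups` extension, which should be checked against the extension specification. The point of the pattern is that each lane loads only one element of a row segment of A, and the other lanes read it from that lane's register instead of loading it from memory again.

```python
# CPU emulation of the "subgroup shuffle" pattern for matrix multiply.
# Assumptions (not from the original article): subgroup width of 8,
# one lane per output column, dimensions divisible by the subgroup width.

SUBGROUP_SIZE = 8  # assumed subgroup width


def emulated_sub_group_shuffle(lane_registers, source_lane):
    """Hypothetical stand-in for a hardware shuffle with a uniform source
    lane: every lane receives the register value held by source_lane."""
    return [lane_registers[source_lane] for _ in lane_registers]


def matmul_with_shuffle(a, b):
    """Compute C = A * B, where each emulated subgroup produces
    SUBGROUP_SIZE adjacent elements of one row of C."""
    m, k, n = len(a), len(b), len(b[0])
    assert k % SUBGROUP_SIZE == 0 and n % SUBGROUP_SIZE == 0
    c = [[0] * n for _ in range(m)]
    for row in range(m):
        for col0 in range(0, n, SUBGROUP_SIZE):
            acc = [0] * SUBGROUP_SIZE  # one accumulator per lane
            for k0 in range(0, k, SUBGROUP_SIZE):
                # Each lane loads ONE element of the A row segment.
                a_reg = [a[row][k0 + lane] for lane in range(SUBGROUP_SIZE)]
                for kk in range(SUBGROUP_SIZE):
                    # Share lane kk's A element with all lanes.
                    a_shared = emulated_sub_group_shuffle(a_reg, kk)
                    for lane in range(SUBGROUP_SIZE):
                        acc[lane] += a_shared[lane] * b[k0 + kk][col0 + lane]
            for lane in range(SUBGROUP_SIZE):
                c[row][col0 + lane] = acc[lane]
    return c


def matmul_reference(a, b):
    """Plain triple-loop matrix multiply, used only to check the result."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


if __name__ == "__main__":
    a = [[(i + p) % 5 for p in range(8)] for i in range(2)]
    b = [[(p * j) % 3 for j in range(8)] for p in range(8)]
    print(matmul_with_shuffle(a, b) == matmul_reference(a, b))
```

In this sketch the inner `lane` loops stand in for work-items that would run in lockstep on the GPU, so the emulation shows only where the data comes from, not the performance benefit.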

Jinchuan_Tang
Beginner
3,500 Views

Hi Scout,

 

Try reaching these engineers via LinkedIn:

Lingyi Kong is a Software Engineer at Intel’s IT Flex Services Group. He is an expert in GPU programming and optimization, and also has Graphics driver/runtime development experience on Intel® Iris and Intel® Iris Pro Graphics.

Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics with deep knowledge of Intel Graphics Hardware. He was heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they can run well on Intel architecture. Most recently he has been working on prototyping the Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and wrote a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He also recorded and released two Optimizing Simple OpenCL Kernels videos and a third video on Nested Parallelism.

 

https://www.codeproject.com/Articles/994769/SGEMM-for-Intel-Processor-Graphics

Best wishes,

Jinchuan

HemanthCH_Intel
Moderator
3,470 Views

Hi,


The old page now redirects to a newer page because the content of the old article is no longer relevant. Please refer to the new link:

https://www.intel.com/content/www/us/en/develop/documentation/oneapi-gpu-optimization-guide/top/kernels.html


Thanks & Regards,

Hemanth.



VidyalathaB_Intel
Moderator
3,396 Views

Hi,


As the issue is resolved, we are closing this thread. Please post a new question if you need any additional assistance from Intel, as this thread will no longer be monitored.


Regards,

Vidya.

