On 12/23/2016, the article "Optimizing Matrix Multiply for Intel® Processor Graphics Architecture Gen9" was published by Jeffrey McAllister.
Link is here: https://www.intel.com/content/www/us/en/developer/articles/technical/sgemm-ocl-opt.html
The sample code link at the end of the article no longer works. Is it possible to locate the sample code for this article?
Thanks a lot!
Hi,
Thanks for posting in Intel Communities.
We are working on your issue internally and will get back to you soon.
Thanks & Regards,
Hemanth
Hi,
I don't know if this would be helpful:
GitHub - ek9852/intel-gemm: General matrix-matrix multiplication in OpenCL from Intel
I am not sure whether it fully applies to Gen9, since the repository has been around for a long time.
In the meantime, if you want to know how to optimize matrix multiplication in general, I can point you to some useful materials.
Best wishes,
Jinchuan
Hello Jinchuan,
Thanks for your reply. The reason I'm looking for this old example is that it uses Intel's "subgroup shuffle" feature to share data between work-items. This feature greatly improves the performance of matrix multiply. However, I have searched online for several days and cannot find a ready-to-go example of "subgroup shuffle" in use, especially for matrix multiply. If you can give an example, that would be very helpful. Thanks a lot!
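For readers following this thread, the data-sharing pattern described above can be sketched on the CPU. This is only an illustrative emulation in plain C, not the sample code from the article: the subgroup width of 8, the matrix sizes, the tiling, and every function name here (`emulated_shuffle`, `gemm_shuffle`, `shuffle_gemm_max_error`) are assumptions made for the example. In an actual OpenCL kernel, the emulated call would roughly correspond to `intel_sub_group_shuffle` from the `cl_intel_subgroups` extension, and the `lane` loop index to `get_sub_group_local_id()`.

```c
#include <assert.h>

#define SG 8    /* emulated subgroup width (SIMD8); an assumption of this sketch */
#define M  4    /* rows of A and C */
#define N  16   /* columns of B and C; must be a multiple of SG here */
#define K  16   /* columns of A, rows of B; must be a multiple of SG here */

/* CPU stand-in for a subgroup shuffle: every lane reads the private
 * value currently held by lane src_lane. */
static float emulated_shuffle(const float lane_regs[SG], int src_lane)
{
    assert(src_lane >= 0 && src_lane < SG);
    return lane_regs[src_lane];
}

/* C = A * B, row-major. One emulated subgroup computes a 1 x SG strip of C;
 * lane l owns column col0 + l. */
static void gemm_shuffle(const float *A, const float *B, float *C)
{
    for (int row = 0; row < M; ++row) {
        for (int col0 = 0; col0 < N; col0 += SG) {
            float acc[SG] = {0.0f};
            for (int k0 = 0; k0 < K; k0 += SG) {
                /* Each lane loads ONE element of A's row into a private register. */
                float a_reg[SG];
                for (int lane = 0; lane < SG; ++lane)
                    a_reg[lane] = A[row * K + k0 + lane];
                /* Step k broadcasts lane k's element to all lanes via the shuffle,
                 * so A is read once per subgroup instead of once per work-item. */
                for (int k = 0; k < SG; ++k)
                    for (int lane = 0; lane < SG; ++lane)
                        acc[lane] += emulated_shuffle(a_reg, k)
                                   * B[(k0 + k) * N + col0 + lane];
            }
            for (int lane = 0; lane < SG; ++lane)
                C[row * N + col0 + lane] = acc[lane];
        }
    }
}

/* Plain triple loop used as the reference result. */
static void gemm_reference(const float *A, const float *B, float *C)
{
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < K; ++k)
                sum += A[i * K + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

/* Fills A and B with small integers (exact in float) and returns the largest
 * absolute difference between the shuffle version and the reference. */
float shuffle_gemm_max_error(void)
{
    float A[M * K], B[K * N], C1[M * N], C2[M * N];
    for (int i = 0; i < M * K; ++i) A[i] = (float)(i % 7 - 3);
    for (int i = 0; i < K * N; ++i) B[i] = (float)(i % 5 - 2);
    gemm_shuffle(A, B, C1);
    gemm_reference(A, B, C2);
    float max_err = 0.0f;
    for (int i = 0; i < M * N; ++i) {
        float d = C1[i] - C2[i];
        if (d < 0.0f) d = -d;
        if (d > max_err) max_err = d;
    }
    return max_err;
}
```

Calling `shuffle_gemm_max_error()` compares the shuffle-style loop against the plain triple loop. On a GPU, the inner `lane` loops disappear because the lanes of a subgroup execute together, and the shuffle replaces a round trip through local or global memory.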
Hi Scout,
Try reaching the authors via LinkedIn:
Lingyi Kong is a Software Engineer at Intel’s IT Flex Services Group. He is an expert in GPU programming and optimization, and also has Graphics driver/runtime development experience on Intel® Iris and Intel® Iris Pro Graphics.
Robert Ioffe is a Technical Consulting Engineer at Intel’s Software and Solutions Group. He is an expert in OpenCL programming and OpenCL workload optimization on Intel Iris and Intel Iris Pro Graphics with deep knowledge of Intel Graphics Hardware. He was heavily involved in Khronos standards work, focusing on prototyping the latest features and making sure they can run well on Intel architecture. Most recently he has been working on prototyping Nested Parallelism (enqueue_kernel functions) feature of OpenCL 2.0 and wrote a number of samples that demonstrate Nested Parallelism functionality, including GPU-Quicksort for OpenCL 2.0. He also recorded and released two Optimizing Simple OpenCL Kernels videos and a third video on Nested Parallelism.
https://www.codeproject.com/Articles/994769/SGEMM-for-Intel-Processor-Graphics
Best wishes,
Jinchuan
Hi,
The old page now redirects to a newer page because the content of the old article is no longer relevant. Please refer to the new link:
Thanks & Regards,
Hemanth.
Hi,
As the issue is resolved, we are closing this thread. Please post a new question if you need any additional assistance from Intel, as this thread will no longer be monitored.
Regards,
Vidya.