Hi,
I'd like more details on the performance characteristics of the function cblas_gemm_s16s16s32. In my application, the performance gain over cblas_sgemm is lower than I had hoped.
Here is my test configuration, which is larger than what would typically be used in my application (a seq2seq model):
- CblasRowMajor
- M = 1024; K = 512; N = 2048
- TRANS_A = FALSE; TRANS_B = TRUE
- Memory alignment: 64 bytes
And here are some single-threaded results on an Intel(R) Core(TM) i7-6700K (AVX2), averaged over 1000 samples:
- cblas_sgemm: 17.7135 ms
- cblas_gemm_s16s16s32: 15.5617 ms
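To put the two timings on a common scale, here is a minimal sketch that converts them to throughput, assuming the usual 2*M*N*K operation count for a matrix product (one multiply and one add per inner-product term). The helper names `gemm_ops` and `throughput_gops` are just for illustration and are not part of MKL.

```python
# Problem size from the test configuration above.
M, K, N = 1024, 512, 2048

def gemm_ops(m, n, k):
    """Operations in C = A * B, counting one multiply and one add per term."""
    return 2 * m * n * k

def throughput_gops(m, n, k, elapsed_ms):
    """Throughput in giga-operations per second for one GEMM call."""
    return gemm_ops(m, n, k) / (elapsed_ms * 1e-3) / 1e9

# Measured averages from the post, in milliseconds.
sgemm_ms = 17.7135
s16_ms = 15.5617

sgemm_gflops = throughput_gops(M, N, K, sgemm_ms)
s16_gops = throughput_gops(M, N, K, s16_ms)
speedup = sgemm_ms / s16_ms

print(f"operations per call:   {gemm_ops(M, N, K)}")
print(f"cblas_sgemm:           {sgemm_gflops:.1f} GFLOP/s")
print(f"cblas_gemm_s16s16s32:  {s16_gops:.1f} GOP/s")
print(f"speedup:               {speedup:.3f}x")
```

By this count the 16-bit integer routine is only about 14% faster than the single-precision one for this problem size.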
Are these values expected? Do I need to do something specific to get more performance out of cblas_gemm_s16s16s32?
Thanks,
Guillaume