When I multiply 100x10 and 10x10 matrices, no multithreading is engaged. With 200x10 and 10x10, multithreading is engaged.
Are there any rules of thumb, also for procedures other than gemm (dcopy, dsctr, dsyrk, dpotri, dsymm, dgthr, daxpy)?
As an aside, I wondered what the difference is between "cblas_dcopy()" and "dcopy()".
Thanks
T
#include
#include
#include
#include
#include
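void mttest(double *a, double *b, int *geom, double *c) {
    /* geom = {rows of a, cols of a, rows of b, cols of b}; computes c = a * b (column-major) */
    double one = 1.0, zero = 0.0;
    dgemm("n", "n", &geom[0], &geom[3], &geom[1], &one, a, &geom[0], b, &geom[2], &zero, c, &geom[0]);
}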
gcc -std=gnu99 -fpic -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -c mttest.c -o mttest.o
gcc -std=gnu99 -shared -L/opt/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm mttest.o -o mttest.so
The cblas_ wrappers accept by-value arguments where appropriate and convert them to the Fortran default (pass by reference) before calling the underlying routine. They are open source code; look for yourself. Most C compilers know how to compile data moves in open C code or
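To make the calling-convention difference concrete, here is a minimal sketch in plain C. The names `ref_dcopy` and `wrap_dcopy` are hypothetical stand-ins for `dcopy()` and `cblas_dcopy()`; this is not the MKL or reference CBLAS source, only an illustration of a by-value wrapper forwarding to a by-reference routine.

```c
#include <stddef.h>

/* Hypothetical Fortran-style routine: every argument is passed by address,
   mirroring how dcopy(&n, x, &incx, y, &incy) is called from C. */
void ref_dcopy(const int *n, const double *x, const int *incx,
               double *y, const int *incy)
{
    /* With a negative increment the walk starts at the far end,
       following the reference BLAS convention. */
    ptrdiff_t ix = (*incx < 0) ? (ptrdiff_t)(1 - *n) * *incx : 0;
    ptrdiff_t iy = (*incy < 0) ? (ptrdiff_t)(1 - *n) * *incy : 0;
    for (int i = 0; i < *n; ++i) {
        y[iy] = x[ix];
        ix += *incx;
        iy += *incy;
    }
}

/* Hypothetical C-style wrapper: scalars arrive by value and are
   forwarded by address to the Fortran-style routine. */
void wrap_dcopy(int n, const double *x, int incx, double *y, int incy)
{
    ref_dcopy(&n, x, &incx, y, &incy);
}
```

With these definitions, `wrap_dcopy(4, x, 1, y, 1)` and `ref_dcopy(&n, x, &inc, y, &inc)` (with `n = 4`, `inc = 1`) copy the same four elements; only the way the scalars are passed differs.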
Multithreading would have a negative impact on overall performance if the matrix sizes are too small (less
than 128x128) because of the overhead of creating threads. For example, if two matrices have to
be multiplied using the Strassen and Classic algorithms, real performance improvements appear only for sizes greater
than 128x128. The Strassen algorithm does the calculations faster even when one thread is used. I could provide some
real data if needed.
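The size-threshold idea can be sketched as a small dispatcher. This is only an illustration of the rule of thumb above, not MKL's internal logic (which this thread does not describe); the function `choose_threads` and the constant `MIN_WORK_PER_THREAD` are hypothetical, and the cutoff is simply the 128x128 figure mentioned above.

```c
#include <stddef.h>

/* Hypothetical cutoff: roughly the multiply-add count of a
   128x128 by 128x128 product. An assumption, not a measured value. */
#define MIN_WORK_PER_THREAD (128.0 * 128.0 * 128.0)

/* Suggest a thread count for an (m x k) times (k x n) product.
   Small problems stay on one thread so that thread start-up
   overhead is not paid for too little work. */
int choose_threads(int m, int n, int k, int max_threads)
{
    if (m <= 0 || n <= 0 || k <= 0 || max_threads <= 1)
        return 1;

    double work = (double)m * (double)n * (double)k;
    double wanted = work / MIN_WORK_PER_THREAD;

    if (wanted < 2.0)
        return 1;                      /* not enough work to split */
    if (wanted > (double)max_threads)
        return max_threads;            /* cap at the available threads */
    return (int)wanted;
}
```

Under these assumptions the 100x10 by 10x10 case from the question stays single-threaded, while a 512x512 product would use every available thread. Any real cutoff should come from measurements on the target machine.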
...The Strassen algorithm does the calculations faster even when one thread is used. I could provide some
real data if needed.
Here are performance results ( Operation - Matrix multiplication ).
Size of both matrices: 128x128
Matrix Size : 128 x 128
Matrix Size Threshold: N/A
Matrix Partitions : N/A
ResultSets Reflection: N/A
Calculating...
Classic A - Pass 1 - Completed: 0.03100 secs
Classic A - Pass 2 - Completed: 0.03100 secs
Classic A - Pass 3 - Completed: 0.01600 secs
Classic A - Pass 4 - Completed: 0.03100 secs
Classic A - Pass 5 - Completed: 0.01600 secs
Strassen HBI
Matrix Size : 128 x 128
Matrix Size Threshold: 64 x 64
Matrix Partitions : 1
ResultSets Reflection: N/A
Calculating...
Strassen HBI - Pass 1 - Completed: 0.01500 secs
Strassen HBI - Pass 2 - Completed: 0.03100 secs
Strassen HBI - Pass 3 - Completed: 0.01600 secs
Strassen HBI - Pass 4 - Completed: 0.01600 secs
Strassen HBI - Pass 5 - Completed: 0.03100 secs
Strassen HBC
Matrix Size : 128 x 128
Matrix Size Threshold: 8 x 8
Matrix Partitions : 2801
ResultSets Reflection: Enabled
Calculating...
Strassen HBC - Pass 1 - Completed: 0.12500 secs
Strassen HBC - Pass 2 - Completed: 0.03100 secs
Strassen HBC - Pass 3 - Completed: 0.03100 secs
Strassen HBC - Pass 4 - Completed: 0.03200 secs
Strassen HBC - Pass 5 - Completed: 0.01500 secs
Size of both matrices: 256x256
Matrix Size : 256 x 256
Matrix Size Threshold: N/A
Matrix Partitions : N/A
ResultSets Reflection: N/A
Calculating...
Classic A - Pass 1 - Completed: 0.59400 secs
Classic A - Pass 2 - Completed: 0.60900 secs
Classic A - Pass 3 - Completed: 0.59400 secs
Classic A - Pass 4 - Completed: 0.59400 secs
Classic A - Pass 5 - Completed: 0.60900 secs
Strassen HBI
Matrix Size : 256 x 256
Matrix Size Threshold: 128 x 128
Matrix Partitions : 1
ResultSets Reflection: N/A
Calculating...
Strassen HBI - Pass 1 - Completed: 0.17200 secs
Strassen HBI - Pass 2 - Completed: 0.17200 secs
Strassen HBI - Pass 3 - Completed: 0.15600 secs
Strassen HBI - Pass 4 - Completed: 0.17200 secs
Strassen HBI - Pass 5 - Completed: 0.17200 secs
Strassen HBC
Matrix Size : 256 x 256
Matrix Size Threshold: 16 x 16
Matrix Partitions : 2801
ResultSets Reflection: Enabled
Calculating...
Strassen HBC - Pass 1 - Completed: 0.37500 secs
Strassen HBC - Pass 2 - Completed: 0.17200 secs
Strassen HBC - Pass 3 - Completed: 0.17200 secs
Strassen HBC - Pass 4 - Completed: 0.17200 secs
Strassen HBC - Pass 5 - Completed: 0.17200 secs
Size of both matrices: 512x512
Matrix Size : 512 x 512
Matrix Size Threshold: N/A
Matrix Partitions : N/A
ResultSets Reflection: N/A
Calculating...
Classic A - Pass 1 - Completed: 10.81200 secs
Classic A - Pass 2 - Completed: 10.84400 secs
Classic A - Pass 3 - Completed: 10.82800 secs
Classic A - Pass 4 - Completed: 10.82800 secs
Classic A - Pass 5 - Completed: 10.82800 secs
Strassen HBI
Matrix Size : 512 x 512
Matrix Size Threshold: 256 x 256
Matrix Partitions : 1
ResultSets Reflection: N/A
Calculating...
Strassen HBI - Pass 1 - Completed: 1.39100 secs
Strassen HBI - Pass 2 - Completed: 1.37500 secs
Strassen HBI - Pass 3 - Completed: 1.35900 secs
Strassen HBI - Pass 4 - Completed: 1.37500 secs
Strassen HBI - Pass 5 - Completed: 1.37500 secs
Strassen HBC
Matrix Size : 512 x 512
Matrix Size Threshold: 32 x 32
Matrix Partitions : 2801
ResultSets Reflection: Enabled
Calculating...
Strassen HBC - Pass 1 - Completed: 1.12500 secs
Strassen HBC - Pass 2 - Completed: 0.65600 secs
Strassen HBC - Pass 3 - Completed: 0.64100 secs
Strassen HBC - Pass 4 - Completed: 0.65600 secs
Strassen HBC - Pass 5 - Completed: 0.65600 secs
Notes:
Strassen HBI - Strassen's Heap Based Incomplete algorithm for matrix multiplication
Strassen HBC - Strassen's Heap Based Complete algorithm for matrix multiplication
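For readers unfamiliar with the "Matrix Size Threshold" setting, the following is a minimal single-threaded sketch of Strassen multiplication that falls back to the classic triple loop once a block is at or below the threshold. It is not the HBI or HBC implementation measured above (that code is not shown in this thread); the names `classic_mul`, `add_blk` and `strassen_mul` are hypothetical, matrices are assumed to be square, row-major, non-overlapping, and the size a power of two for full recursion.

```c
#include <stdlib.h>

/* Classic triple loop: C = A * B for n-by-n row-major blocks. */
static void classic_mul(int n, const double *A, int lda,
                        const double *B, int ldb, double *C, int ldc)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int p = 0; p < n; ++p)
                s += A[i * lda + p] * B[p * ldb + j];
            C[i * ldc + j] = s;
        }
}

/* Z = X + sign * Y on h-by-h blocks; Z is dense with row stride h. */
static void add_blk(int h, const double *X, int ldx,
                    const double *Y, int ldy, double sign, double *Z)
{
    for (int i = 0; i < h; ++i)
        for (int j = 0; j < h; ++j)
            Z[i * h + j] = X[i * ldx + j] + sign * Y[i * ldy + j];
}

/* Strassen multiply; blocks at or below `threshold` (or of odd size)
   use the classic loop. Returns 0 on success, -1 if malloc fails. */
int strassen_mul(int n, int threshold, const double *A, int lda,
                 const double *B, int ldb, double *C, int ldc)
{
    if (n <= threshold || n % 2 != 0) {
        classic_mul(n, A, lda, B, ldb, C, ldc);
        return 0;
    }
    int h = n / 2;
    size_t blk = (size_t)h * h;
    double *buf = malloc(9 * blk * sizeof *buf); /* 7 products + 2 temporaries */
    if (!buf)
        return -1;
    double *M[7], *T1 = buf + 7 * blk, *T2 = buf + 8 * blk;
    for (int i = 0; i < 7; ++i)
        M[i] = buf + i * blk;

    const double *A11 = A, *A12 = A + h, *A21 = A + h * lda, *A22 = A + h * lda + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * ldb, *B22 = B + h * ldb + h;
    int rc = 0;

    /* M1 = (A11 + A22)(B11 + B22) */
    add_blk(h, A11, lda, A22, lda, 1.0, T1);
    add_blk(h, B11, ldb, B22, ldb, 1.0, T2);
    rc |= strassen_mul(h, threshold, T1, h, T2, h, M[0], h);
    /* M2 = (A21 + A22) B11 */
    add_blk(h, A21, lda, A22, lda, 1.0, T1);
    rc |= strassen_mul(h, threshold, T1, h, B11, ldb, M[1], h);
    /* M3 = A11 (B12 - B22) */
    add_blk(h, B12, ldb, B22, ldb, -1.0, T2);
    rc |= strassen_mul(h, threshold, A11, lda, T2, h, M[2], h);
    /* M4 = A22 (B21 - B11) */
    add_blk(h, B21, ldb, B11, ldb, -1.0, T2);
    rc |= strassen_mul(h, threshold, A22, lda, T2, h, M[3], h);
    /* M5 = (A11 + A12) B22 */
    add_blk(h, A11, lda, A12, lda, 1.0, T1);
    rc |= strassen_mul(h, threshold, T1, h, B22, ldb, M[4], h);
    /* M6 = (A21 - A11)(B11 + B12) */
    add_blk(h, A21, lda, A11, lda, -1.0, T1);
    add_blk(h, B11, ldb, B12, ldb, 1.0, T2);
    rc |= strassen_mul(h, threshold, T1, h, T2, h, M[5], h);
    /* M7 = (A12 - A22)(B21 + B22) */
    add_blk(h, A12, lda, A22, lda, -1.0, T1);
    add_blk(h, B21, ldb, B22, ldb, 1.0, T2);
    rc |= strassen_mul(h, threshold, T1, h, T2, h, M[6], h);

    if (rc == 0) {
        /* Assemble the four quadrants of C from the seven products. */
        for (int i = 0; i < h; ++i)
            for (int j = 0; j < h; ++j) {
                size_t q = (size_t)i * h + j;
                C[i * ldc + j]           = M[0][q] + M[3][q] - M[4][q] + M[6][q];
                C[i * ldc + j + h]       = M[2][q] + M[4][q];
                C[(i + h) * ldc + j]     = M[1][q] + M[3][q];
                C[(i + h) * ldc + j + h] = M[0][q] - M[1][q] + M[2][q] + M[5][q];
            }
    }
    free(buf);
    return rc ? -1 : 0;
}
```

A call such as `strassen_mul(512, 32, A, 512, B, 512, C, 512)` would recurse until the blocks are 32x32 and then switch to the classic loop. A larger threshold means fewer recursion levels and fewer temporary allocations; a smaller one does more of the work through the seven-product recursion.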
In my experience, multithreading fast matrix multiplication algorithms shows a positive effect only on matrices of at least 1500 x 1500: http://software.intel.com/ru-ru/forums/showthread.php?t=75835&o=a&s=lr
In my experience, multithreading fast matrix multiplication algorithms shows a positive effect only on matrices of at least 1500 x 1500: http://software.intel.com/ru-ru/forums/showthread.php?t=75835&o=a&s=lr
Absolutely agree, because modern CPUs are very fast and it looks useless to do anything else when multiplying
small matrices. Thank you for the link; I'll take a look.
The Strassen HBC algorithm which I used for comparison is a single-threaded algorithm designed and tuned for embedded real-time systems.
Best regards,
Sergey

