I have been using the dgetri and dgetrf functions on my machine, but the performance I have been getting on my random matrices has been extremely poor (5 GFLOPS running on all 4 cores).
My question is: have I set something up wrong by relying on the default settings of my ifort and icc installations, or am I perhaps calling the routines incorrectly?
Specs:
Intel Q6600 Core 2 Quad
4 GB DDR2 RAM
Ubuntu 9.04 x86_64
Code (C):
for(x=0;x {
    //Variables Needed to be reset for
    M=j;
    N=j;
    LDA=M;
    LWORK=N;
    INFO=0;
    createMatrix(&M, &N, &A);
    IPIV=(MKL_INT *)malloc(M*sizeof(int));
    WORK=(double *)malloc(M*sizeof(double));
    DGETRF( &M, &N, A, &LDA, IPIV, &INFO );
    gettimeofday(&time_s, NULL);
    DGETRI( &N, A, &LDA, IPIV, WORK, &LWORK, &INFO );
    gettimeofday(&time_e, NULL);
    cpuTime=0;
    CPU_gflops=0;
    temp=0;
    cpuTime=1e3*(time_e.tv_sec -time_s.tv_sec) + (time_e.tv_usec -time_s.tv_usec)*1e-3;
    //Found in lawn41 lapack manual for greatest term in O(n) notation, p121
    temp = (1.0f*M*N*N);//O(2mn^2)
    CPU_gflops = (temp/cpuTime) * 1e-6;
    avg_flops=CPU_gflops;
    free(A);
    free(IPIV);
    free(WORK);
}
Makefile:
FC = ifort
CC = icc
FCFLAGS = -O3 -cm -w
CCFLAGS = -O3
CXXDIR = /opt/intel/Compiler/11.1/038
LIBDIR:= $(CXXDIR)/mkl/lib/em64t
LIBS:= $(LIBDIR)/libmkl_intel_lp64.a
LIBS += -Wl,--start-group -L$(LIBDIR) $(LIBDIR)/libmkl_intel_thread.a $(LIBDIR)/libmkl_core.a -Wl,--end-group -L$(LIBDIR) -liomp5 -lpthread
OBJECTS = makematrix.o \
          MatrixMath.o
DGETRI : $(OBJECTS) DGETRIDriver.o
	$(CC) -o $@ $(OBJECTS) DGETRIDriver.o -L$(LIBDIR) $(LIBS)
This is my first time working with MKL, so any help is appreciated. Thanks!
Matt
1 Solution
Matt,
You have not allocated enough workspace for DGETRI to achieve high performance. You should use LWORK=N*NB, where NB=64 in particular. You can also request the optimal workspace size from DGETRI itself:
/* LWORK = -1 requests a workspace query; the optimal size is returned in the first element of WORK */
int MONE=-1;
double LWKOPT;
DGETRI( &N, A, &LDA, IPIV, &LWKOPT, &MONE, &INFO );
LWORK=(int)LWKOPT;
Please also note that in your example, instead of
WORK=(double *)malloc(M*sizeof(double));
you should have:
WORK=(double *)malloc(LWORK*sizeof(double));
--Alexander
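To put the LWORK=N*NB advice in numbers, here is a minimal sketch of the sizing arithmetic only. The helper names `dgetri_lwork` and `dgetri_work_bytes` are hypothetical (they are not part of MKL or LAPACK), and the sketch assumes the LP64 interface, where the integer arguments are ordinary `int`. The value actually returned by the LWORK=-1 query may differ from N*64.

```c
#include <limits.h>
#include <stddef.h>

/* Block size suggested in the reply above. This is an assumption for the
   sketch; the LWORK = -1 query reports the value the library prefers. */
enum { DGETRI_NB = 64 };

/* Hypothetical helper: number of doubles to request for WORK, i.e. N*NB
   clamped to at least 1. Returns 0 when the arguments are invalid or the
   product does not fit in an int. */
static int dgetri_lwork(int n, int nb)
{
    if (n < 0 || nb < 1)
        return 0;
    long long want = (long long)n * nb;
    if (want < 1)
        want = 1;
    if (want > INT_MAX)
        return 0;
    return (int)want;
}

/* Hypothetical helper: bytes to pass to malloc for that workspace. */
static size_t dgetri_work_bytes(int lwork)
{
    return (size_t)lwork * sizeof(double);
}
```

For N=12000 this gives 768000 doubles, against the 12000 doubles allocated in the original loop, so the blocked code path has room to work. One would then call something like `WORK = malloc(dgetri_work_bytes(LWORK))` and pass the same LWORK to DGETRI.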
5 Replies
Matt,
It will depend on the size of the task you are running on these 4 cores.
Intel Math Kernel Library (Intel MKL) offers highly optimized routines for medium and large input sizes.
For your reference, please see
http://software.intel.com/sites/products/collateral/hpc/mkl/mkl_indepth.pdf
There you can find some performance data for dgetrf, MKL vs. ATLAS.
--Gennady
I am using matrices of dimension 2k to 12k. I have been benchmarking my machine, and the dgetrf routine is about the same as the standard benchmarks; however, the DGETRI function is underperforming substantially. I realize the runtime complexity is on the order of O(n*m^2), but still, if I can get 30+ GFLOPS for dgetrf, I should be able to get half of that using dgetri. I am currently getting around 3 GFLOPS, with decreasing performance as the size increases. It also does not matter whether I am using Fortran or C. Thanks!
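For comparing numbers like these, it helps to be explicit about how GFLOPS is derived from the timer. The sketch below is one way to do it with `gettimeofday`. It assumes the commonly quoted leading-order operation count of about (4/3)n^3 for DGETRI on an n-by-n matrix (the code in the first post uses n^3, so check the exact figure against LAWN 41 before relying on it). The function names are hypothetical.

```c
#include <sys/time.h>

/* Elapsed wall-clock time in seconds between two gettimeofday() samples. */
static double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) * 1e-6;
}

/* Assumed leading-order flop count for inverting an n-by-n matrix from its
   LU factors: (4/3) * n^3. Lower-order terms are ignored. */
static double dgetri_flops(int n)
{
    double d = (double)n;
    return (4.0 / 3.0) * d * d * d;
}

/* GFLOPS = floating-point operations / seconds / 1e9.
   Returns 0 when the interval is not positive. */
static double gflops(double flops, double seconds)
{
    return seconds > 0.0 ? flops / seconds * 1e-9 : 0.0;
}
```

Used around the DGETRI call, this would look like `gflops(dgetri_flops(N), elapsed_seconds(time_s, time_e))`. With n=3000 and 1.2 seconds of wall time, the assumed count gives 30 GFLOPS.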
In case correcting your assignment of lwork doesn't help:
It looks as if you are hitting a cache capacity limit. Did you check cache events? Once you find which function is taking up the time, it may be interesting to compile that one from source so that you can analyze it with VTune or PTU.