I am a new BLAS user trying to improve C code that solves a time-dependent 2D wave equation (with PML absorbing boundaries) by replacing some of my loops with CBLAS functions. To get a feel for it, I started with one code block within the program: the block that updates p, shown below.
struct grid {
    double dt;
    int ny;
    int nx;
    /* ... other grid info ... */
};
struct pmlwaves {
    double *p;
    double *pdot;
    /* ... other pointers to double needed for the PML absorbing boundary method ... */
};
int main(void) {
    struct pmlwaves w;
    struct grid g;
    int k, tstep;
    /* ... various initializations ... */
    for (tstep = 0; tstep < num_tsteps; tstep++) {
        /* ... calculations to find pdot ... */
        /* option 1: naive code */
        for (k = 0; k < g.ny*g.nx; k++) w.p[k] += g.dt * w.pdot[k];
        /* option 2: cBLAS */
        cblas_daxpy(g.ny*g.nx, g.dt, w.pdot, 1, w.p, 1); /* advance solution one time step */
        /* ... calculations arising from the PML absorbing boundary method ... */
    } /* end time stepping loop */
    return 0;
} /* end main() */
This was compiled with icc on a multi-core (Xeon 7550) machine. The exact compilation command was
icc myfile.c -O2 -mkl -openmp
The values used were g.nx = g.ny = 481, num_tsteps = 7500.
I used the OpenMP function omp_get_wtime() to measure the wall-clock execution time of this code block and accumulated these times over the whole time-stepping loop.
The following (surprising?) results were obtained:
1) Option 1 accumulated time: around 3 sec.
2) Option 2 accumulated time: around 90 sec!!
3) When w.p and w.pdot were replaced with "regular" pointers (which are not fields of a structure), the time of option 2 was around 3 sec (just like the naive loop in option 1).
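For readers who want to reproduce this kind of measurement, the accumulation described above might look roughly like the sketch below. It is not the original program: `wall_seconds`, `advance_naive` and `timed_advance` are hypothetical helper names, and the fallback to C11 `timespec_get` when OpenMP is not enabled is an assumption added so the sketch builds without `-openmp`.

```c
#include <stddef.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds: omp_get_wtime() when built with OpenMP,
   otherwise C11 timespec_get() as a portable fallback (an assumption). */
double wall_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
#endif
}

/* Option 1 from the post: p[k] += dt * pdot[k] for k = 0..n-1. */
void advance_naive(double *p, const double *pdot, double dt, size_t n)
{
    for (size_t k = 0; k < n; k++)
        p[k] += dt * pdot[k];
}

/* Hypothetical helper: run the update once and add the elapsed
   wall-clock time of just that block to *accum. */
void timed_advance(double *p, const double *pdot, double dt, size_t n,
                   double *accum)
{
    double t0 = wall_seconds();
    advance_naive(p, pdot, dt, n);
    *accum += wall_seconds() - t0;
}
```

Calling `timed_advance` once per time step with the same `accum` variable gives the accumulated time for the update block alone, excluding the pdot and PML calculations around it.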
My questions:
1) Why is the running time so long when the pointer arguments to cblas_daxpy are fields of a structure? They just pass an address to daxpy, don't they? Why doesn't cblas_daxpy treat w.p and w.pdot simply as pointers to double (which they are)?
2) Comparing items 1 and 3 in the results, why doesn't cblas_daxpy offer any advantage over the naive loop? Did I omit any compiler flags/options?
3) There seems to be a lot to know about running BLAS/cBLAS properly, mainly about compiler options/flags, makefile issues, and their compatibility with a specific machine. Where can all this be learned? Is there a good resource/website/book?
Thanks a lot
Jake
I suppose you'd have to analyze the actual example to try to find out what optimizations are missed with the struct of pointers. As far as I know, it's not a common programming model which would appear in a corpus of code for which the compiler is performance tested. You could experiment by making an explicit local plain pointer copy just ahead of the loop.
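The experiment suggested above could be sketched as follows. This is only an illustration: `advance_with_local_copies` is a hypothetical name, the struct keeps just the two fields shown in the question, and whether the local copies change the timing at all is exactly what the experiment is meant to find out.

```c
#include <stddef.h>

/* Same two fields as in the question; the other PML pointers are omitted. */
struct pmlwaves {
    double *p;
    double *pdot;
};

/* Hypothetical helper: advance p by one time step using plain local
   pointer copies instead of dereferencing the struct fields in the loop. */
void advance_with_local_copies(struct pmlwaves *w, double dt, size_t n)
{
    /* Plain local copies taken just ahead of the loop. */
    double *p = w->p;
    const double *pdot = w->pdot;

    for (size_t k = 0; k < n; k++)
        p[k] += dt * pdot[k];

    /* With MKL linked, the same locals could be passed to the library
       call in place of the loop: cblas_daxpy(n, dt, pdot, 1, p, 1); */
}
```

Timing this variant against the version that uses w.p and w.pdot directly would show whether the struct access is really what makes the difference.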
Hi Jake,
As you mentioned, one would expect that using plain pointers or pointers in structs would not make a dramatic performance difference. I verified this using your code: for me, there was no performance difference between the two. One thing to pay attention to here is how the pointers are allocated and initialized. Proper alignment of the pointer addresses usually improves performance. You could try using mkl_malloc to allocate aligned memory.
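A minimal sketch of aligned allocation is given below. To keep it buildable without MKL it uses the standard C11 `aligned_alloc` as a stand-in; with MKL linked, `mkl_malloc(bytes, 64)` paired with `mkl_free` would take its place. The 64-byte alignment and the helper names `alloc_aligned_doubles` and `is_aligned` are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGNMENT 64  /* assumed alignment in bytes */

/* Hypothetical helper: allocate n doubles on an ALIGNMENT-byte boundary.
   Stand-in for mkl_malloc(n * sizeof(double), ALIGNMENT).
   Release with free() (or mkl_free() if mkl_malloc is used instead). */
double *alloc_aligned_doubles(size_t n)
{
    if (n == 0)
        return NULL;
    size_t bytes = n * sizeof(double);
    /* aligned_alloc requires the size to be a multiple of the alignment. */
    size_t rounded = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    return aligned_alloc(ALIGNMENT, rounded);
}

/* Returns nonzero if ptr sits on the given byte boundary. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}
```

Checking `is_aligned(w.p, 64)` and `is_aligned(w.pdot, 64)` in the slow and fast variants would be a quick way to see whether the two cases actually differ in alignment.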
The MKL daxpy binaries are already compiled with optimal compiler flags. Therefore, the compiler flags you use for your own program should not make a dramatic difference to MKL daxpy performance.
The Intel Math Kernel Library for Linux* OS User's Guide provides guidelines for using MKL and summarizes the factors that affect performance.
Thanks,
Efe