peryli
Beginner
111 Views

AVX is not effective

My program solves lower triangular systems of equations. I tested the serial version and the AVX version, and the results are:

Matrix Order : 10000x4
Matrix Band : 5000x4
Data Layout : 4x4 columned
Data Type : double
CPU : Core i7-2600
memory size : 16 GB
platform : VS2010

serial version: 0.357240s ( 100 cycles averaged )

SIMD (AVX) version : 0.338360s ( 100 cycles averaged )

My question is: why does the AVX version give almost no speed-up?

This is my code:

[bash]// Forward substitution on one 4x4 lower-triangular diagonal block.
// L is stored column-major, so L[ 1 ] is row 1 of column 0 and L[ 5 ] is the second diagonal entry.
inline void solveL_pivot( double* x, const double* L )
{
    double e0=x[ 0 ];
    double e1=x[ 1 ];
    double e2=x[ 2 ];
    double e3=x[ 3 ];
    e0/=L[ 0 ]; e1-=e0*L[ 1 ]; e2-=e0*L[ 2 ]; e3-=e0*L[ 3 ];
    e1/=L[ 5 ]; e2-=e1*L[ 6 ]; e3-=e1*L[ 7 ];
    e2/=L[ 10 ]; e3-=e2*L[ 11 ];
    e3/=L[ 15 ];
    x[ 0 ]=e0;
    x[ 1 ]=e1;
    x[ 2 ]=e2;
    x[ 3 ]=e3;
}

void vsolveL_band( double* o, const double* c, unsigned int order, unsigned int band )
{
    double* xproxy;
    double* Lproxy;
    __m256d ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7;
    double* x=o;
    double* L=( double* )c;
    unsigned int const stride=band<<4;
    unsigned int const d=order-band;
    unsigned int n=band-1;
    for( unsigned int i=0; i /* ... text missing in the post ... */ d ){ --n; }
        xproxy=x;
        Lproxy=( double* )L;
        solveL_pivot( xproxy, Lproxy );
        // Broadcast each solved value of the pivot block across a full register.
        ymm0=_mm256_broadcast_sd( &xproxy[ 0 ] );
        ymm1=_mm256_broadcast_sd( &xproxy[ 1 ] );
        ymm2=_mm256_broadcast_sd( &xproxy[ 2 ] );
        ymm3=_mm256_broadcast_sd( &xproxy[ 3 ] );
        for( unsigned int k=0; k /* ... text missing in the post ... */ d ){ --n; }
        solveL_pivot( o, c );
        solveL_update( o, c+16, n );
        o+=4;
        c+=stride;
    }
    solveL_pivot( o, c );
}
[/bash]
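The inner update loop is not legible in the code as posted, so the following is only a sketch of how a 4x4 column-major block update is commonly written with AVX. It is not the poster's code. The names `update_block_scalar` and `update_block_avx` are hypothetical, and the sketch assumes each off-diagonal block `B` is stored column-major like the pivot block, with `xi` holding the four values already solved by the pivot step.

```cpp
#include <immintrin.h>

// GCC/Clang need the target attribute when the file is not built with -mavx;
// MSVC accepts the intrinsics as-is (use /arch:AVX for best code generation).
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Scalar reference: xk[r] -= sum over c of B(r,c) * xi[c], with B column-major.
static void update_block_scalar(double* xk, const double* B, const double* xi)
{
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            xk[r] -= B[c * 4 + r] * xi[c];
}

// AVX version: one register holds the four rows of xk; each column of B is
// multiplied by the broadcast value xi[c] and subtracted from the accumulator.
AVX_TARGET static void update_block_avx(double* xk, const double* B, const double* xi)
{
    __m256d acc = _mm256_loadu_pd(xk);
    for (int c = 0; c < 4; ++c) {
        __m256d col = _mm256_loadu_pd(B + c * 4);   // column c of the block
        __m256d b   = _mm256_broadcast_sd(&xi[c]);  // xi[c] in all four lanes
        acc = _mm256_sub_pd(acc, _mm256_mul_pd(col, b));
    }
    _mm256_storeu_pd(xk, acc);
}
```

Each block costs 32 floating-point operations against 128 bytes of matrix data that are used exactly once, which is worth keeping in mind when comparing the two timings.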

Hi,
Can you please confirm two things (I haven't looked at the code yet):
1. Is it Windows 7 64-bit or 32-bit?
2. Is your VS2010 SP1 or not? If it is not SP1, use SP1. VS2010 had AVX performance issues.
peryli

Windows 7 64-bit
VS2010 + SP1

AVX achieved a better speed-up for the decomposition step. Could it be that in back substitution there are not enough math operations to overlap the latency of data access? I'm at a loss.
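One way to test the data-access hypothesis is a back-of-envelope estimate of how much matrix data a single solve reads. The sketch below follows the loop structure visible in the posted code (a pivot block plus `n` update blocks per block column, with `n` starting at `band-1` and shrinking once `i > order-band`). It rests on three assumptions: that `order` and `band` count 4x4 blocks, that the main loop runs over `order-1` block columns before the final pivot, and that the reported times are per solve. The function names are hypothetical.

```cpp
#include <cstdint>

// Number of 4x4 blocks read by one banded forward solve (assumes band <= order).
std::uint64_t blocks_touched(std::uint64_t order, std::uint64_t band)
{
    const std::uint64_t d = order - band;
    std::uint64_t n = band - 1;
    std::uint64_t total = 0;
    for (std::uint64_t i = 0; i + 1 < order; ++i) {
        if (i > d) --n;      // the band is clipped near the bottom of the matrix
        total += 1 + n;      // one pivot block plus n update blocks
    }
    return total + 1;        // final pivot block after the loop
}

// Effective read rate in GB/s if every block comes from memory exactly once.
double gigabytes_per_second(std::uint64_t blocks, double seconds)
{
    const double bytes_per_block = 16 * sizeof(double);  // 4x4 doubles
    return static_cast<double>(blocks) * bytes_per_block / seconds / 1e9;
}
```

With `order = 10000` and `band = 5000` this gives 37,502,500 blocks, about 4.8 GB per solve. That works out to roughly 13.4 GB/s for the serial time and 14.2 GB/s for the AVX time, which is a large fraction of the roughly 21 GB/s theoretical peak of dual-channel DDR3-1333 (assuming that memory configuration). At 32 floating-point operations per 128-byte block, the solve does about 0.25 flop per byte, so under these assumptions both versions would be limited by memory bandwidth rather than by arithmetic.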