- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
my program for solving the lower triangular equations,i testthe serial version and the version with AVX, result is:
Matrix Order : 10000x4
Matrix Band : 5000x4
Data Layout : 4x4 columned
Data Type : double
CPU : Corei7 2600
memory size : 16G
platform : VS2010
serial version: 0.357240s ( 100 cycles averaged )
SMID(AVX) version : 0.338360s ( 100 cycles averaged )
my question is theversion with AVX nearly unavailable, why?
this my code:
[bash]inline void solveL_pivot( double* x, const double* L ) { double e0=x[ 0 ]; double e1=x[ 1 ]; double e2=x[ 2 ]; double e3=x[ 3 ]; e0/=L[ 0 ]; e1-=e0*L[ 1 ]; e2-=e0*L[ 2 ]; e3-=e0*L[ 3 ]; e1/=L[ 5 ]; e2-=e1*L[ 6 ]; e3-=e1*L[ 7 ]; e2/=L[ 10 ]; e3-=e2*L[ 11 ]; e3/=L[ 15 ]; x[ 0 ]=e0; x[ 1 ]=e1; x[ 2 ]=e2; x[ 3 ]=e3; }
void vsolveL_band( double* o, const double* c, unsigned int order, unsigned int band ) { double* xproxy; double* Lproxy; __m256d ymm0, ymm1, ymm2, ymm3, ymm4, ymm5, ymm6, ymm7; double* x=o; double* L=( double* )c; unsigned int const stride=band<<4; unsigned int const d=order-band; unsigned int n=band-1; for( unsigned int i=0; id ){ --n; } xproxy=x; Lproxy=( double* )L; solveL_pivot( xproxy, Lproxy ); ymm0=_mm256_broadcast_sd( &xproxy[ 0 ] ); ymm1=_mm256_broadcast_sd( &xproxy[ 1 ] ); ymm2=_mm256_broadcast_sd( &xproxy[ 2 ] ); ymm3=_mm256_broadcast_sd( &xproxy[ 3 ] ); for( unsigned int k=0; k d ){ --n; } solveL_pivot( o, c ); solveL_update( o, c+16, n ); o+=4; c+=stride; } solveL_pivot( o, c ); } [/bash]
Link Copied
2 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Can you please confirm two things (I have nt looked the code yet):
1. Is it win7 64bit or 32bit
2. VS2010, is it SP1 or not. If it is not SP1, use SP1. VS2010 had AVX performance issues.
Can you please confirm two things (I have nt looked the code yet):
1. Is it win7 64bit or 32bit
2. VS2010, is it SP1 or not. If it is not SP1, use SP1. VS2010 had AVX performance issues.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
win7 64bit
VS2010+SP1
It achieved better speed-up for the decompose computing with AVX, or the math op not enough overlap the latency of data access in backsubstitution? I`m at a loss_ _
VS2010+SP1
It achieved better speed-up for the decompose computing with AVX, or the math op not enough overlap the latency of data access in backsubstitution? I`m at a loss_ _
Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page