- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I used compilation for Pentium M specific and added vectorization information from BuildLog to comments. Lets have a look at simple but interesting code:
module Types implicit none type PointerVector real, pointer :: Array(:) end type type AllocatableVector real, allocatable :: Array(:) end type interface operator(+) module procedure SumPointerVectors, SumAllocatableVectors end interface operator(+) interface operator(.plus.) module procedure SumPointerVectorsViaPointers end interface operator(.plus.) integer, parameter :: N = 1000000 contains function SumPointerVectors( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) ! loop was not vectorized: subscript too complex. result%Array = vector1%Array + vector2%Array end function SumPointerVectors function SumPointerVectorsViaPointers( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 real, pointer :: array(:), array1(:), array2(:) allocate( result%Array(Size( vector1%Array )) ) array => result%Array array1 => vector1%Array array2 => vector2%Array ! LOOP WAS VECTORIZED. ! LOOP WAS VECTORIZED. array = array1 + array2 end function SumPointerVectorsViaPointers function SumAllocatableVectors( vector1, vector2 ) result(result) type(AllocatableVector) result type(AllocatableVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) ! loop was not vectorized: subscript too complex. result%Array = vector1%Array + vector2%Array end function SumAllocatableVectors end module Types
Code:
program Alloc use Types implicit none type(PointerVector) Vector1Pointer, Vector2Pointer type(AllocatableVector) Vector1Allocatable, Vector2Allocatable allocate( Vector1Pointer%Array(N), Vector2Pointer%Array(N),& Vector1Allocatable%Array(N), Vector2Allocatable%Array(N) ) ! LOOP WAS VECTORIZED. Vector1Pointer%Array = 1 ! LOOP WAS VECTORIZED. Vector2Pointer%Array = 2 ! LOOP WAS VECTORIZED. Vector1Allocatable%Array = 3 ! LOOP WAS VECTORIZED. Vector2Allocatable%Array = 4 Vector1Pointer = Vector1Pointer + Vector2Pointer Vector1Pointer = Vector1Pointer .plus. Vector2Pointer ! loop was not vectorized: unsupported loop structure. Vector1Allocatable = Vector1Allocatable + Vector2Allocatable read * end program Alloc
result%Array = vector1%Array + vector2%Array
2. Why loop in SumPointerVectorsViaPointers
array = array1 + array2
was vectorized twice while intrinsic arrays would be vectorized only once?
3. To get vect orization I have to use additional pointers. But I cant do the same with allocatable components because components cant have target attribute. Thats why there is no SumAllocatableVectorsViaPointers procedure. Is it possible in Fortran 2003? I just would like to use allocatable components, but dont loose vectorization.
4. Is there another approach to get vectorization in this case?
5. Itanium doesnt have vectorization. Why so if its true? And what about vectorization in IA-64 in the future?
Any answers will be appreciated.
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As a work around I suggest writing a subroutine that takes three arguments (the three arrays)and performs the sum. Then in your overloaded functions sepcify the subroutine call is to be inlined. In this manner the compiler optimizations will remove the code that performs the pointer assignments.
I think this will give you what you want until the compiler is fixed. I think this problem is a result of user defined types. I think the compiler assumes all user defined types will require a member offset and thus does not think it can vectorize the code. I've seen this in my code and when using the inlined subroutine call hack this seems to correct the vectorization problem in most cases.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The compiler does not try to vectorize loops where the array access is complicated. It would require some special-case code to try to get some advantage from this. Deapite Jim's remark about "fix", there is no evidence of a compiler error here.
It is a fact that arrays that are components of derived types, especially in conjuction with pointer or allocatable, complicate life for the compiler and as such some optimization opportunities may be missed.
Itanium does not have "vectorization" in the traditional sense, but it achieves high performance using other architectural features that are exploited by the compiler.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Stanislaus,
Try the following and repor back on your success or failure:
Code:
module Types implicit none type PointerVector real, pointer :: Array(:) end type type AllocatableVector real, allocatable :: Array(:) end type interface operator(+) module procedure SumPointerVectors, SumAllocatableVectors end interface operator(+) interface operator(.plus.) module procedure SumPointerVectorsViaPointers end interface operator(.plus.) integer, parameter :: N = 1000000 contains subroutine SumArrays(A, B, C) real :: A(*), B(*), C(*) A = B + C end subroutine SumArrays function SumPointerVectors( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) !DEC$ ATTRIBUTES FORCEINLINE :: SumArrays call SumArrays(result%Array, vector1%Array, vector2%Array) end function SumPointerVectors function SumPointerVectorsViaPointers( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 real, pointer :: array(:), array1(:), array2(:) allocate( result%Array(Size( vector1%Array )) ) array => result%Array array1 => vector1%Array array2 => vector2%Array ! LOOP WAS VECTORIZED. ! LOOP WAS VECTORIZED. array = array1 + array2 end function SumPointerVectorsViaPointers function SumAllocatableVectors( vector1, vector2 ) result(result) type(AllocatableVector) result type(AllocatableVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) !DEC$ ATTRIBUTES FORCEINLINE :: SumArrays call SumArrays(result%Array, vector1%Array, vector2%Array) end function SumAllocatableVectors end module Types
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Stanislaus,
Please allow me to comment on the vectorization-related questions. The issues arise because compilers sometimes must make conservative assumptions that are not always directly obvious to humans (occasionally even including compiler engineers :-).
1. Why it [technically] impossible to vectorize
result%Array = vector1%Array + vector2%Array
This construct translatesinternally into memory references with components for which the compiler cannoteasily disprove loop invariance. This disambiguation problem impacts many optimizations, not just vectorization. Technically, runtime disambiguation tests could help to vectorize this code, as done for other cases, but here the testing overhead and code size increase was deemed prohibitive.
2. Why loop in SumPointerVectorsViaPointers
array = array1 + array2
was vectorized twice while intrinsic arrays would be vectorized only once?
Again, the compiler fails to disambiguate the memory references and translates the single implied loop into a setup loop and computation loop to preserve F90 semantics. The vectorizer simply vectorizes both loops.
5. Itanium doesnt have vectorization. Why so if its true? And what about vectorization in IA-64 in the future?
As Steve said, other techniques, like software pipelining, are more important for IA64. Nevertheless, we plan to incorporate some of the vectorization techniques for IA32 and EM64T into the IA64 compilers as well, but this will only deal with integer operations, not floating-point.
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 02-08-2006 10:30 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for answers, Jim, Steve, Aart (last post I've just seen, will answer some later).
FORCEINLINE doesn't work: "unresolved _SumArrays"
In SumArrays I have to use : instead of * -- compilers should know upper bound.
INLINE doesn't work.. but I tried to use /Qipo.
Below code is the same but comments are different.
New code without /Qipo:
Code:
module Types implicit none type PointerVector real, pointer :: Array(:) end type type AllocatableVector real, allocatable :: Array(:) end type interface operator(.viaSumArrays.) module procedure SumPointerVectorsAdditionalSumProcedure,& SumAllocatableVectorsAdditionalSumProcedure end interface operator(.viaSumArrays.) interface operator(.viaPointers.) module procedure SumPointerVectorsViaPointers end interface operator(.viaPointers.) interface operator(+) module procedure SimpleSumPointerVectors, SimpleSumAllocatableVectors end interface operator(+) integer, parameter :: N = 1000000 contains subroutine SumArrays( A, B, C ) real :: A(:), B(:), C(:) ! LOOPS WAS VECTORIZED. A = B + C end subroutine SumArrays function SumPointerVectorsAdditionalSumProcedure( vector1, vector2 )& result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) call SumArrays( result%Array, vector1%Array, vector2%Array ) end function SumPointerVectorsAdditionalSumProcedure function SumAllocatableVectorsAdditionalSumProcedure( vector1, vector2 )& result(result) type(AllocatableVector) result type(AllocatableVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) call SumArrays( result%Array, vector1%Array, vector2%Array ) end function SumAllocatableVectorsAdditionalSumProcedure function SumPointerVectorsViaPointers( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 real, pointer :: array(:), array1(:), array2(:) allocate( result%Array(Size( vector1%Array )) ) array => result%Array array1 => vector1%Array array2 => vector2%Array ! LOOPS WAS VECTORIZED. ! LOOPS WAS VECTORIZED. array = array1 + array2 end function SumPointerVectorsViaPointers function SimpleSumPointerVectors( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) ! loop was not vectorized: unsupported loop structure. result%Array = vector1%Array + vector2%Array end function SimpleSumPointerVectors function SimpleSumAllocatableVectors( vector1, vector2 ) result(result) type(AllocatableVector) result type(AllocatableVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) ! loop was not vectorized: unsuppor ted loop structure. result%Array = vector1%Array + vector2%Array end function SimpleSumAllocatableVectors end module Types
Code:
program Alloc use Types implicit none type(PointerVector) Vector1Pointer, Vector2Pointer type(AllocatableVector) Vector1Allocatable, Vector2Allocatable allocate( Vector1Pointer%Array(N), Vector2Pointer%Array(N),& Vector1Allocatable%Array(N), Vector2Allocatable%Array(N) ) ! LOOPS WAS VECTORIZED. Vector1Pointer%Array = 1 Vector2Pointer%Array = 2 Vector1Allocatable%Array = 3 Vector2Allocatable%Array = 4 Vector1Pointer = Vector1Pointer .viaSumArrays. Vector2Pointer Vector1Pointer = Vector1Pointer .viaPointers. Vector2Pointer Vector1Pointer = Vector1Pointer + Vector2Pointer ! loop was not vectorized: unsupported loop structure. Vector1Allocatable = Vector1Allocatable .viaSumArrays. Vector2Allocatable ! loop was not vectorized: unsupported loop structure. Vector1Allocatable = Vector1Allocatable + Vector2Allocatable read * end program Alloc
And with /Qipo:
Code:
module Types implicit none type PointerVector real, pointer :: Array(:) end type type AllocatableVector real, allocatable :: Array(:) end type interface operator(.viaSumArrays.) module procedure SumPointerVectorsAdditionalSumProcedure,& SumAllocatableVectorsAdditionalSumProcedure end interface operator(.viaSumArrays.) interface operator(.viaPointers.) module procedure SumPointerVectorsViaPointers end interface operator(.viaPointers.) interface operator(+) module procedure SimpleSumPointerVectors, SimpleSumAllocatableVectors end interface operator(+) integer, parameter :: N = 1000000 contains subroutine SumArrays( A, B, C ) real :: A(:), B(:), C(:) A = B + C end subroutine SumArrays function SumPointerVectorsAdditionalSumProcedure( vector1, vector2 )& result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) ! LOOPS WAS VECTORIZED. call SumArrays( result%Array, vector1%Array, vector2%Array ) end function SumPointerVectorsAdditionalSumProcedure function SumAllocatableVectorsAdditionalSumProcedure( vector1, vector2 )& result(result) type(AllocatableVector) result type(AllocatableVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) ! LOOPS WAS VECTORIZED. call SumArrays( result%Array, vector1%Array, vector2%Array ) end function SumAllocatableVectorsAdditionalSumProcedure function SumPointerVectorsViaPointers( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 real, pointer :: array(:), array1(:), array2(:) allocate( result%Array(Size( vector1%Array )) ) array => result%Array array1 => vector1%Array array2 => vector2%Array array = array1 + array2 end function SumPointerVectorsViaPointers function SimpleSumPointerVectors( vector1, vector2 ) result(result) type(PointerVector) result type(PointerVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) result%Array = vector1%Array + vector2%Array end function SimpleSumPointerVectors function SimpleSumAllocatableVectors( vector1, vector2 ) result(result) type(AllocatableVector) result type(AllocatableVector), intent(in) :: vector1, vector2 allocate( result%Array(Size( vector1%Array )) ) result%Array = vector1%Array + vector2%Array end function SimpleSumAllocatableVectors end module Types
Code:
program Alloc use Types implicit none type(PointerVector) Vector1Pointer, Vector2Pointer type(AllocatableVector) Vector1Allocatable, Vector2Allocatable allocate( Vector1Pointer%Array(N), Vector2Pointer%Array(N),& Vector1Allocatable%Array(N), Vector2Allocatable%Array(N) ) ! LOOPS WAS VECTORIZED. Vector1Pointer%Array = 1 Vector2Pointer%Array = 2 Vector1Allocatable%Array = 3 Vector2Allocatable%Array = 4 Vector1Pointer = Vector1Pointer .viaSumArrays. Vector2Pointer ! LOOPS WAS VECTORIZED. ! LOOPS WAS VECTORIZED. Vector1Pointer = Vector1Pointer .viaPointers. Vector2Pointer ! LOOPS WAS VECTORIZED. Vector1Pointer = Vector1Pointer + Vector2Pointer ! loop was not vectorized: unsupported loop structure. Vector1Allocatable = Vector1Allocatable .viaSumArrays. Vector2Allocatable ! LOOPS WAS VECTORIZED. ! loop was not vectorized: unsupported loop structure. Vector1Allocatable = Vector1Allocatable + Vector2Allocatable read * end program Alloc
So, what we have:
Code:
No /Qipo Pointer: viaSumArrays: LOOPS WAS VECTORIZED. but only once. viaPointers: LOOPS WAS VECTORIZED. LOOPS WAS VECTORIZED. +: loop was not vectorized: unsupported loop structure. Allocatable: viaSumArrays: LOOPS WAS VECTORIZED. but only once. loop was not vectorized: unsupported loop structure. (in program) ??? +: loop was not vectorized: unsupported loop structure. loop was not vectorized: unsupported loop structure. (in program) /Qipo Pointer: viaSumArrays: LOOPS WAS VECTORIZED. (in program) but only once. viaPointers: LOOPS WAS VECTORIZED. (in program) LOOPS WAS VECTORIZED. (in program) +: LOOPS WAS VECTORIZED. (in program) !!! but only once. Allocatable: viaSumArrays: LOOPS WAS VECTORIZED. but only once. loop was not vectorized: unsupported loop structure. (in program) ??? +: LOOPS WAS VECTORIZED. (in program) !!! but only once. loop was not vectorized: (in program) unsupported loop structure. ???
Interesting, isn't it?
SumArrays works always but vectorization is only once.
ViaPointers (for pointers only) works always, vectorization is twice.
+ hav e vectorization only with /Qipo, in program.
Allocatables always have "loop was not vectorized: unsupported loop structure" even when theywere vectorized (instead when SumArrays is used). Why so? (pointers are clear)
/Qipo gives onevectorization (in each case). So, compiler can do this if it wants. :) Looks like /Qipo can do vectorization (butonce). Why /Qipo gives vectorization in program instead SumArrays? Will /Qipo always save us? I mean, give vectorization.
Why viaPointers have two vectorizations? I think assignment (for viaPointers)takes additional vectorization. No?
But what I should do with Thirteen Diagonal Simetric Matrix? If I will use Sum/Product/SubtractArrays program would become very hard readable. But if I will use viaPointers (13 aditional pointers) code would become clearer + twice vectorization for each forall-string. Example without anything.
Code:
module TDSMatrixVector implicit none integer, parameter :: N = 100000 type Vector integer :: Length = 0 real, pointer :: Array(:) end type Vector ! TDSMatrix -- Thirteen Diagonal Simetric Matrix. type TDSMatrixDimension integer :: Size = 0 integer :: Sec = 0 end type TDSMatrixDimension type TDSMatrix type(TDSMatrixDimension) :: Dim real, pointer :: alfa(:) => Null() real, pointer :: beta(:) => Null() real, pointer :: gamma(:) => Null() real, pointer :: delta(:) => Null() real, pointer :: epsilon(:) => Null() real, pointer :: dzeta(:) => Null() real, pointer :: eta(:) => Null() end type TDSMatrix interface operator(*) module procedure ProductOfTDSMatrixVector end interface operator(*) contains function ProductOfTDSMatrixVector( TDSMat, vect ) result(result) type(Vector) result type(TDSMatrix), intent(in) :: TDSMat type(Vector), intent(in) :: vect integer i if( vect%Length > 0 ) then result%Length = vect%Length allocate(result%Array(result%Length)) forall( i = 1:vect%Length )& result%Array(i) = TDSMat%alfa(i) * vect%Array(i) forall( i = 1:vect%Length - 1 )& result%Array(i) = result%Array(i) + TDSMat%beta(i)*vect%Array(i+1) forall( i = 2:vect%Length )& result%Array(i) = result%Array(i) +& TDSMat%beta(i-1)*vect%Array(i-1) forall( i = 1:vect%Length - 2 )& result%Array(i) = result%Array(i) +& TDSMat%gamma(i)*vect%Array(i+2) forall( i = 3:vect%Length )& result%Array(i) = result%Array(i) +& TDSMat%gamma(i-2)*vect%Array(i-2) forall( i = 1:TDSMat%Dim%Sec + 1 )& result%Array(i) = result%Array(i) +& TDSMat%delta(i)*vect%Array(i+TDSMat%Dim%Size-TDSMat%Dim%Sec-1) forall( i = TDSMat%Dim%Size - TDSMat%Dim%Sec:vect%Length )& result%Array(i) = result%Array(i) +& TDSMat%delta(i-TDSMat%Dim%Size+TDSMat%Dim%Sec+1)*& vect%Array(i-TDSMat%Dim%Size+TDSMat%Dim%Sec+1) forall( i = 1:TDSMat%Dim%Sec )& result%Array(i) = result%Array(i) +& TDSMat%epsilon(i)*vect%Array(i+TDSMat%Dim%Size-TDSMat%Dim%Sec) forall( i = TDSMat%Dim%Size - TDSMat%Dim%Sec + 1:vect%Length )& result%Array(i) = result%Array(i) +& TDSMat%epsilon(i-TDSMat%Dim%Size+TDSMat%Dim%Sec)*& vect%Array (i-TDSMat%Dim%Size+TDSMat%Dim%Sec) forall( i = 1:TDSMat%Dim%Sec - 1 )& result%Array(i) = result%Array(i) +& TDSMat%dzeta(i)*vect%Array(i+TDSMat%Dim%Size-TDSMat%Dim%Sec+1) forall( i = TDSMat%Dim%Size - TDSMat%Dim%Sec + 2:vect%Length )& result%Array(i) = result%Array(i) +& TDSMat%dzeta(i-TDSMat%Dim%Size+TDSMat%Dim%Sec-1)*& vect%Array(i-TDSMat%Dim%Size+TDSMat%Dim%Sec-1) forall( i = 1:2*TDSMat%Dim%Sec - TDSMat%Dim%Size )& result%Array(i) = result%Array(i) +& TDSMat%eta(i)*vect%Array(i+2*(TDSMat%Dim%Size-TDSMat%Dim%Sec)) forall( i = 1 + 2*(TDSMat%Dim%Size - TDSMat%Dim%Sec):vect%Length )& result%Array(i) = result%Array(i) +& TDSMat%eta(i-2*(TDSMat%Dim%Size-TDSMat%Dim%Sec))*& vect%Array(i-2*(TDSMat%Dim%Size-TDSMat%Dim%Sec)) end if end function ProductOfTDSMatrixVector end module TDSMatrixVector
Code:
program ComplexAlloc use TDSMatrixVector implicit none type(TDSMatrix) TDS type(Vector) vect allocate( vect%Array(N), TDS%alfa(N), TDS%beta(N), TDS%gamma(N), TDS%delta(N),& TDS%epsilon(N), TDS%dzeta(N), TDS%eta(N) ) vect = TDS * vect end program ComplexAlloc
So, what use:
1. Sum/Product/Subtract -- hard read, one vectorization, always.
2. /Qipo -- nothing to do,one vectorization,there is no surance. For TDS-example it doesn't works, for example.
3. Via aditional pointers -- aditional variables, two vectorizations, always.
4. Leave as is and live happy without vectorization :(.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much for your explanations, Aart.
1. Why it [technically] impossible to vectorize
result%Array = vector1%Array + vector2%Array
This construct translatesinternally into memory references with components for which the compiler cannoteasily disprove loop invariance. This disambiguation problem impacts many optimizations, not just vectorization. Technically, runtime disambiguation tests could help to vectorize this code, as done for other cases, but here the testing overhead and code size increase was deemed prohibitive.
But as we can see in prevuous my post /Qipo allows do that (see SimplePlus-Procedures for (+) in /Qipo mode).
Why compiler does work with components but not with pointers as they are (like they would be alone)? I mean, compiler can take from, sayvector1, address of Array component and work with it like it isn't the component.
2. Why loop in SumPointerVectorsViaPointers
array = array1 + array2
was vectorized twice while intrinsic arrays would be vectorized only once?
Again, the compiler fails to disambiguate the memory references and translates the single implied loop into a setup loop and computation loop to preserve F90 semantics. The vectorizer simply vectorizes both loops.
So, it would work longer than while it would be one vectorization (for intrinsic arrays)?
5. Itanium doesnt have vectorization. Why so if its true? And what about vectorization in IA-64 in the future?
As Steve said, other techniques, like software pipelining, are more important for IA64. Nevertheless, we plan to incorporate some of the vectorization techniques for IA32 and EM64T into the IA64 compilers as well, but this will only deal with integer operations, not floating-point.
Should I write vectorization-oriented programs anyway (usually I have access to processors with SSEx)? But in future all my work wiil become unnecessary (IA64 would use vectorization).
This example unsettled me a little bit.
Code:
program Vectorization implicit none integer, parameter :: N = 100000 real, pointer :: VectorP1(:), VectorP2(:) real, allocatable :: VectorA1(:), VectorA2(:) real Vector1(N), Vector2(N) allocate( VectorP1(N), VectorP2(N), VectorA1(N), VectorA2(N) ) ! LOOP WAS VECTORIZED. ! LOOP WAS VECTORIZED. VectorP1 = VectorP1 + VectorP2 ! loop was not vectorized: existence of vector dependence. VectorA1 = VectorA1 + VectorA2 ! LOOP WAS VECTORIZED. Vector1 = Vector1 + Vector2 end program Vectorization
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear Stanislaus,
Failure to vectorize in most cases you encountered is due to the use of dope vectors where pretty much all components are symbolic (base, lowerbound, stride, etc.). When more context is visible (like under Qipo), constants can be propagated into the symbolic components, which enables better analysis and, hence, optimization. Directives, like the one below, can often workaround issues like this.
!DIR$ ivdep
VectorA1 = VectorA1 + VectorA2
Please be assured that I completely understand how frustrating this is and how our customers sometimes would like to live without vectorization. Nevertheless, I continue my attempts to improve vectorization and even encountered a few cases in your examples that at first glance- seem open for improvement. Give me some time to ponder over those.
Aart Bik
http://www.aartbik.com/
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Stanislaus,
Is the object of the coding exercize to make the code look neet and clean or to make the code lean and mean (fast)?
You are creating an overloaded operator functions. The code using the overloaded operators will never see the hacks needed to impliment the lean (fast) functions. Often it is a good practice to use contitional compilation such that the two coding method appear adjacent and selected via a compile time variable. You can enable/disable during testing and then use the fast code for production.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Of course I need a fast code. But in this case I can write all on ASM, can't I? I have a lot of very serious code with a lot of operands in one string and so on. If I will write hard readable code I will make a lot of mistakes wich would require much time for detecting them. Time for developing can take very much time so I will loose, not won. It doesn't mean that I wouldn't use primitive procedures. Actually for simple operations they quite good. May be they should be used also for fully working code after all chekings and debuggings. I didn't decided yet but I like your aproach because using pointers not idealtoo and requires two vectorizations. Yak. I do not know. That's why I have created this topic.
What about operations -- I do not use them actually. For this case I have procedures like ProductTDSMatrixVector( result, TDSMatrix, vector ), Soit doesn'tcreate temporary memory. It would be better, I tnink, to have operations like += -= *= /=. For this kind of operations we would declare procedures withh three operands like ProductTDSMatrixVector. But last operation may be explain why we do not have them...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page