subroutine foo(out,inA,inB, count)
cDEC$ attributes align: 16 :: out
cDEC$ attributes align: 16 :: inA
cDEC$ attributes align: 16 :: inB
integer :: count
real(8) :: out(count),inA(count),inB(count)
_asm { ! compiler inserts fast code to test for alignment
mov eax,dword ptr out
or eax,dword ptr inA
or eax,dword ptr inB
and eax,0Fh
jne ReportError
// fall through to execute section
}
do i=1,count
out(i) = inA(i) + inB(i) // will process two at a time
end do
end subroutine foo
_asm {
ReportError:
call ArgAllignmentError(ModuleName)
}
Dear Jim,
Hopefully you will be happy to hear that we are working on a Fortran equivalent of the Intel C/C++ __assume_aligned() hint. However, the compiler will not generate code that tests if the asserted property is actually true (since the whole purpose of such hints is to avoid any runtime overhead). As for any assertion, the compiler will simply optimize the code accordingly. If you give a wrong assertion, the code may break (cause a runtime exception in this case).
Aart Bik
http://www.aartbik.com/
assume_aligned (or whatever you work up for it) is fair. The user always has the option of:
#ifdef _DEBUG
if(mod(loc(arg1),16) .ne. 0) call ('Fool')
#endif
Or leave the test in the Release version at the programmer's choice.
The programmer can always look at the CallStack.
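A minimal sketch of such a run-time check, written so that it does not depend on the vendor `loc()` extension. It assumes a Fortran 2003 compiler, that `real(8)` is an 8-byte real as elsewhere in this thread, and that an address fits in `integer(c_intptr_t)`; the names `alignment_check` and `is_aligned` are hypothetical and not part of any compiler.

```fortran
module alignment_check
  use, intrinsic :: iso_c_binding, only: c_loc, c_ptr, c_intptr_t
  implicit none
contains
  ! True when the first element of arr lies on a multiple of `boundary` bytes.
  logical function is_aligned(arr, boundary)
    real(8), target, intent(in) :: arr(*)
    integer, intent(in) :: boundary
    integer(c_intptr_t) :: addr
    ! Reinterpret the C address of arr(1) as an integer.
    addr = transfer(c_loc(arr(1)), addr)
    is_aligned = modulo(addr, int(boundary, c_intptr_t)) == 0
  end function is_aligned
end module alignment_check
```

A caller could guard a routine with something like `if (.not. is_aligned(arg1, 16)) error stop 'arg1 is not 16-byte aligned'`, wrapped in `#ifdef _DEBUG` if the test should vanish from release builds.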
My suggestion was to catch the error _prior_ to debugging. As we both know, test cases sometimes pass during debugging where end-user use of the application will fail. Having the compiler check the alignment would verify the proper calling sequence at compile time. You currently perform type checking if the user uses the interface block. I suggested that this be extended to include the alignment attribute in addition to type, rank, etc.
I think the modifications I suggested could be integrated into ifort relatively easily. Everything is in place to do this type of test.
Jim Dempsey
Dear Jim,
The Intel compilers use rather advanced methods to propagate alignment information within and across subroutines (see Chapter 6 of the Software Vectorization Handbook, more information at http://www.intel.com/intelpress/sum_vmmx.htm). Unaligned or unknown alignments are still optimized using static and dynamic methods. The assertions are provided to avoid the overhead of the dynamic methods for situations where the programmer knows more about the alignment than the compiler.
If I understand you correctly, you want to make alignment part of the type system, i.e. something that is statically enforced (so calls to aligned arguments with unaligned arrays are rejected). Although interesting, such a solution seems rather intrusive for the programmer (I already encounter resistance if I ask customers to add one pragma to their code). Furthermore, programs that are provably type-correct (using your definition) are exactly the programs for which the compiler already propagates exact alignment information using the existing methodology and a sufficiently large compilation scope (and, hence, need no further annotation). The only added value I see with your proposal is when modules are compiled in strict isolation, but I am not sure if this warrants such an elaborate extension (requiring cooperation between the language specs, compiler, linker and acceptance by sufficient programmers). But the floor is open for input from others.
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 10-28-2005 01:51 PM
Message Edited by tim18 on 10-29-2005 07:18 AM
Tim,
Please advise our customers correctly! The aligned annotation has *no* impact at all on the assumed data dependences that prevent (unconditional) vectorization. So, the code:
!DIR$ VECTOR ALIGNED
do i = 1, n
a(i) = a(i+x) + 10
enddo
still assumes loop-carried flow dependences. To do what you allude to, one would have to use the following annotations:
!DIR$ IVDEP
!DIR$ VECTOR ALIGNED
do i = 1, n
a(i) = a(i+x) + 10
enddo
end
Furthermore, Jim's suggestion is to make alignment properties part of the type system itself.
Aart Bik
http://www.aartbik.com/
Message Edited by abik on 10-28-2005 02:19 PM
integer(4) :: ivar
real(4) :: fvar
real(8) :: dvar
Alignment requirements can be mangled in there as well:
subroutine foo(ivar,fvar,dvar)
integer(4) :: ivar
!dec$ attributes align:16 :: fvar // proposed extension
real(4) :: fvar(ivar)
!dec$ attributes align:16 :: dvar // proposed extension
real(8) :: dvar
This is not specified. I've tried !DEC$ VECTOR ALIGNED but without certain success. I get the impression from looking at the code that the alignment is assumed to be at 8 not 16. This means the SSE3 instructions are not optimal. What is needed is
...
array[index] = var;
If you know that all arrays which are written into are aligned, and had a way to assert that other arrays will not be aligned, you could save generation of unneeded code by your data alignment assertions. Certainly, in the case where an array is declared aligned, but the loop starts at a non-aligned interval from the start, your proposal could do the job.
Aart and Steve have scolded me before for working so much on old-fashioned code where there is much built-in information on alignment, much of which Aart has taught the compiler to take advantage of. I'm not meaning to argue much against this, but I'm somewhat skeptical of general acceptance of more specialized optimization stuff like what has been added to C. AMD waged a campaign to have other brands of Fortran include low-level SSE intrinsics, which came originally from Intel C. They weren't nearly as popular as auto-vectorization, so now all the compilers have copied from Aart's work.
Intel C people used to argue that all data should be declared with declspec, all malloc() replaced with non-standard functions, and whole program optimization used, if efficient alignments are wanted. They even turned down requests to have default minimum 64-bit alignment for 64-bit data. I'll admit my gratitude that Fortran isn't likely to adopt such policy, and try to stop making noise.
The ultimate goal is to write source code with high performance that is invariant with compiler. This is now possible with IVF: even old fashioned Fortran-77 code vectorizes neatly (data were usually stored in a cache friendly unit stride fashion back in the seventies).
Adding many #pragma and !DEC statements everywhere makes the code dependent on one particular compiler, and the readability of the code is usually reduced. Instead, all the automatic features of IVF, like automatic vectorization and automatic parallelization, make life much easier for the programmer. (I am waiting for more automatic features in the future! :-).
If performance is crucial, I recommend putting all the functions that belong together in one source code file and compiling with the options "/O3 /Qip /QxN /Qprof_use". Then alignment issues are not seen, and both compilation and program execution time are reduced.
Best Regards,
Lars Petter
Message Edited by lpe@scandpower.no on 10-31-2005 01:58 AM
Correct me if I am wrong...
As it stands now, all subroutines and functions (that are not inlined) have entry points that are visible to the linker. As such, the compiler cannot know the particulars of all potential callers to the function or subroutine. Therefore the compiler is left with no choice but to generate code that is not sensitive to alignment issues, i.e. code that has overhead to work out the alignment prior to taking full advantage of SSEn instructions. For routines that are relatively small, the routine can be forced inline and thus brought into the scope of the caller, and as a result the alignment of the arguments can be taken into consideration during code generation. If the routine is not inlined, then it has the potential of being called from a context where the alignment is unknown, and thus the synchronization code is generated for the routine and executed upon call.
A potential compromise to this quandary would be to generate two or more entry points into the routine, such that when the compiler has access to the routine being called as well as the caller, and the alignments meet the alignment requirements of the routine, the compiler outputs a call to the alternate entry point that bypasses the test and synchronization code. This method would require no !DEC$ (other than the one used to align the variables for allocation or static placement).
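The two-entry-point idea can be approximated by hand today. The sketch below is only an illustration under stated assumptions: all names (`two_entry_sketch`, `add_checked`, `add_aligned`, `add_generic`, `on16`) are hypothetical, `real(8)` is assumed to be 8 bytes, and `!DIR$ VECTOR ALIGNED` is honored by Intel compilers while other compilers read it as a comment. A caller that knows its arrays are 16-byte aligned calls `add_aligned` directly; everyone else goes through `add_checked`, which pays for the test once.

```fortran
module two_entry_sketch
  use, intrinsic :: iso_c_binding, only: c_loc, c_ptr, c_intptr_t
  implicit none
contains
  ! Checked entry point: for callers whose argument alignment is unknown.
  subroutine add_checked(out, inA, inB, count, used_aligned)
    integer, intent(in) :: count
    real(8), target, intent(inout) :: out(count)
    real(8), target, intent(in) :: inA(count), inB(count)
    logical, intent(out) :: used_aligned
    used_aligned = .false.
    if (count < 1) return
    used_aligned = on16(out) .and. on16(inA) .and. on16(inB)
    if (used_aligned) then
      call add_aligned(out, inA, inB, count)
    else
      call add_generic(out, inA, inB, count)
    end if
  end subroutine add_checked

  ! Bypass entry point: the caller promises 16-byte alignment of all arrays.
  subroutine add_aligned(out, inA, inB, count)
    integer, intent(in) :: count
    real(8), intent(inout) :: out(count)
    real(8), intent(in) :: inA(count), inB(count)
    integer :: i
    !DIR$ VECTOR ALIGNED
    do i = 1, count
      out(i) = inA(i) + inB(i)
    end do
  end subroutine add_aligned

  ! Generic loop: no alignment promise is made to the compiler.
  subroutine add_generic(out, inA, inB, count)
    integer, intent(in) :: count
    real(8), intent(inout) :: out(count)
    real(8), intent(in) :: inA(count), inB(count)
    integer :: i
    do i = 1, count
      out(i) = inA(i) + inB(i)
    end do
  end subroutine add_generic

  ! True when arr(1) sits on a 16-byte boundary.
  logical function on16(arr)
    real(8), target, intent(in) :: arr(*)
    integer(c_intptr_t) :: addr
    addr = transfer(c_loc(arr(1)), addr)
    on16 = modulo(addr, 16_c_intptr_t) == 0
  end function on16
end module two_entry_sketch
```

The proposal in this post goes further than the sketch: there the compiler, not the programmer, would choose between the two entry points at each call site.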
Jim Dempsey
Hello,
You write about a "non-negligible amount of overhead in determining alignment issues". In terms of CPU time, what is this overhead relative to the total time the program takes?
Lars Petter
Dear Jim,
>I just want to be able to tell IVF "this data is aligned" but I cannot.
The loop-oriented !DIR$ VECTOR ALIGNED and the upcoming data-oriented ASSUME_ALIGNED annotations should suffice to accomplish just this. Vector code in such cases will be optimal. Your initial proposal went a lot further than just the objective you mention above, however.
Aart Bik
http://www.aartbik.com/
Will the ASSUME_ALIGNED be specified per argument? (And again, aligned to what: 2, 4, 8, 16, 32, ...)
If not, then the programmer cannot use the feature when they know that not all arguments are aligned. It is not unusual to call a major routine where you pass in 10 or so arguments, some of which are aligned and some of which are not.
What is the internal argument against permitting "!DEC$ attributes align: 16 :: var" on dummy arguments to a function or subroutine?
When the !DEC$ alignment is used on static, automatic and allocatable data it does two things: a) aligns the data upon allocation/instantiation and b) enters the alignment attribute into the compiler symbol table. What is wrong with extending the !DEC$ alignment to dummy variables, whereby you perform only step b) above? (This would eliminate the vague ASSUME_ALIGNED.)
Jim
Dear Jim,
>Will the ASSUME_ALIGNED be specified per argument? (And again aligned to what 2, 4, 8, 16, 32, ...)
Of course! The idea is that something like
subroutine aart(p1, p2)
real p1(*), p2(*)
!DIR$ ASSUME_ALIGNED(p1,16)
!DIR$ ASSUME_ALIGNED(p2, 4)
.
end
provides a fine-grained method of conveying alignment information to the compiler (which analyzes the rest of the subroutine based on this per-argument information, much more flexible than the current loop-oriented alignment annotation). I even advocate more elaborate constructs like
subroutine aart(p1, p2)
real p1(*), p2(*)
!DIR$ ASSUME_ALIGNED(p1(1),16)
!DIR$ ASSUME_ALIGNED(p2(2),16)
.
end
which would indicate that array p1 starts 16-byte aligned (as above), and array p2 starts at an address a such that a mod 16 = 12.
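The arithmetic behind "a mod 16 = 12" can be sketched as a small helper. It assumes default `real` occupies 4 bytes; the module and function names are hypothetical. If element `k` is aligned on `boundary` bytes, element 1 lies `(k - 1) * elem_bytes` bytes earlier, so its address modulo `boundary` is the non-negative remainder of that negative offset.

```fortran
module alignment_offset
  implicit none
contains
  ! Address of element 1, modulo `boundary`, given that element k is aligned.
  pure integer function start_offset(k, elem_bytes, boundary)
    integer, intent(in) :: k, elem_bytes, boundary
    ! modulo (unlike mod) returns a non-negative result for a positive boundary.
    start_offset = modulo(-(k - 1) * elem_bytes, boundary)
  end function start_offset
end module alignment_offset
```

With these assumptions, `start_offset(1, 4, 16)` is 0 for `p1` and `start_offset(2, 4, 16)` is 12 for `p2`, matching the two directives above.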
Aart Bik
http://www.aartbik.com/
Aart,
One more point
!DEC$ vector aligned
as well as
ASSUME_ALIGNED
If they follow the Fortran standard, both will have to mean that the data is aligned to natural boundaries. This means real(4) is aligned to a multiple of 4 bytes and real(8) to a multiple of 8 bytes. SIMD requires alignment to a multiple of 16 bytes (at least for now; later it might be 32).
Jim
I like the idea of specifying the additional alignment information. The programmer can probably work around not having it.
call Aart(array(22), array(17)) ! always even, always odd
Could be replaced with
call Aart(array(22), array(16)) ! always even, always one before the odd
Then the user would add 1 to the indexes into the second array.
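A small sketch of this workaround, with hypothetical routine names and an arbitrary loop body (doubling each element) chosen only for illustration. The first routine is called with the element it should start at; the second is called with the element just before it and shifts every index by one, so both touch exactly the same storage.

```fortran
module shifted_index_sketch
  implicit none
contains
  ! Original form: p(1) is the first element to process.
  subroutine double_direct(p, n)
    real, intent(inout) :: p(*)
    integer, intent(in) :: n
    integer :: i
    do i = 1, n
      p(i) = 2.0 * p(i)
    end do
  end subroutine double_direct

  ! Workaround form: the caller passes the preceding element,
  ! and every index into p is shifted by +1.
  subroutine double_shifted(p, n)
    real, intent(inout) :: p(*)
    integer, intent(in) :: n
    integer :: i
    do i = 1, n
      p(i + 1) = 2.0 * p(i + 1)
    end do
  end subroutine double_shifted
end module shifted_index_sketch
```

Here `call double_direct(array(17), n)` and `call double_shifted(array(16), n)` update the same `n` elements, while the second call hands the routine a different starting address.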
When will this be available?
Jim Dempsey
It could tentatively go into a 9.0 update as an undocumented feature. Hide it with a switch if need be. In this manner the production code can be used for testing by the few in the know of the hidden feature.
Jim