VHPT Walker

Adam · ‎03-04-2009

Hello. I was wondering whether the IPF documentation has been updated for recent Itanium processors. The latest I can find is a 2006 revision! In particular, I happened to notice that Volume 2 (System Architecture) mentions that the VHPT Walker is optional.

Now I have been trying to search this forum for a while and couldn't find a single thread on this one. Is the VHPT walker still "optional"? And if so, what model(s) of the IPF do not have the VHPT Walker?

I am developing an OS and I would like to know this because it would save me a lot of time and effort using the built-in VHPT walker rather than ending up writing slow and buggy code to walk the VHPT - the more hardware I can use the better.

NOTE: So far I don't have any VHPT walking code - I am going to do this if it is actually worth doing.

And it really does puzzle me why such a significant optimization could be made optional in processors!

Let me know if I am "reading things"

Now I am not entirely sure how "on-topic" this is, but it is hard to find the appropriate Itanium resources (other than the Software Developer's Manuals) - such as the Itanium Assembler user's guides, an updated version of the Itanium Assembler (the last I found was the source which was made in 2002), the IA-64 Assembly Assistant, and several other things. I did eventually find the IA-64 Assembler guide but not the other two.

Of course, the guide is also a little out of date, dating back to around the year 2000! So this means that the new format may not be accounted for (Itanium has evolved a lot since 2000!)

Now also about the VHPT: does each core share one VHPT walker? Or does each core get a completely separate set of resources (including cache)?

David_S_Intel3 · ‎03-04-2009

Hi Adam,

All Itanium processors released to date have a VHPT walker, with the VHPT walker attached to the highest level TLB. On the dual-core Itanium processors (code-named Montecito and Montvale), the highest level TLB and VHPT walker are shared between the cores.

Note that the operating system is still required to handle TLB Miss faults, since the architecture doesn't guarantee that the processor will install a VHPT translation, even if the translation is available in the VHPT.
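To make the point above concrete, here is a minimal toy model in C of why the OS miss handler is still needed. It is only a sketch under stated assumptions: the structure, the function names (toy_map, toy_translate), the 16 KB page size and the table sizes are all hypothetical, and nothing here touches real Itanium registers or the real VHPT format. The flag walker_cooperates stands in for the hardware's freedom to decline to install a translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  14u   /* assumption: 16 KB pages, chosen for the toy model */
#define TABLE_SLOTS 8u    /* hypothetical page-table size */
#define TLB_SLOTS   4u    /* hypothetical TLB size */

typedef struct { bool valid; uint64_t vpn; uint64_t pfn; } mapping_t;

typedef struct {
    mapping_t page_table[TABLE_SLOTS]; /* stands in for the VHPT */
    mapping_t tlb[TLB_SLOTS];
    bool      walker_cooperates;       /* models "hardware may decline to walk" */
    unsigned  software_fills;          /* times the OS handler had to do the fill */
} toy_mmu_t;

static mapping_t *table_lookup(toy_mmu_t *m, uint64_t vpn) {
    mapping_t *e = &m->page_table[vpn % TABLE_SLOTS];
    return (e->valid && e->vpn == vpn) ? e : NULL;
}

static bool tlb_lookup(const toy_mmu_t *m, uint64_t vpn, uint64_t *pfn) {
    const mapping_t *e = &m->tlb[vpn % TLB_SLOTS];
    if (e->valid && e->vpn == vpn) { *pfn = e->pfn; return true; }
    return false;
}

/* Install a mapping in the page table (not in the TLB). */
void toy_map(toy_mmu_t *m, uint64_t vpn, uint64_t pfn) {
    m->page_table[vpn % TABLE_SLOTS] = (mapping_t){ true, vpn, pfn };
}

/* Translate va to pa. Returns false when no mapping exists (a page fault). */
bool toy_translate(toy_mmu_t *m, uint64_t va, uint64_t *pa) {
    uint64_t vpn = va >> PAGE_SHIFT;
    uint64_t pfn;
    if (!tlb_lookup(m, vpn, &pfn)) {
        mapping_t *e = table_lookup(m, vpn);
        if (e == NULL)
            return false;
        /* If the walker declined, this fill is the OS TLB-miss handler's
         * job; the result must be the same either way. */
        if (!m->walker_cooperates)
            m->software_fills++;
        m->tlb[vpn % TLB_SLOTS] = *e;
        pfn = e->pfn;
    }
    *pa = (pfn << PAGE_SHIFT) | (va & ((1ull << PAGE_SHIFT) - 1));
    return true;
}
```

With walker_cooperates set to false, the first access to a mapped page is filled by the software path and later accesses hit the toy TLB. The translation result is identical in both modes, which is the sense in which the walker is only a performance feature.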

The VHPT walker remains an optional feature, since its presence isn't needed for architecturally compliant software to run correctly. Philosophically, the Itanium architecture avoids specifying any requirements that aren't needed for correct software execution. Another example is that the Itanium SDM doesn't require that processors implement caches. Obviously, you're not going to get good performance without caches (or the VHPT walker), but the architecture doesn't try to predetermine what implementation features are needed for performance.

For Itanium architecture documentation, please refer to the "Intel Itanium Software Developer's Manual", Revision 2.2, plus the "Intel Itanium Software Developer's Manual Specification Update", available at http://download.intel.com/design/itanium/specupdt/248699.pdf.

Implementation-specific information about the dual-core Itanium processors is available at: http://download.intel.com/design/Itanium2/manuals/25111003.pdf. We'll be releasing the equivalent document for Tukwila later this year.

Updated IA-64 assembler information is available at: http://www.intel.com/design/Itanium/arch_spec.htm

David

Adam · ‎03-07-2009

Thanks for reminding me about the TLB miss faults, David.

However, the link you provided in the last post for the dual-core Itanium processors seems to be out of date (dates all the way back to 2004) and does not contain any information regarding dual core Itanium processors. Are you sure that is the right link?

And also you mention releasing documentation for Tukwila this year. I assume you are talking about when Tukwila is released (hopefully it will not be delayed again! I have been waiting eagerly for it since 2004!)

Now I do not understand the following segment:

"...but the architecture doesn't try to predetermine what implementation features are needed for performance."

Can you explain this to me? Are you saying that anything to do with performance is not required in the IA-64 architecture, in which case the RSE, software pipelining, branch prediction and maybe even predication can be optional too, effectively turning the IA-64 architecture into a crippled version of the x86 architecture!

David_S_Intel3 · ‎03-09-2009

Hi Adam,

Sorry for the incorrect link. The dual-core Itanium processor reference manual is at http://download.intel.com/design/Itanium2/manuals/30806501.pdf.

We plan on releasing the Tukwila reference manual prior to product launch to support open source development, but I don't have a reference manual release date yet - sorry.

Regarding performance optimizations, processor implementations are required to be compliant with the Itanium Architecture Software Developer's Manual. Your software can assume that features required by the SDM will be available in all Itanium processor implementations. For example, all Itanium processors must implement the RSE and predication - otherwise architecturally compliant software wouldn't run.

But, the Itanium SDM tries to provide as much latitude as possible to the hardware designers and avoids specifying which performance features must be implemented in hardware. For example, consider the following:

Caches aren't required by the architecture. Software can query the processor's cache implementation and provide cache hints, but the processor may evict cache lines at will and software cannot expect that a cache line will remain resident in the cache.

The VHPT walker isn't required by the architecture. Software is expected to be able to handle any TLB miss faults the hardware chooses not to handle.

The ALAT isn't required by the architecture. Software is expected to be able to handle any ALAT miss that occurs.

In practice, all of these optional hardware features are in fact implemented in current Itanium processors, and software performance will benefit from these hardware accelerations. However, software must be compliant with the SDM and must be able to handle cache line evictions, TLB miss faults, ALAT misses, etc.

David

Adam · ‎03-09-2009

Thanks for the new link. Does this apply to current dual core Itanium processors such as Montvale (with the new power saving capabilities etcetera)?

Now about the Tukwila documentation - I presume that means that the processor design will not be tweaked/modified (unless a bug is found or something) - releasing the documentation before releasing the product effectively means you cannot make such changes without updating the documentation and possibly breaking existing software!

Now as for the performance features; what registers would give me the info as to whether or not those are enabled?

I presume all those TLB miss faults, ALAT miss faults, and cache line evictions can be handled using interrupt handlers, right? Is there a better way of handling them? Such branching requires the pipeline to be flushed, etc., and this can severely hurt performance. Can I use gate page code? This will hopefully not result in too much of a performance penalty.

David_S_Intel3 · ‎03-10-2009

Hi Adam,

The Dual-Core Update to the Itanium 2 Processor Reference Manual applies to the processors code-named Montecito and Montvale.

Intel reserves the right to change product specifications and documentation. We do publish errata and update specifications. We are obviously very unlikely to make any product changes that would create compatibility problems with existing architecturally compliant code.

PAL_CACHE_INFO provides information about the processor caches. If a cache miss occurs, the processor will initiate a load from memory. Software doesn't have any responsibilities for a cache miss and will only see a longer latency.

We don't have any mechanism to allow software to determine if a VHPT walker is available. All Itanium processor implementations to date have implemented the VHPT walker. Even if the VHPT walker is available, the OS is still expected to handle any TLB miss fault.

On an ALAT miss, application code is expected to execute recovery code to repeat the speculated instructions. Your compiler will automatically generate recovery code when it performs data speculation. If you're boosting loads in hand-written assembly, you're expected to provide the recovery code.
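As a rough illustration of the recovery pattern described above, the following C sketch models data speculation in software. It is a toy model, not Itanium code: toy_alat_t and the functions advanced_load, store, check and speculative_increment are hypothetical names standing in for the roles of an advanced load, an intervening store, and the check that branches to recovery code. The table size is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ALAT_SLOTS 4   /* assumption: arbitrary toy table size */

typedef struct { bool valid; int reg; const long *addr; } alat_entry_t;
typedef struct { alat_entry_t e[ALAT_SLOTS]; } toy_alat_t;

/* Models an advanced load: load early and remember (register, address). */
long advanced_load(toy_alat_t *a, int reg, const long *addr) {
    a->e[reg % ALAT_SLOTS] = (alat_entry_t){ true, reg, addr };
    return *addr;
}

/* Models a store: any tracked load from the same address is invalidated. */
void store(toy_alat_t *a, long *addr, long value) {
    *addr = value;
    for (int i = 0; i < ALAT_SLOTS; i++)
        if (a->e[i].valid && a->e[i].addr == addr)
            a->e[i].valid = false;
}

/* Models the check: true if the entry survived, false on an ALAT miss. */
bool check(const toy_alat_t *a, int reg) {
    const alat_entry_t *e = &a->e[reg % ALAT_SLOTS];
    return e->valid && e->reg == reg;
}

/* Computes *src + 1 with the load hoisted above a store through dst,
 * which may or may not alias src. */
long speculative_increment(toy_alat_t *a, long *dst, const long *src,
                           long stored) {
    long v = advanced_load(a, 1, src);  /* load boosted above the store */
    long result = v + 1;                /* speculated use of the value */
    store(a, dst, stored);              /* possibly conflicting store */
    if (!check(a, 1)) {
        /* Recovery code: repeat the load and everything that used it. */
        v = *src;
        result = v + 1;
    }
    return result;
}
```

When dst and src differ, the speculated result is kept. When they alias, the check fails and the recovery path recomputes the result from the stored value. Note that the model also treats an empty or overwritten slot as a miss, which matches the point that software must cope with a miss whatever its cause.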

Our IA-64 Linux maintainer, Tony Luck, provided the following feedback on your gate page question:

"Linux has just one gate page that it uses for system call privilege escalation. On recent kernels this page is
mapped at 0xA0000000000010000 with an execute-only-raise-priv-to-cpl0 TLB entry. Applications
can make system calls by branching to this page where they execute the "EPC" instruction that
will change privilege level from user mode to kernel mode.

"All the IA-64 fault handlers have to be in the 32K block of memory pointed to by CR.IVA. PSR.it is unchanged
when an exception happens, so for most (all?) operating systems the table address will be treated as a virtual
address, thus to ensure that there is a translation for it, a TR entry must be allocated in the TLB. I suppose
that TLB entry *could* mark this page as a gate page ... but since a gate page can only raise privileges to
higher (lower numerical) values, and the handler is already executing at the highest privilege level (PSR.cpl
is cleared before starting execution of the handler) ... there would seem to be no point in making the page
a gate page."
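For reference, the 32K handler block mentioned above can be sketched as a small address calculation in C. This is a sketch under assumptions: the layout used here (20 vectors of 0x400 bytes followed by 48 vectors of 0x100 bytes, with the VHPT Translation, Instruction TLB and Data TLB vectors first) is my reading of the interruption vector table in SDM Volume 2 and should be checked against the manual. The function name ivt_vector_address and the constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IVT_SIZE        0x8000u  /* 32 KB table located by CR.IVA */
#define IVT_LONG_COUNT  20u      /* assumed: first 20 vectors, 0x400 bytes each */
#define IVT_LONG_SIZE   0x400u
#define IVT_SHORT_COUNT 48u      /* assumed: remaining 48 vectors, 0x100 bytes each */
#define IVT_SHORT_SIZE  0x100u

/* Indices of the translation-related vectors (assumed ordering). */
enum {
    IVT_VHPT_TRANSLATION = 0,    /* offset 0x0000 */
    IVT_INST_TLB         = 1,    /* offset 0x0400 */
    IVT_DATA_TLB         = 2,    /* offset 0x0800 */
    IVT_ALT_INST_TLB     = 3,    /* offset 0x0c00 */
    IVT_ALT_DATA_TLB     = 4     /* offset 0x1000 */
};

/* Address of vector `index` (0..67) for a given IVA value. */
uint64_t ivt_vector_address(uint64_t iva, unsigned index) {
    /* The table is 32 KB aligned, so the low 15 bits are dropped. */
    uint64_t base = iva & ~(uint64_t)(IVT_SIZE - 1);
    if (index < IVT_LONG_COUNT)
        return base + (uint64_t)index * IVT_LONG_SIZE;
    return base + (uint64_t)IVT_LONG_COUNT * IVT_LONG_SIZE
                + (uint64_t)(index - IVT_LONG_COUNT) * IVT_SHORT_SIZE;
}
```

The two groups add up to exactly 0x8000 bytes (20 * 0x400 + 48 * 0x100), which is consistent with the 32K figure in the quote. An OS would place its Data TLB miss handler at ivt_vector_address(iva, IVT_DATA_TLB), and so on for the other vectors.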

David