RSE-Based Context Switch

Adam · ‎12-02-2008

Hello. I am developing an operating system for Intel IPF and am a little confused about context switching. Is it possible to use the RSE (Register Stack Engine) for context switching, or do I have to save all the registers by hand?

Also, is using the RSE going to be any more efficient than saving by hand on multicore Itanium?

And will my existing save-by-hand code have any compatibility issues with future processors and future apps (e.g. if registers are added or removed)?

Terence_S_Intel · ‎01-16-2009

The RSE is intended to be used for call/return processing, not context switches.

For context switching, I would follow the recommendations in the Itanium Software Conventions and Runtime Architecture manual:
http://www.intel.com/design/itanium/documentation.htm?=Itanium+tab_technical_docs

Architecture changes can happen, but they are infrequent.

Adam · ‎01-30-2009

Hello Terence! Great to see you here again! Sorry for the long delay between replies. The Itanium Software Conventions and Runtime Architecture Manual also says to switch registers by hand. But what would happen if more registers were added in future processors? It may not happen, since there is already a huge number of them.

Also, these are the steps I am taking so far (this is after the gate page has been initialized with epc and all that):

For saving the context:

Switch to Bank #0
Get the process head for the next process from the first region (I'm using the IA-64 memory region system), using R16-R31 of course
Spill the first 15 registers, R1-R15 (R0 is always 0 for some reason), into the offsets specified by R1-R15, then add 120 to each offset so that they can be used for future offset calculations.
Copy R16-R30 into R1-R15 respectively (so R1=R16, R2=R17, R3=R18, etc.)
Switch to Bank #1 to expose the original set of R16-R31 for saving
Spill R16-R30 into the offsets given by R1-R15, then add 120 to each offset.
Spill R31-R45 into the offsets given by R1-R15, then add 120 to each offset.
Spill R46-R60 into the offsets given by R1-R15, then add 120 to each offset.
Spill R61-R64 into the offsets given by R1-R4, then add 120 to each offset.
Perform instruction serialization (srlz.i)
Move UNAT into ar.k6 (temporary register)
Spill R65-R75 into the offsets given by R5-R15, then add 120 to each offset.
Spill R76-R90 into the offsets given by R1-R15, then add 120 to each offset.
Spill R91-R105 into the offsets given by R1-R15, then add 120 to each offset.
Spill R106-R120 into the offsets given by R1-R15, then add 120 to each offset.
Spill R121-R127 into the offsets given by R1-R7, then add 120 to each offset.
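
The pointer arithmetic in the steps above can be checked with a small host-side model in C. This is only a sketch of the addressing, not IA-64 code: it assumes the 15 pointer registers start out aimed at 15 consecutive 8-byte slots (which is what a stride of 120 = 15 x 8 suggests), and the names simulate_spill_offsets and gr_save_offset are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum {
    NUM_PTRS   = 15,                    /* pointer registers used for the spills */
    SLOT_BYTES = 8,                     /* one 64-bit general register per slot */
    STRIDE     = NUM_PTRS * SLOT_BYTES, /* 120: added to a pointer after each use */
    NUM_GRS    = 127                    /* R1..R127; R0 is not saved */
};

/* Fill offsets[k] with the save-area byte offset used by the k-th spill
 * (k = 0 is R1, k = 126 is R127), modelling "spill, then add 120". */
void simulate_spill_offsets(size_t offsets[NUM_GRS])
{
    size_t ptr[NUM_PTRS];

    /* Assumption: pointer j initially addresses slot j of the save area. */
    for (int j = 0; j < NUM_PTRS; j++)
        ptr[j] = (size_t)j * SLOT_BYTES;

    for (int k = 0; k < NUM_GRS; k++) {
        int j = k % NUM_PTRS;   /* pointers are used round-robin */
        offsets[k] = ptr[j];    /* address the spill goes to */
        ptr[j] += STRIDE;       /* post-increment for the next batch */
    }
}

/* Save-area offset of general register `reg` (1..127) under this model. */
size_t gr_save_offset(unsigned reg)
{
    size_t offsets[NUM_GRS];

    assert(reg >= 1 && reg <= NUM_GRS);
    simulate_spill_offsets(offsets);
    return offsets[reg - 1];
}
```

Under that assumption the scheme produces one contiguous block: register Rn ends up at byte offset (n - 1) * 8, and the whole general register set takes 127 * 8 = 1016 bytes. The partial batches (R61-R64 through the first four pointers, then R65-R75 through the remaining eleven) do not disturb the layout, because the pointers are still consumed in round-robin order.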

Restoring the context comes after serialization of instructions and data. To restore the suspended context, the registers are basically filled in the same order as above.

Then IIP and the privilege return register are restored from the store, and rfi is called to return to the address set in IIP.
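
One detail that bears on the restore order is UNAT. As the architecture manual describes it, st8.spill records a register's NaT bit in the UNAT bit selected by bits 8:3 of the store address, and ld8.fill reads the NaT bit back from the same position. UNAT has 64 bits, so a contiguous run of 8-byte spills starts reusing bits after 64 registers, which fits the step of moving UNAT aside after R64. The C sketch below only models that bit indexing on a host machine; the function names are hypothetical, and a contiguous 8-byte-aligned save area is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* UNAT bit position used by st8.spill / ld8.fill for a given address:
 * bits 8:3 of the (8-byte-aligned) address. */
unsigned unat_bit_index(uint64_t addr)
{
    assert((addr & 7) == 0);            /* spill slots are 8-byte aligned */
    return (unsigned)((addr >> 3) & 0x3F);
}

/* For `count` consecutive 8-byte spills starting at `base`, return the
 * 0-based index of the first spill that reuses a UNAT bit already taken
 * by an earlier spill, or `count` if no bit is reused. */
unsigned first_unat_collision(uint64_t base, unsigned count)
{
    uint64_t used = 0;                  /* model of the 64 UNAT bits */

    for (unsigned k = 0; k < count; k++) {
        uint64_t bit = 1ULL << unat_bit_index(base + 8ULL * k);
        if (used & bit)
            return k;                   /* this spill would overwrite a bit */
        used |= bit;
    }
    return count;
}
```

For 127 contiguous slots the first collision is always at index 64, whatever the aligned base address, so the saved context needs two UNAT values. On restore, the matching UNAT value would have to be back in ar.unat before each group of fills, otherwise the NaT bits of the first 64 registers and of the rest get mixed up.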

I was investigating the RSE because it basically does the spilling and filling for me, which is why I was thinking of using it instead. I was also under the impression that using the RSE would take much less code and be more efficient, since it is entirely hardware-based and the instructions don't have to be fetched every time a context switch occurs.

The routine above is actually code that I am putting in the "gate page". If a timer interrupt occurs, it will redirect to the gate page.

Also, if a task needs to wait for I/O or something else (or simply no longer needs the CPU, or is waiting for a condition to be met), it can branch to the gate page, and the context switch takes place there.

Of course, I am also currently under the impression that once the gate page has been left, the privilege level is set back to normal according to the privilege return register.