Solved: Default value of retries_before_fallback in the sgx_uswitchless_config_t struct of the SDK

MPaper · 06-04-2021

Hi everyone,

While working with SGX and switchless calls, I noticed that SL_DEFAULT_FALLBACK_RETRIES is set to 20000 in linux-sgx/common/inc/sgx_uswitchless.h (it has had this value in every version of the SDK).

However, according to the paper "Switchless Calls Made Practical in Intel SGX", the overhead of a regular OCALL is about 8000 CPU cycles.

This means that, with the default value, an enclave thread whose switchless call ends up falling back may pay more than three times the overhead of simply making the regular OCALL itself, because it spins through all the retries and then still has to perform the OCALL.

I found that this default causes applications that make many long-running switchless OCALLs to perform much worse than they do with switchless calls disabled.

So, is there a reason why 20000 was chosen as the default value?

Thanks in advance!

JesusG_Intel · 06-04-2021

Hello MPaper,

We don't know why 20000, specifically, was chosen as the default. We are looking into how many cycles are spent in those 20000 retry attempts.

It is important to remember that the Developer Reference warns enclave developers that tuning is required for each workload:

PERFORMANCE NOTE:

Switchless Calls is an advanced feature. It requires additional worker threads and configuration, performance testing and tuning. It should be used for workloads that require fine performance tuning. Misconfiguration may result in underutilized worker threads, which consumes CPU time while not serving any tasks.

At first glance, such a large “retries_before_fallback” doesn’t appear to make sense: waiting this long to make a switchless OCALL instead of simply doing a regular OCALL seems wasteful.

However, making a regular OCALL also has a cost. For instance, if a trusted worker thread falls back and performs a “long” OCALL, it won’t be available to handle switchless ECALLs in the meantime.

The SL_DEFAULT_FALLBACK_RETRIES parameter is likely not the only cause of poor switchless performance.

If a trusted thread has to wait this long to make an OCALL, something else needs tuning; for example, the number of untrusted worker threads may need to be increased.

Also, check how often the untrusted worker threads have to be woken up and adjust “retries_before_sleep” accordingly.

Lowering “retries_before_fallback” would be the right approach if you don't want to create additional threads, and don’t mind making regular OCALLs.

Ultimately, the decision is workload-specific, which is why tuning is always needed.

Sincerely,

Jesus G.

Intel Customer Support


JesusG_Intel · 06-04-2021

Hello MPaper,

We are checking with engineering. I will respond on this thread as soon as I have an answer for you.

Sincerely,

Jesus G.

Intel Customer Support

JesusG_Intel · 06-04-2021