Intel® Software Guard Extensions (Intel® SGX)

Mysterious SGX_ERROR_ENCLAVE_LOST bug when running enclave in multi-threaded mode

123hiroki
Novice
1,129 Views

Hi,

 

Recently I encountered a very strange problem when running an enclave in multi-threaded mode on my 3rd Gen Intel Xeon Scalable CPU. When I launch the enclave built from the sample code (linux-sgx/SampleCode/PowerTransition), it prints:

Thread 0x8f8a648c>: 5115
power transition occured in increase_and_seal_data().
Thread 0xd8ff8ba1>: 5116
Thread 0x533c5a70>: 5117
Error: Unexpected error occurred.
[1]    20699 abort (core dumped)  ./app

 

Strangely, the error is not 100% reproducible; it occurs randomly, so I have no clue what is going on. The one thing I am sure of is that the error is more likely to be triggered when the system is under a heavy workload.

 

My machine's spec is:

  • Processor: Dual Intel(R) Xeon(R) Gold 5318S CPU @ 2.10GHz (24 cores, 48 threads)
  • RAM: 224 GiB 3200 MHz DDR4 RAM (Micron)
  • OS: Ubuntu 20.04 LTS with kernel version 5.15.112-0515112-generic
  • SDK: 2.17.1 (the error also occurred with SDK v2.19)

 

According to the source file, the error is caused by a power transition or a Linux fork. However, I disabled power management on my system and confirmed that no fork syscall was made. The error still did not disappear, so I am wondering what the root cause is.

 

Thank you in advance.

1 Solution
Wan_Intel
Moderator
977 Views

Hi 123hiroki,

Thanks for your patience.

 

It seems that the EPC size does have an effect, because the error was eliminated when the EPC was increased to 32 GB. SGX_ERROR_ENCLAVE_LOST is rarely observed when a heavy workload runs within Intel® SGX.

 

For your information, Intel® Xeon® Gold 5318S Processor supports Maximum Enclave Page Cache (EPC) up to 512GB. Please refer to the following links for more information.

 

Referring to Power Transitions in Intel SGX applications for Windows:

 

"Modern operating systems provide mechanisms to enable applications to be notified of major power events on the platform. When the computer enters a lower power state, the OS suspends to RAM or saves to disk context information for future restoration.

 

For Intel SGX, power transitions from an S0/S1 state to an S2-S5 state cause the protected memory encryption key for an enclave to be destroyed. This makes the enclave effectively unreadable; therefore, it must be recreated on a system resume."

  

"Upon re-instantiation of the application, enclaves are subsequently rebuilt from scratch. Applications must retrieve their protected states from the disk or cloud. To minimize the overhead caused by constantly sealing secrets and storing the encrypted data to a disk or cloud, the enclave writer should design their application enclave to keep as little state information as possible inside the enclave so that the application can effectively manage a power-transition event."
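
As a rough illustration of the rebuild-and-restore flow described in the quotes above, here is a minimal simulation in Python. It makes no SGX calls: FakeEnclave, App, call_with_recovery and the status strings are hypothetical stand-ins, and a real application would use the SDK's enclave create/destroy functions and sealed data instead. Treat it as a sketch of the control flow only.

```python
from dataclasses import dataclass

# Stand-in status strings; these are NOT the SDK's numeric error codes.
SUCCESS = "SGX_SUCCESS"
ENCLAVE_LOST = "SGX_ERROR_ENCLAVE_LOST"


@dataclass
class FakeEnclave:
    """Hypothetical stand-in for an enclave; it only holds a counter."""
    counter: int = 0
    lost: bool = False  # set to True to simulate a lost enclave

    def increase(self) -> str:
        """Simulated ECALL: fails with ENCLAVE_LOST once the enclave is lost."""
        if self.lost:
            return ENCLAVE_LOST
        self.counter += 1
        return SUCCESS


class App:
    """Untrusted host side: keeps a copy of the state outside the enclave."""

    def __init__(self) -> None:
        self.sealed_counter = 0  # models sealed state stored on disk
        self.rebuilds = 0
        self.enclave = self.create_enclave()

    def create_enclave(self) -> FakeEnclave:
        # A rebuilt enclave starts from the last saved state, not from memory.
        return FakeEnclave(counter=self.sealed_counter)

    def call_with_recovery(self, max_retries: int = 3) -> str:
        for _ in range(max_retries + 1):
            status = self.enclave.increase()
            if status == SUCCESS:
                self.sealed_counter = self.enclave.counter  # persist new state
                return status
            if status != ENCLAVE_LOST:
                return status  # other errors are not handled in this sketch
            # Enclave lost: rebuild it from scratch and retry the call.
            self.enclave = self.create_enclave()
            self.rebuilds += 1
        return ENCLAVE_LOST


if __name__ == "__main__":
    app = App()
    app.call_with_recovery()
    app.enclave.lost = True  # simulate the enclave being lost
    app.call_with_recovery()
    print(app.sealed_counter, app.rebuilds)  # prints "2 1"
```

The less state the enclave holds, the less there is to save and restore on each rebuild, which is the point the second quote makes.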

 

We strongly believe that the error was caused by application overhead. Referring to your previous post, the issue is not 100% reproducible and it happens randomly. Therefore, we believe this happens when the system is too busy.

 

Please ensure that the enclave size is smaller than the available EPC, and place only the secrets and the code that operates on them inside the enclave. Fewer elements within the enclave mean less encryption and decryption and less data-structure checking by the Intel® SGX memory control/protection mechanism. This minimizes the chance of the application overhead that causes the error.
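
To make the size check above concrete, here is a rough back-of-the-envelope sketch in Python. The formula (heap + one stack per thread + image size) is a simplifying assumption, the function names are hypothetical, and the example numbers are invented for illustration; they are not taken from the poster's configuration. The inputs loosely correspond to the HeapMaxSize, StackMaxSize and TCSNum settings in an enclave configuration file.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def estimated_enclave_bytes(heap_max: int, stack_max: int,
                            tcs_num: int, image_bytes: int) -> int:
    """Rough estimate: heap + one stack per thread (TCS) + code/data image.

    This ignores per-page metadata and other overhead, so the real
    footprint is somewhat larger.
    """
    return heap_max + stack_max * tcs_num + image_bytes


def fits_in_epc(enclave_bytes: int, epc_bytes: int,
                concurrent_enclaves: int = 1) -> bool:
    """True if all concurrently loaded enclaves fit in the EPC together."""
    return enclave_bytes * concurrent_enclaves <= epc_bytes


if __name__ == "__main__":
    # Hypothetical configuration: 20 GiB heap, 1 MiB stack, 48 threads.
    total = estimated_enclave_bytes(20 * GIB, 1 * MIB, 48, 64 * MIB)
    print(fits_in_epc(total, 16 * GIB))  # prints "False"
    print(fits_in_epc(total, 32 * GIB))  # prints "True"
```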

 

Hope it helps.

 

 

Regards,

Wan


8 Replies
Wan_Intel
Moderator
1,101 Views

Hi 123hiroki,

Thanks for reaching out to us.

 

We noticed that you encountered "SGX_ERROR_ENCLAVE_LOST" when using Intel SGX for Linux version 2.19.

 

Do you encounter "SGX_ERROR_ENCLAVE_LOST" when using Intel SGX for Linux version 2.17.1?

 

Regards,

Wan


123hiroki
Novice
1,097 Views

Yes, the error also persisted with SDK v2.17.1.

Wan_Intel
Moderator
1,090 Views

Hi 123hiroki,

Thanks for your information.

We will investigate this issue with our next-level team and update you as soon as possible.



Regards,

Wan





123hiroki
Novice
1,081 Views

Thank you for your help, Wan!

 

I have some additional information that might be helpful. The error disappeared when I increased the EPC size (16 GB -> 32 GB), although the error had seemed unrelated to the EPC size.

 

Looking forward to hearing from you soon.

 

Wan_Intel
Moderator
1,080 Views

Hi 123hiroki,

Thanks for your additional information.

I've shared the additional information with our next level, and we will further investigate the issue.



Regards,

Wan



Wan_Intel
Moderator
944 Views

Hi 123hiroki,

Thanks for your question.

Please submit a new question if additional information is needed as this thread will no longer be monitored.

 

 

Regards,

Wan

