Confidential Federated Learning with OpenFL

Co-authored by Teodor Parvanov, Senior Software Architect for Intel Tiber Trust Services, and Karan Shah, Federated Learning Research Engineer

 

Federated Learning (FL) is an emerging paradigm in Machine Learning (ML) that allows model builders to use datasets distributed across different data owners. By enabling data owners to remain in full control of their data, FL helps unlock the vast potential of private data for model training and evaluation. This ultimately leads to more robust and trustworthy models, derived by leveraging sensitive and highly regulated datasets without compromising their integrity or confidentiality.


Figure 1: From Centralized to Federated Learning

 

In a previous article, we walked through the process of setting up a Federated Learning experiment with OpenFL*, starting from a traditional centralized training setting. Although such a federation relies on a robust PKI setup and mutually authenticated communication between the Aggregator and Collaborator nodes (via mTLS), participants remain vulnerable to attacks that exploit the distributed nature of the system, as well as to algorithmic adversarial attacks.

In this article, we will provide a few building blocks and recipes to enhance OpenFL-based federations for greater privacy and security. Specifically, we will explore the use of Trusted Execution Environments (TEEs) and Remote Attestation to create a more secure and reliable federated learning system.

Attack Vectors on Federated Learning

As part of setting up a federation, data owners (collaborators) are expected to review and agree on the pipeline and algorithms to be run on their private data. This includes checking that the training script does not attempt to exfiltrate data or use it in an unethical or biased way. The data owner assumes that the code executed on their data during the FL process is the same as the code they approved. Typical execution environments, whether on-premises or in the cloud, cannot provide such guarantees, so data owners have to implicitly trust every person or organization involved in setting up the Federated Learning experiment.

The model builder is also exposed to certain threats, related either to the quality of the training process or to the intellectual property of the final model itself. Indeed, a malicious collaborator could attempt a model poisoning attack by altering the ML code provided by the model builder, potentially degrading the convergence rate or even introducing some form of bias. In other scenarios, a dishonest collaborator could try to steal the model itself by inspecting unencrypted memory or files produced by the ML algorithms. This threat jeopardizes the intellectual property of the final model by allowing unauthorized individuals to replicate or misuse it.

Although software-based methods exist to protect FL experiments from these vulnerabilities, elevated security requires either implicit trust between the participating entities or explicit checks of the code's integrity and confidentiality. However, as federations grow in number and scale, relying on human interaction and sound judgment alone to help ensure trust in federated learning becomes impractical.

Trusted Execution Environments

When considering trust in the context of Federated Learning, we usually mean helping to protect sensitive data when stored, while in transit over networks, and during processing. For the first two cases, we have well-known software-based solutions involving proven cryptographic techniques (including the mTLS protocol, the default for secure communication within OpenFL federations). On the other hand, data in use is vulnerable to introspection by malicious entities such as rogue processes with high privileges, operating systems, virtualization hypervisors, or even cloud infrastructure administrators. Although advanced software techniques such as homomorphic encryption exist, they come with prohibitively high complexity and computational overhead. In this article, we are going to focus on a hardware-based alternative called Trusted Execution Environment (TEE) that enables a wide variety of Confidential Computing use cases with minimal performance penalties.

A TEE is a secure, isolated area within a processor, designed to protect the confidentiality and integrity of the code and data loaded inside it. TEEs provide a controlled execution environment that isolates sensitive computations from the rest of the system. TEEs additionally support a Remote Attestation protocol, which enables the user of a workload (program) to independently and securely verify the integrity of the software and hardware stack against predefined trust policies.

In the context of Federated Learning, TEEs can significantly enhance privacy and security by helping ensure that the local computations run by each participant are protected from tampering and unauthorized access. This added layer of security helps maintain the confidentiality of the data and the integrity of the learning process, thereby fostering greater trust among participants and enabling more robust and secure collaborative machine learning models.

In the next sections, we are going to illustrate this approach in practice by providing recipes for securing OpenFL-based federations using Intel® Software Guard Extensions (Intel® SGX).

Securing Data and Computation with Intel SGX

Intel SGX is a set of CPU extensions for creating isolated memory regions, called enclaves, that allow secure computation over sensitive data. Applications that run within enclaves are encrypted in memory and remain isolated from the rest of the system. Only code running within an enclave can access its memory; the OS, the hypervisor, and even privileged users with access to the hardware cannot.

Running applications within enclaves normally requires the code to be modified and recompiled to access the Intel SGX APIs directly. To remove this overhead, Gramine* allows unmodified applications to run within enclaves by providing a “hosted” environment with minimal dependencies inside the enclave. It acts as an intermediary between the application and the host OS, offering a “microkernel-like” environment to the application.

Executing code within an enclave requires an Intel CPU with Intel SGX support. In addition, Intel SGX must be enabled in the BIOS of each participating system; refer to your vendor’s system manual for the steps to enable it. Note, however, that an Intel SGX–ready system is only required to run applications within enclaves, not to build the enclave images themselves.
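
Before running an experiment inside an enclave, a quick host-side sanity check can save time. The snippet below is a minimal, illustrative sketch (not part of OpenFL) that checks for the Intel SGX device node and the AESM service socket, both of which the container commands later in this article rely on:

from pathlib import Path

# Minimal host-side sanity check. Assumes the in-kernel Intel SGX driver,
# which exposes /dev/sgx_enclave, and a running AESM service; the same device
# node and socket are mounted into the containers shown later in this article.
SGX_DEVICE = Path("/dev/sgx_enclave")
AESM_SOCKET = Path("/var/run/aesmd/aesm.socket")

def sgx_ready() -> bool:
    """Return True if the Intel SGX device node and the AESM socket are present."""
    return SGX_DEVICE.exists() and AESM_SOCKET.exists()

if __name__ == "__main__":
    print(f"Intel SGX device node present: {SGX_DEVICE.exists()}")
    print(f"AESM service socket present:   {AESM_SOCKET.exists()}")
    print("Host looks ready for enclave execution." if sgx_ready()
          else "Intel SGX prerequisites were not detected on this host.")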

Enabling Intel SGX support for OpenFL code requires building an unmodified application container with all dependencies necessary to run an experiment, whether as an aggregator or a collaborator.


Figure 2: OpenFL nodes running in Intel SGX TEEs

 

OpenFL supports TEE execution in application containers via the Task Runner API. Once a workspace is set up with PKI certificates, the FL experiment plan, and the source code, run the following command from the workspace directory:

user@vm:~/example_workspace$ fx workspace dockerize --save

Note that we omit an enclave signing key here for demonstration purposes; OpenFL auto-generates a key to sign the enclave. This key is stored in the workspace and is not part of the container image. You can also provide your own enclave signing key via the `--enclave-key` flag.
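
If you prefer to supply your own key, the following minimal sketch shows one way to generate it with the cryptography package. It assumes the enclave signing key must be RSA-3072 with public exponent 3 (the usual convention for signing Intel SGX enclaves) and writes it to a hypothetical path:

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Illustrative sketch: generate an enclave signing key (RSA-3072 with public
# exponent 3) and save it as an unencrypted PEM file at a hypothetical path.
key = rsa.generate_private_key(public_exponent=3, key_size=3072)
with open("enclave_key.pem", "wb") as key_file:
    key_file.write(
        key.private_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PrivateFormat.TraditionalOpenSSL,
            encryption_algorithm=serialization.NoEncryption(),
        )
    )

The resulting enclave_key.pem could then be passed as fx workspace dockerize --save --enclave-key enclave_key.pem.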

The workspace image will be saved as a .tar file in the same directory. This image contains the same fx CLI application you used in the previous tutorial, but one that can run within Intel SGX, and it can be distributed along with the respective PKI certificates for a real-world FL experiment between participants. The difference between TEE and non-TEE execution is small:

# With TEE (Gramine and Intel SGX SDK required)
docker run --rm \
  --network host \
  --device=/dev/sgx_enclave \
  -v /var/run/aesmd/aesm.socket:/var/run/aesmd/aesm.socket \
  --mount type=bind,source=./certs.tar,target=/certs.tar \
  example_workspace bash -c "gramine-sgx fx ..."

# Without TEE
docker run --rm \
  --network host \
  --mount type=bind,source=./certs.tar,target=/certs.tar \
  example_workspace bash -c "fx ..."

While this lowers the entry barrier to confidential computing, establishing trust among participants is still necessary to close the attack surfaces that stem from hardware or software privileges. The process of establishing trust by proving the legitimacy of hardware and software is called Remote Attestation, which we will explore next.

Establishing End-to-End Trust

Remote Attestation (RA) is the process by which the trustworthiness of the Confidential Computing environment can be independently proven to a relying party (such as a participant in an FL experiment). Remote attestation proves that the intended software runs on real, legitimate hardware in an up-to-date TEE, with computations starting from the expected initial state (including configuration files and input data).

More generally, attestation can be seen as extending trust beyond what is possible with TLS certificates. Remember, a TLS certificate only guarantees that a given service (such as an FL participant node) is hosted by a trusted organization; it does not say anything about the authenticity of the underlying code.

In OpenFL, Remote Attestation can be used by all participants to increase the overall trust in the FL setup:

  • Data Owners can use RA to help prevent data exfiltration or unethical usage by making sure that only the intended and approved ML software is processing their data.
  • Model Builders can use RA to help prevent model theft or poisoning attacks from malicious Data Owners (by verifying that the model is being trained in a valid TEE, with the expected confidentiality properties).

Intel provides Remote Attestation services via Intel® Tiber™ Trust Authority by having TEEs report the state (or a “measurement”) of the underlying computing assets, such as CPU, memory, micro-code and firmware versions. Those measurements are securely sent to the attestation service, which verifies their integrity against known good values (based on Intel-defined policies for trusted configurations). If the measurements match, the service emits a verifiable token that certifies the environment as trustworthy, allowing other systems to rely on this certification.

OpenFL with Remote Attestation

To establish the authenticity of OpenFL nodes operating within Trusted Execution Environments (TEEs), we can envision a process where each participant node (whether an aggregator or a collaborator enclave) interacts with its local Intel SGX environment, and remotely with the Intel Tiber Trust Authority API to generate a cryptographically signed attestation certificate. This process is uniform across all participants, irrespective of their role in the federation:

  1. Upon startup, each OpenFL participant node queries its local Intel SGX Quoting Enclave, which produces a signed “quote” with the enclave's software and hardware measurements.
  2. The Intel SGX quote is then sent to the Intel Tiber Trust Authority API, which verifies the enclave's measurements against the policies defined by the federation (such as supported hardware, firmware versions, trusted ML code measurements, etc.).
  3. If the Intel SGX quote is deemed valid, the Intel Tiber Trust Authority issues a cryptographically signed JWT token, certifying the authenticity of the TEE.
  4. The enclave process then begins and publishes the attestation token for external verification. (The methods for this will be discussed later; an illustrative code sketch of steps 1-3 follows this list.)
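
To make the flow above more concrete, here is an illustrative sketch of what steps 1-3 could look like from inside a Gramine-based enclave. The /dev/attestation pseudo-files reflect Gramine's usual attestation interface, but the Intel Tiber Trust Authority endpoint URL, request body, and API-key header shown here are simplified placeholders rather than the authoritative API; consult the service documentation for the real interface.

import base64
import os

import requests

# Placeholder endpoint and credential for the attestation service (assumptions).
TRUST_AUTHORITY_URL = "https://trustauthority.example/appraisal/v1/attest"
API_KEY = os.environ["TRUST_AUTHORITY_API_KEY"]

def get_sgx_quote(report_data: bytes = b"\0" * 64) -> bytes:
    # Step 1: Gramine exposes attestation through pseudo-files inside the enclave.
    # Write 64 bytes of user report data, then read back the signed Intel SGX quote.
    with open("/dev/attestation/user_report_data", "wb") as f:
        f.write(report_data)
    with open("/dev/attestation/quote", "rb") as f:
        return f.read()

def request_attestation_token(quote: bytes) -> str:
    # Steps 2 and 3: send the quote for appraisal and receive a signed JWT
    # certifying the enclave's measurements (request shape is illustrative).
    response = requests.post(
        TRUST_AUTHORITY_URL,
        headers={"x-api-key": API_KEY},
        json={"quote": base64.b64encode(quote).decode()},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["token"]

if __name__ == "__main__":
    token = request_attestation_token(get_sgx_quote())
    print(token)  # Step 4: publish the token for external verification.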


Figure 3: Remote Attestation with OpenFL and Intel Tiber Trust Authority

 

An attestation certificate (token) contains signed claims of the trustworthiness of the software and hardware comprising the Intel SGX enclave, but it does not reveal any private data, so it can be freely shared among the federation, and externally if needed. Each token can be independently verified using the public certificate of Intel Tiber Trust Authority. A valid Intel Tiber Trust Authority token helps ensure that the corresponding OpenFL node (enclave) is running the expected code without modifications, on authentic, Intel SGX–capable hardware with valid micro-code and firmware versions. By the properties of Intel SGX enclaves, this also implies that the ML code is executed confidentially, without any person or system (other than the CPU) being able to inspect either the code, the intermediate memory state, or the data in use.
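
As a concrete illustration of such verification, the sketch below checks a token's signature and its enclave measurement claim using the PyJWT library. The JWKS URL, the accepted signing algorithms, and the claim name sgx_mrenclave are assumptions made for illustration; the authoritative values are defined by the Intel Tiber Trust Authority documentation.

import jwt  # PyJWT
from jwt import PyJWKClient

# Hypothetical JWKS endpoint publishing the attestation service's signing keys.
JWKS_URL = "https://trustauthority.example/certs"

def verify_attestation_token(token: str, expected_mrenclave: str) -> dict:
    """Verify the token's signature and check that it attests the expected enclave code."""
    signing_key = PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    claims = jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256", "PS384"],  # accepted algorithms are an assumption
        options={"verify_aud": False},  # audience handling depends on your deployment
    )
    # The claim name below is illustrative; it identifies the measured enclave code.
    if claims.get("sgx_mrenclave") != expected_mrenclave:
        raise ValueError("Attested enclave measurement does not match the approved ML workload")
    return claims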

Towards Fully Automated Trust in Federated Learning

Although the process described in the previous section can be used to achieve end-to-end trust within the federation, it involves several manual steps such as sharing and regularly verifying remote attestation certificates (required to account for changing attestation policies). While such a semi-manual process is designed to increase overall trust, it also adds complexity to the FL experiments.

One way to improve scalability is to automate the verification process, for example by publishing the attestation certificates at a central location accessible to all relying parties. Another possibility is to extend the OpenFL nodes’ TLS certificates with the claims provided by the attestation service. Attestation verification then effectively becomes part of establishing the federation’s trusted communication, via an extended form of TLS known as “Attested TLS.”

Trust can be enhanced and automated even further by encrypting the data and only providing the decryption key to a remotely attested collaborator enclave, in a process known as data sealing. This helps ensure that the data remains secure and can only be accessed by trusted code. By implementing data sealing, the integrity and confidentiality of the data can be protected throughout its lifecycle: the data is safeguarded against unauthorized access and is only processed within a trusted environment, further reinforcing confidentiality and trust within the federation.
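
A minimal sketch of this idea follows, assuming an AES-GCM scheme from the cryptography package and the illustrative verify_attestation_token helper from the previous sketch; in a real deployment, the key release itself would also be protected, for example over attested TLS.

import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Data owner side: encrypt (seal) the local dataset before the experiment starts.
def seal_dataset(plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return key, nonce, ciphertext

# Data owner side: release the key only to a collaborator enclave whose
# attestation token passes verification (verify_attestation_token is the
# illustrative helper from the previous sketch; it raises on failure).
def release_key(key: bytes, attestation_token: str, expected_mrenclave: str) -> bytes:
    verify_attestation_token(attestation_token, expected_mrenclave)
    return key  # in practice, the key would be wrapped for the enclave before sending

# Collaborator enclave side: decrypt (unseal) the dataset once the key is released.
def unseal_dataset(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    return AESGCM(key).decrypt(nonce, ciphertext, None)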

Conclusion

Federated Learning represents a significant advancement in the field of Machine Learning, enabling collaborative model training while helping preserve data privacy. However, the inherent trust issues and potential attack vectors in FL necessitate robust security measures. In this blog we have explored the vulnerabilities in traditional FL setups and introduced Trusted Execution Environments (TEEs) as a promising solution to enhance security and privacy. By leveraging Intel Software Guard Extensions (Intel SGX) and Remote Attestation with Intel Tiber Trust Authority, we can help ensure that the computations within FL nodes are more secure and trustworthy, thereby fostering greater confidence among the entities involved in Federated Learning.