Chips & Salsa: Industry Collaboration for new Hardware CWEs

IPAS_Security · 02-29-2024

Blog by Scott Constable with contributions from the IPAS security team

About the author: Scott is a security researcher in Intel Labs. He received his PhD in computer science from Syracuse University in 2018. Scott’s current research covers instruction set architecture security and transient execution attack mitigation. He recently worked on the Asynchronous Enclave Exit Notify (AEX-Notify) extension to Intel(R) Software Guard Extensions (Intel(R) SGX) and published a related paper in USENIX Security '23.

Today Intel is pleased to announce the addition of four new transient execution weaknesses to the Common Weakness Enumeration (CWE) standard. This announcement is the culmination of a two-year collaborative effort initiated by Intel, with additional contributions from industry participants (AMD, Arm, Cycuity, and Riscure) and the MITRE Corporation.

Intel is a strong advocate of product security, and we take pride in sharing industry best practices with the community. Intel partnered with MITRE to introduce Hardware CWE four years ago and has been a catalyst in driving improvements, including today's announcement.

Transient execution weaknesses can be difficult for hardware designers to diagnose and prevent, as they are often introduced by a combination of seemingly unrelated hardware optimizations (for example, the original Spectre exploit combined branch prediction and data caching). The new CWE entries systematically describe the conditions in microprocessors that can contribute to the introduction of these weaknesses, and the mitigations and detection methods that can be used to address them:

CWE-1421 describes weaknesses where transient execution can allow a malicious actor to access architecturally restricted data. Example: This type of weakness might allow a user-mode thread to access kernel data during transient execution.
CWE-1422 describes weaknesses that arise when a processor allows stale or incorrect data to be forwarded from one operation to another during transient execution. Example: This type of weakness might allow a malicious thread to cause the processor to mis-predict that sensitive data can be forwarded to a memory address operation, potentially allowing a malicious actor to infer the sensitive data through a cache covert channel.
CWE-1423 describes weaknesses where shared microarchitectural predictor state can allow a malicious actor in one processor context to influence predictions that occur in another context. Example: This type of weakness might allow a malicious SMT thread to influence its sibling SMT thread’s transient execution by maliciously training shared branch predictors.
CWE-1420 encompasses all weaknesses that arise from transient execution and is the “parent” of CWE-1421, CWE-1422, and CWE-1423.

This blog post summarizes Intel's involvement in the joint effort, the contents of the new weakness descriptions, and the impact we expect them to have on the broader microprocessor industry. We also invite the reader to listen to the latest episode of the Chips & Salsa podcast to hear from several of the industry and CWE community participants who helped make today's announcement possible.

In this Chips & Salsa episode, members of the Hardware CWE Special Interest Group (SIG) from Intel, MITRE, AMD, Cycuity, and Riscure tell us why these new CWEs are important and talk about the challenges and rewards of this industry collaboration.

What motivated the new weakness descriptions?

The new descriptions are motivated by the security implications of a commonly used optimization technique. Modern processors sometimes discard the architectural (software-observable) effects of recently executed instructions, for example, after a processor discovers that it has mis-predicted a branch target. This phenomenon, commonly known as transient execution, can potentially be misused by a malicious adversary to infer sensitive data when the discarded instructions leave microarchitectural side effects that can later be detected, for example, by using a cache covert channel [1].

Whenever Intel, or any other processor vendor, discovers a transient execution vulnerability in one of its products, it may publicly disclose the vulnerability with a CVE. The CVE should also map to a CWE entry and use that entry's description as a template for the CVE's description. Prior to today's announcement, there were already a few CWEs that Intel and other processor vendors would sometimes use to describe transient execution vulnerabilities:

CWE	Title	Description
1037	Processor Optimization Removal or Modification of Security-critical Code	The developer builds a security-critical protection mechanism into the software, but the processor optimizes the execution of the program such that the mechanism is removed or modified.
1264	Hardware Logic with Insecure De-Synchronization between Control and Data Channels	The hardware logic for error handling and security checks can incorrectly forward data before the security check is complete.
1303	Non-Transparent Sharing of Microarchitectural Resources	Hardware resources shared across execution contexts (e.g., caches and branch predictors) can violate the expected architecture isolation between contexts.
1342	Information Exposure through Microarchitectural State after Transient Execution	The processor does not properly clear microarchitectural state after incorrect microcode assists or speculative execution, resulting in transient execution.

In our experience with vulnerability disclosures, we noticed that these CWEs had several shortcomings that limited their applicability to many transient execution vulnerabilities. For example, consider CVE-2017-5753 (a.k.a. Spectre, Bounds Check Bypass, or BCB), one of the first transient execution vulnerabilities to be discovered. None of the existing CWEs could fully characterize this vulnerability:

Although BCB can affect conditional branches that do serve as "security-critical protection mechanisms" (for example, in software sandboxes), BCB can also affect branches that serve a different purpose, such as dynamic type checking [2]. Therefore CWE-1037 is too narrow.
The conditional branch instructions affected by BCB are not "hardware logic for error handling and security checks." Hence CWE-1264 does not apply.
BCB can be exploited over a network [3] without shared resources, and therefore CWE-1303 does not apply.
CWE-1342 implies that the processor should be clearing microarchitectural state, which is impractical or infeasible for many transient execution vulnerabilities, including BCB.
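For readers who have not seen the BCB pattern, the C sketch below shows the kind of code it affects. It is an illustration, not an exploit. The array names follow the well-known example from the original Spectre paper. The 64-byte stride assumes one cache line per possible byte value. `index_mask` is a hypothetical helper modeled loosely on the branchless masking idea behind the Linux kernel's `array_index_nospec()`. For in-bounds indices, both functions return the same result architecturally. The difference is that the masked variant has no conditional branch on `x` that a predictor could mis-predict. Whether the compiler actually emits branch-free code for `x < size` is not guaranteed, which is why production code relies on dedicated, hardened helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative victim data; names follow the classic Spectre v1 example. */
static uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8,
                             9, 10, 11, 12, 13, 14, 15, 16};
static const size_t array1_size = sizeof array1;
/* Probe array: one 64-byte slot per possible byte value (assumes 64-byte cache lines). */
static uint8_t array2[256 * 64];
/* Volatile sink so the compiler keeps the dependent probe load. */
static volatile uint8_t sink;

/* Vulnerable pattern: the bounds check is a conditional branch. If it is
 * mis-predicted for an out-of-bounds x, array1[x] may be loaded transiently
 * and its value encoded in which line of array2 ends up in the cache. */
uint8_t victim_bcb(size_t x) {
    if (x < array1_size) {
        uint8_t value = array1[x];
        sink = array2[value * 64];
        return value;
    }
    return 0;
}

/* Hypothetical helper: all ones when x < size, zero otherwise, computed
 * without a conditional jump (compiler permitting). */
static size_t index_mask(size_t x, size_t size) {
    return (size_t)0 - (size_t)(x < size);
}

/* Masked variant: no branch on x, so an out-of-range x is clamped to
 * index 0 even during transient execution, and the returned value is zeroed. */
uint8_t victim_masked(size_t x) {
    size_t mask = index_mask(x, array1_size);
    uint8_t value = array1[x & mask];
    sink = array2[value * 64];
    return (uint8_t)(value & mask);
}
```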

At the end of 2021, Intel submitted a proposal to MITRE to add a single new CWE to fill some of the gaps in the existing CWEs. At a HW CWE SIG meeting in September 2022, MITRE asked us to expand the proposal into multiple CWEs that would supersede and deprecate some of those existing CWEs. This became a much larger task, as it required us to re-imagine what a new hierarchy of CWEs would look like. How would it be structured? How could we delineate between different kinds of weaknesses? What are the root causes? What are the commonalities? It was difficult to paint a complete picture by looking only at Intel’s CVEs because we knew that there were some other kinds of weaknesses that we had not encountered on our processors. This is where the CWE community made critical contributions. The other SIG members were able to provide concrete examples of corner cases that we had not considered in our proposal, and they helped us to refine the new CWE definitions to cover these corner cases.

What are the new weakness descriptions?

The table below lists the four new weakness descriptions, which are distinguished by their root cause, or "condition" in the CWE lexicon. Intel and fellow industry collaborators identified three primary root causes that account for a majority of the observed transient execution weaknesses. First, a race condition between a data access and an access control check on that data can potentially expose the data to a thread that should not be permitted to observe it; this is the root cause of CWE-1421. Second, data forwarding, the subject of CWE-1422, is a common optimization technique that opportunistically bypasses slow pipeline stages (for example, memory load/store) and has contributed to several observed transient execution weaknesses. Third, predictor state that is shared between threads, the subject of CWE-1423 and a root cause that has repeatedly attracted attention in the industry, can allow one thread to maliciously train predictors in a manner that influences another thread's behavior during transient execution.
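To make the third root cause concrete, here is a deliberately simplified C model of a branch predictor table shared between two hardware contexts. It does not describe how any real processor indexes or tags its predictors. The table size, the modulo indexing, and the `isolate_contexts` switch are all assumptions chosen to show the effect. In shared mode, a branch trained by context 0 at one address also steers the prediction for context 1 at an aliasing address. Giving each context its own counters, one of several possible mitigation approaches, removes that influence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PRED_ENTRIES 64 /* toy table size (assumption) */
#define MAX_CONTEXTS 2  /* e.g., two SMT sibling threads */

/* Toy predictor: 2-bit saturating counters. Values 0-1 predict
 * "not taken"; values 2-3 predict "taken". */
typedef struct {
    uint8_t counter[MAX_CONTEXTS][PRED_ENTRIES];
    bool isolate_contexts; /* true: each context uses its own row of counters */
} predictor;

static uint8_t *entry(predictor *p, unsigned ctx, uintptr_t branch_addr) {
    assert(ctx < MAX_CONTEXTS);
    /* Shared mode: every context indexes row 0, so state is shared. */
    unsigned row = p->isolate_contexts ? ctx : 0;
    /* Only the low address bits select an entry, so distinct branches can alias. */
    return &p->counter[row][branch_addr % PRED_ENTRIES];
}

bool predict_taken(predictor *p, unsigned ctx, uintptr_t branch_addr) {
    return *entry(p, ctx, branch_addr) >= 2;
}

void train(predictor *p, unsigned ctx, uintptr_t branch_addr, bool taken) {
    uint8_t *c = entry(p, ctx, branch_addr);
    if (taken && *c < 3) {
        (*c)++;
    } else if (!taken && *c > 0) {
        (*c)--;
    }
}
```

For example, training address 0x40 as taken four times in context 0 makes context 1 predict "taken" at address 0x80 in shared mode, because both addresses map to entry 0 of the 64-entry table. With `isolate_contexts` set, context 1 still sees its untrained, not-taken counter.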

Certain phrases in each description in the table below (for example, "processor event or prediction," "incorrect operations," and "data") can be replaced or embellished when issuing a CVE. The next section works through an example.

CWE	Title	Description
1420	Exposure of Sensitive Information during Transient Execution	A processor event or prediction may allow incorrect operations (or correct operations with incorrect data) to execute transiently, potentially exposing data over a covert channel.
1421	Exposure of Sensitive Information in Shared Microarchitectural Resource during Transient Execution	A processor event may allow transient operations to access architecturally restricted data (for example, in another address space) in a shared microarchitectural resource (for example, a CPU cache), potentially exposing the data over a covert channel.
1422	Exposure of Sensitive Information caused by Incorrect Data Forwarding during Transient Execution	A processor event or prediction may allow incorrect or stale data to be forwarded to transient operations, potentially exposing data over a covert channel.
1423	Exposure of Sensitive Information caused by Shared Microarchitectural Predictor State that influences Transient Execution	Shared microarchitectural predictor state may allow code to influence transient execution across a hardware boundary, potentially exposing data that is accessible beyond the boundary over a covert channel.
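The parent/child relationship among the four entries can be captured in a small lookup table. This C sketch encodes only what is stated in this post: the IDs, the titles, and the fact that CWE-1420 is the parent of the other three. `cwe_entry`, `cwe_lookup`, and `cwe_is_under` are hypothetical names. The sketch does not model where CWE-1420 itself sits in the wider CWE catalog.

```c
#include <stddef.h>

/* Hypothetical record type for one CWE entry. */
typedef struct {
    int id;
    int parent; /* 0 means no parent recorded in this table */
    const char *title;
} cwe_entry;

static const cwe_entry transient_cwes[] = {
    {1420, 0, "Exposure of Sensitive Information during Transient Execution"},
    {1421, 1420, "Exposure of Sensitive Information in Shared Microarchitectural "
                 "Resource during Transient Execution"},
    {1422, 1420, "Exposure of Sensitive Information caused by Incorrect Data "
                 "Forwarding during Transient Execution"},
    {1423, 1420, "Exposure of Sensitive Information caused by Shared "
                 "Microarchitectural Predictor State that influences Transient Execution"},
};

/* Linear search; returns NULL for IDs not in this table. */
const cwe_entry *cwe_lookup(int id) {
    size_t n = sizeof transient_cwes / sizeof transient_cwes[0];
    for (size_t i = 0; i < n; i++) {
        if (transient_cwes[i].id == id) {
            return &transient_cwes[i];
        }
    }
    return NULL;
}

/* Returns 1 if `id` equals `ancestor` or descends from it in this table. */
int cwe_is_under(int id, int ancestor) {
    const cwe_entry *e = cwe_lookup(id);
    while (e != NULL) {
        if (e->id == ancestor) {
            return 1;
        }
        e = (e->parent != 0) ? cwe_lookup(e->parent) : NULL;
    }
    return 0;
}
```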

How will these new CWEs benefit the industry?

There are at least two obvious ways that the new CWEs will benefit the industry. First, we expect them to allow hardware vendors to issue more comprehensible CVEs. Here is an example. When Intel disclosed CVE-2021-0089 (a.k.a. Speculative Code Store Bypass), none of the existing CWEs was a close match for the issue. Therefore we had to adopt language from the more generic CWE-204: Observable Response Discrepancy, which is an abstract description of a side channel. Consequently, CVE-2021-0089's description became: "Observable response discrepancy in some Intel® Processors may allow an authorized user to potentially enable information disclosure via local access." Such vague descriptions can make it challenging for industry partners, software vendors, and system administrators to infer the root cause of the vulnerability or its potential impacts.

Although CVE-2021-0089 doesn’t involve one of the three main root causes described above, it can still be accurately summarized by adopting CWE-1420’s description: “A machine clear triggered by self-modifying code may allow incorrect operations to execute transiently, potentially exposing data over a covert channel.” We can derive this more precise description by treating CWE-1420’s description as a template: we replaced “processor event or prediction” with “machine clear triggered by self-modifying code,” i.e., the root cause; we specified that “incorrect operations” can be executed transiently; and there isn’t any specific type of data that can be exposed, so “data” didn’t require any embellishment.
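The template substitution described above is mechanical enough to sketch in a few lines of C. This is an informal illustration, not a format defined by CWE. The slot names (`event`, `operations`, `data`) and the function `fill_cwe1420` are hypothetical. The fixed leading article "A" assumes that the substituted event phrase begins with a consonant sound.

```c
#include <stddef.h>
#include <stdio.h>

/* Fills an informal rendering of the CWE-1420 description template.
 * Returns what snprintf returns: the length of the full string, so a
 * value >= out_len means the output was truncated. */
int fill_cwe1420(char *out, size_t out_len, const char *event,
                 const char *operations, const char *data) {
    /* The leading "A" assumes `event` starts with a consonant sound. */
    return snprintf(out, out_len,
                    "A %s may allow %s to execute transiently, "
                    "potentially exposing %s over a covert channel.",
                    event, operations, data);
}
```

Calling it with "machine clear triggered by self-modifying code", "incorrect operations", and "data" yields the CVE-2021-0089 wording quoted above. Passing the original phrases reproduces the CWE-1420 description verbatim.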

The second obvious benefit of the CWE entries is that they provide a ton of additional information about transient execution weaknesses, including:

Modes of introduction – At what point in hardware design, system configuration, or software development (etc.) can these weaknesses potentially be introduced?
Potential mitigations – How can transient execution weaknesses be mitigated during hardware design and, for those that can’t, how can they be mitigated by software techniques? The CWEs provide more than a dozen best-known methods, many of which are currently being used in the hardware industry and by software vendors.
Detection methods – How can transient execution weaknesses be detected in hardware designs, in post-silicon hardware samples, or in software programs?
Demonstrative examples – These brief expository examples are derived from real CVEs.
Additional general information about each weakness and how certain hardware behaviors can contribute to the introduction of vulnerabilities.

What is CWE and how does it relate to CVE?

First, here is a quick primer on Common Vulnerabilities and Exposures (CVE) for readers who are unfamiliar: whenever a company or organization discovers a potential vulnerability in one of its products (hardware or software), a common best practice is to triage the vulnerability and then disclose a CVE if mitigations may be required. Each CVE has a unique identifier of the form "CVE-YYYY-NNNN," where "YYYY" is the year in which the CVE was assigned and "NNNN" is a sequence number of four or more digits that makes the identifier unique. For example, the first ever CVE was assigned in 1999, and its identifier is CVE-1999-0001; the second CVE assigned that year would have been CVE-1999-0002, and so on. Each CVE also has an associated description, usually a single sentence, that summarizes the vulnerability and its potential impacts. The description of CVE-1999-0001 reads, "ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets." CVE has been widely adopted by the computing industry, and over the past 25 years of its existence nearly 225,000 vulnerabilities have been added to the public CVE database. The CVE standard and the CVE database are both maintained by the MITRE Corporation.
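The identifier format is simple enough to check mechanically. The C sketch below accepts "CVE-", a four-digit year, a hyphen, and a sequence number of at least four digits. Since 2014, the CVE ID syntax has allowed sequence numbers longer than four digits. `parse_cve_id` and `cve_id` are hypothetical names. The nine-digit upper bound is an assumption of this sketch that keeps the value within an `unsigned long`; it is not part of the CVE syntax. The sketch also does not check that the year is plausible.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed form of a CVE identifier. */
typedef struct {
    int year;
    unsigned long seq;
} cve_id;

/* Accepts "CVE-YYYY-N..." with a 4-digit year and 4 to 9 sequence digits.
 * The 9-digit cap is this sketch's assumption (it keeps seq within 32 bits). */
bool parse_cve_id(const char *s, cve_id *out) {
    if (strncmp(s, "CVE-", 4) != 0) {
        return false;
    }
    s += 4;
    int year = 0;
    for (int i = 0; i < 4; i++) {
        if (!isdigit((unsigned char)s[i])) {
            return false; /* also stops safely at the terminating NUL */
        }
        year = year * 10 + (s[i] - '0');
    }
    if (s[4] != '-') {
        return false;
    }
    s += 5;
    size_t digits = 0;
    unsigned long seq = 0;
    while (digits < 10 && isdigit((unsigned char)s[digits])) {
        seq = seq * 10 + (unsigned long)(s[digits] - '0');
        digits++;
    }
    /* Reject too few or too many digits and any trailing characters. */
    if (digits < 4 || digits > 9 || s[digits] != '\0') {
        return false;
    }
    out->year = year;
    out->seq = seq;
    return true;
}
```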

As the scope of vulnerability disclosures expanded into the 21st century, MITRE introduced a new Common Weakness Enumeration (CWE) standard to document the common root causes (or “weaknesses”) that can contribute to the introduction of vulnerabilities. Each new CVE “maps” to the CWE that most accurately characterizes the vulnerability’s root cause. For example, many CVEs that involve an out-of-bounds write vulnerability can map to CWE-787, whose description reads: “The product writes data past the end, or before the beginning, of the intended buffer.” The computing industry’s adoption of CWE has brought several benefits:

CWE allows MITRE to track the frequency and severity of all vulnerabilities reported for each weakness type. For instance, every year MITRE compiles this data to publish a “Top 25 Most Dangerous Software Weaknesses” list (in 2023’s list, out-of-bounds write made #1).
CWE is an information and educational tool. Each CWE entry contains general information about the weakness, including common modes of introduction, best-known mitigations, detection methods, and expository examples.
CWE has influenced the design of many software and hardware analysis tools. For example, CWE forms the organizational structure of many of CodeQL’s software security analysis queries. Academic researchers have also demonstrated that large language models that are trained on CWE entries can find and fix certain security vulnerabilities in hardware specifications.
CWE provides companies and organizations with a common language that can be used to issue CVEs. In other words, a CWE description serves as a kind of template for writing CVE descriptions that map to the CWE.

References

[1] Intel, "Refined Speculative Execution Terminology," https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/best-practices/refined-speculative-execution-terminology.html

[2] O. Kirzner and A. Morrison, "An Analysis of Speculative Type Confusion Vulnerabilities in the Wild," in 30th USENIX Security Symposium (USENIX Security 21), 2021.

[3] M. Schwarz, M. Schwarzl, M. Lipp, J. Masters and D. Gruss, "NetSpectre: Read Arbitrary Memory over Network," in ESORICS 2019: 24th European Symposium on Research in Computer Security, Luxembourg, Luxembourg, 2019.