Questions about the Digital Random Number Generator (aka Bull M

Gael_H_Intel · ‎04-03-2012

(Posting this for Patrick)

Hi,
The effort involved in the implementation of RdRand instruction to promote its use in security-related applications, in particular the self-testing functionality, is very impressive!
I'm interested in cryptographic applications of the RdRand instruction. Sometimes I might want an entropy source which is unaffected by deterministic processing which could potentially compromise my cryptographic applications. In particular, I am concerned that the Online Health Tests (OHTs) will affect the output of the RdRand instruction. Suppose the entropy source is in fact working properly. The ES will sometimes fail the OHT even though the entropy source is fine. That means that ES bit streams that fail the OHT will never be seen. Hence certain output sequences which would be generated by a perfectly random entropy source will never appear in an output sequence generated by the RdRand instruction. If I understand the documentation, there's a 65536 bit sliding window which is used for the OHT. 256-bit chunks output from the OHT are then used to create up to 1022 64-bit RdRands. It then seems if I only use one out of every 1022*65536/256 = 261,632 values returned by RdRand, each 64-bit sample selected will be outside the sliding window of all the other 64-bit samples selected.
Is that in fact the case? If I use only one out of every 261,632 calls to RdRand, will I in fact get 64-bit samples from non-overlapping sliding windows used by the OHT?
Even when the RdRand outputs I use are from non-overlapping sliding windows, it would still be helpful to know the exact algorithm used by the OHT to see if that might affect my cryptographic applications. Is documentation available for the algorithm used by the OHT? If not, I would like to request that.

Thank you,
Patrick

DAVID_J_Intel · ‎04-03-2012

Patrick. The OHT doesn't work like that. We never throw away entropy bits.

We get false positive failures in about 1% of each 256 bit sample. The tests are biased to reject good samples instead of accepting bad ones.

But when we get a bad sample, we don't throw it away, we stir it into the pool anyway. It's probably good and it couldn't make things worse anyway. However we do not count its entropy. We need a certain number of good samples to be mixed into the entropy pool and we count them. Any bad samples get mixed in as they arrive, but don't get counted.

The effect is that if the quality of the ES output is worse for some reason (say attack or hardware failure) there will be more entropy stirred into the entropy pool to compensate.

The values returned from RdRand are not the health tested samples. The pool, once enough entropy has been stirred in, is used as a seed to seed a cryptographically secure PRNG (DRBG in NIST speak). You see the output of the DRBG. This has a uniform output with an O(2^128) brute force algorithmic attack resistance. Unlike normal software PRNGs, this is reseeded at about 1.5Gbps. So even though it is a PRNG, it is barely deterministic because it is reseeded so often.

The sliding window is a memory of the state of the last 256 samples, each 256 bits. It is used to keep a record of how many of the last 256 samples were healthy. If too few were healthy, we regard the ES as broken and treat all samples as bad until they start passing the OHT above the required threshold rate. this is what we call 'swell' mode or 'unswell' mode depending on the quality of the data. This does not happen in working silicon. I can artifically break the silicon (E.G. run it far outside of the PVT envelope, or zap it with a laser) and watch it happen, but in production parts running within the PVT evelope, that have passes quality test, it doesn't.

At the output of the DRBG, through RdRand, you have no visibility of these processes. We seek to limit the side channels through which an attacker could determine the internal state of the DRNG.

An external independent review of the design has been performed and I believe that this will tell you what you want to know. It should be published soon and I will let you know when it is.

Regards,
David Johnston

pmokeefe · ‎04-18-2012

Thanks David, for setting me straight on that.

I had misinterpreted the following lines from Intel's US Patent Application 20100332574:

[0064] In an embodiment, entropy measurement using per M sample window tests/statistics is performed using a long window M (64K bits=256 samples of 256 bits/samples). The window is dynamically moved back in time (most recent M samples) and continually repeated (one sample after another). The measurement of output goodness is N of M samples "passed" per sample tests, with N chosen empirically from simulations and test chip behavior.

[0065] Based upon configuration parameters of both the per sample expected occurrence rates or distributions of selected bit string patterns and N (from N of M) recent "passing" samples, the number of samples from the entropy source 102 forwarded to the deterministic random bit generator 106 prior to a conditioning operation and (subsequently) a reseeding operation varies based upon the dynamic fit of the entropy source samples to the model's distribution curve.

Somehow I thought those paragraphs were implying that the samples that "didn't pass" were actually rejected and not forwarded to the DRBG.

In any case, further details of the OHT may be important for certain applications. Some of the algorithms that I would like to use in my applications are based on the formal notion of the "min entropy" of the randomness source. I would like to understand the implementation of the RdRand instruction well enough to use it in that context. The applications involve security and privacy, so naturally I'm working very hard to get all the details right.

I'm looking forward to seeing the independent review you mentioned. Hopefully that will answer the rest of my questions!

Cheers,
Patrick O'Keefe

DAVID_J_Intel · ‎04-18-2012

The independent review will answer your questions. I can't answer them too directly because my answers would have to be reviewed before being made public. However I can explain a little about the OHT based on what was given in the IDF presentation.

The OHT is counting the frequency of short bit patterns between 1 and 5 bits long. For completely random data they follow a binomial distribution. For real data, the longer strings are slightly supressed byacontrol loop compared to a binomial distribution. The hardware is looking for a fit to this distribution. The distribution was developed mathematically from the model, and then verified in simulation and on hardware.

The goal is not to determine that the data is random. The goal is to determine that the ES is running correctly. If the ES is running correctly, the math says it's random and meets certain criteria. This is a lot stronger argument than looking at the output and saying 'well it looks random'.

The absolute min entropy requirement is a 50% entropy rate goinginto the conditioner. The engineering target is > 80% (to giveplenty ofmargin) and the measured datais in the upper 90s. The idea being that if we are strictly greater than 50% going into the conditioner, the entropy rate out of the conditioneris (100% - epsilon), where epsilon is negligable. This 100% entropic data is what seeds the DRBG.

The easiest way to model the 'entropy flow' to RdRand is that the conditioner is outputting at a given entropy rate into the DRBG. The DRBG is 'stretching' the data with more samples between reseeds, but the underlying entropy rate in the RdRand output is the same as is coming out of the conditioner. If we take the actual rate and divide it down a lot to be conservative, then we could formally argue that there is strictly greater than 2.8 bits of source entropy per RdRand (regardless of size, 16, 32 or 64), even with knowlege of the internal state of the DRBG (what I call the 'well informed attacker model'). It's actually a lot better than that, but it's harder to show formally. If you want hard entropy (rather than the output of an overengineered PRNG), then cbc-mac together enough RdRands until you've stuffed in 400% entropy relative to the output size of the mac. In the library code we published, we've given example algorithms for doing this.

For practical purposes, this is overkill. Using RdRand output directly is safe and fast. That's the point.

pmokeefe · ‎04-19-2012

Thanks David, that sounds good.

Your remark that there are "greater than 2.8 bits of source entropy per RdRand" is probably very closely related to what I'm looking for, though there are some rather excruciating technical details which still concern me - hopefully I'll have a better handle on those when I see the forthcoming document you mentioned.

pmokeefe · ‎04-29-2012

Hi David,
In an earlier message you said:
"But when we get a bad sample, we don't throw it away, we stir it into the pool anyway. It's probably good and it couldn't make things worse anyway. However we do not count its entropy. We need a certain number of good samples to be mixed into the entropy pool and we count them. Any bad samples get mixed in as they arrive, but don't get counted."

Is there an explicit description somewhere of the procedure/algorithm used to "stir it into the pool" / "any bad samples get mixed in as they arrive".

Will that be detailed in the forthcoming independent review that you mentioned?

If not, I would like to request that.

Thanks, Patrick

DAVID_J_Intel · ‎05-01-2012

I believe the external review will cover this issue.

What happens in the design is this:

There is a 256 entropy pool called CE (Conditioned Entropy). After reseeding, the CE has been 'used' and is marked as such. A counter is reset (counter <= 0).
When a new raw 256 bit raw sample arrives and (counter == 0) the sample is mixed in to the lower half
new_CE[127:0] <= AES-CBC-MAC(k, Raw_sample[255:0] || old_CE[127:0])
if (sample was healthy) counter++
Then when another 256 bit raw sample arrives and (counter == 1) the sample is mixed in to the upper half
new_CE[255:128] <= AES-CBC-MAC(k, Raw_sample[255:0] || old_CE[255:128]
if (sample_was_healthy) counter ++
When (counter == 2) the CE is used to reseed the PRNG.
if (counter == 2) reseed_prng(CE)
counter <= 0

So it repeats conditioning and reseeding continuously at a rate determined by the arrival rate of entropy from the ~2.5Gbps supply of raw entropy bits.

You can see that if a sample was not marked as healthy by the statistical test, it would be mixed in, but the counter would not be incremented, so the next sample would also be mixed in and this would repeat until a healthy sample was received. So to get to (counter == 2), at least two healthly samples must have been mixed in, along with any number of unhealthy samples received in between. The healthy to unhealthy ratio for a working entropy source is about 99:1. I.E. there's a false positive rate of 1%.

You can also see that we do not 'replace' the CE value. We mix the new data in with the old. So even though it should never happen, a transient failure of the ES would not make the CEpool non random. The state in CE and the PRNG would be able to maintain the required security level on the output for 2^40 RdRands (as required by SP800-90). But we put a hard limit in of 2^11, which is obviously more conservative. 2^11 was not chosen to be conservative, it is simplya value that is never reached, so it doesn'tneed to be any bigger. The ES is fast enough toreseed frequently enough that wedon't get above 22 RdRands (~2^5) before a reseed.

So at least 512 bits of raw entropy (which is generally > 95% entropic) is stirred into the data which is used to reseed the PRNG. The PRNG maintains 256 bits of internal state (k and V)and outputs 128 bit values.

We do not control the content of the external review (that's the point of an external review). But I'm pretty sure it will cover this.

pmokeefe · ‎05-02-2012

Thanks David, that's a big help!
One question though. Isn't this in effect using CBC-MAC for variable-length messages?
My concern is that while CBC-MAC is secure for fixed-length messages, by itself, it is purportedly not secure for variable-length messages - see this article in Wikipedia.
Cheers, Patrick

DAVID_J_Intel · ‎05-02-2012

The CBC-MAC length is constant. It is always processing 3 blocks of 128 bits. The lower half of the raw entropy, the upper half and then the previous state of CE.

We invoke AES-CBC-MAC two or more times. There is no padding required, or CPA or CCA.

I'm not sure what is the problem with variable length messages? In protocol design we know how to use it correctly (E.G. see CCM's use of the length fieldin 802.11). There are extention attacks against a misused CBC-MAC, but they do not apply in this instance.

pmokeefe · ‎05-02-2012

Hi David,
I've been trying to research the use of CBC MAC in randomness extraction (which corresponds to the conditioning entropy step, I believe). The most significant reference that I've found so far seems to be:

Dodis, Y., Gennaro, R., Hastad, J., Krawczyk, H., Rabin, T.: Randomness extraction and key derivation using
the CBC, cascade and HMAC modes. In: Advances in Cryptology | CRYPTO 2004 Proceedings. Volume 3152
of Lecture Notes in Computer Science., Springer (2004).

But as far as I can tell, they only covered the case when the number of inputs is fixed, which doesn't seem to be the case in the RdRand implementation. However, the link I included above is apparently not the complete version - I'm still trying to track that down.

That makes me a little concerned, since CBC MAC apparently has a problem when used as a message authentication code with variable length messages (Wikiipedia).
I realize this is a different application, but I'm concerned that those problems with CBC MAC may be relevant here as well.

I found another reference which discusses AES CBC-MAC in particular.
How to Extract and Expand Randomness: A Summary and Explanation of Existing Results
by Yvonne Cliff, Colin Boyd, and Juan Gonzalez Nieto.

They state:

Dodis et al. [3] have analysed the security of CBC-MAC, the cascade construction and HMAC, and were the first to do so in the standard model. Their results for CBC-MAC are of little practical use, since a security level of only e = 62 bits can be achieved using existing AES block ciphers, which all have a block size of 128 bits. To achieve a security level of e = 82 bits for the extractor, a block size of at least 169 bits would be required.

That seems to indicate that AES CBC MAC has somewhat limited effectiveness, even in the case where there's a fixed number of inputs.

My motivation in getting into so much detail is due to the cryptological applications I'm working on. The analysis required to do that properly can be somewhat delicate. For example see On the (non)Universality of the One-Time Pad by Yevgeniy Dodis and Joel Spencer.

Hence I really appreciate your patience in helping me out with some of these details. I believe there are probably many other practioners out there with similiar applications and concerns, so hopefully the trouble you're taking with me on this will help them out as well!

It's certainly possible that I'm misinterpreting something you wrote, or that references that I cited are not actually applicable in this case, if so I'm looking forward to being set straight on that!

Thanks again,

Patrick

pmokeefe · ‎05-02-2012

Hi David,
I didn't see your reply before submitting another post, so they actually crossed, sorry about that.
I'm preparing a response to your latest message, in the meantime I didn't want you to think I was ignoring it!
Cheers, Patrick

pmokeefe · ‎05-02-2012

Hi David,
First, I'd like to make sure I completely understand what you wrote.

new_CE[127:0] <= AES-CBC-MAC(k, Raw_sample[255:0] || old_CE[127:0])

Does that expand out as
AES128(k,
AES128(k, AES128(k,Raw_sample[127:0]) XOR Raw_sample[255:128])
XOR
old_CE[127:0])
?

That's my guess based on the Wikipedia article CBC MAC, but perhaps you meant something different?

The next question is about code as a whole. Is this equivalent to the way the CE is computed for a reseeding?

counter <= 0
while (counter < 1) {
CE[127:0] <= AES-CBC-MAC(k, Raw_sample[255:0] || CE[127:0])
if (sample was healthy) counter++
}
counter <= 0
while (counter < 1) {
CE[255:128] <= AES-CBC-MAC(k, Raw_sample[255:0] || CE[255:128]
if (sample_was_healthy) counter ++
}
reseed_prng(CE)

where every instance of Raw_sample[255:0] in each iteration will be a different raw entropy sample.

The other possibility I can think of is this:
low_counter <= 0
high_counter <= 0
while ((low_counter < 1) OR (high_counter < 1)) {
if (low_counter < 1) {
CE[127:0] <= AES-CBC-MAC(k, Raw_sample[255:0] || CE[127:0])
if (sample was healthy) low_counter++
}
}
if (higher_counter < 1) {
CE[255:128] <= AES-CBC-MAC(k, Raw_sample[255:0] || CE[255:128]
if (sample_was_healthy) high_counter ++
}
}
reseed_prng(CE)

again every instance of Raw_sample[255:0] in each iteration will be a different raw entropy sample.

The two versions are in some sense functionally equivalent. Is either of them actually what's happening?

I'm assuming that both CE[127:0] and CE[255:128] are supposed to be based on exactly one healthy 256 bit raw sample (different for each), but they may also have made use of a variable number of unhealthy samples as well.
Furthermore each time we generate a new conditioned entropy sample, we make use of the previously generated conditioned entropy sample - CE is not reinitialized each time we generated a new conditioned entropy sample?

Is all that correct?

Thanks, Patrick

pmokeefe · ‎05-02-2012

Hi David,
I think I can zero in on my major concerns without going into so much detail (though the details are also important!)
You wrote:
new_CE[127:0] <= AES-CBC-MAC(k, Raw_sample[255:0] || old_CE[127:0])

The first concern is that old_CE[127:0] is not itself a raw entropy sample. My understanding is that that there are some security guarentees when the input to CBC-MAC consists entirely of a fixed number of raw entropy samples - but that's not what we appear to have here.

The second issue applies to the use of CBC-MAC even with a fixed number of raw entropy samples. According to How to Extract and Expand Randomness: A Summary and Explanation of Existing Results
by Yvonne Cliff, Colin Boyd, and Juan Gonzalez Nieto:

"Dodis et al. [3] have analysed the security of CBC-MAC, the cascade construction and HMAC, and were the first to do so in the standard model. Their results for CBC-MAC are of little practical use, since a security level of only e = 62 bits can be achieved using existing AES block ciphers, which all have a block size of 128 bits. To achieve a security level of e = 82 bits for the extractor, a block size of at least 169 bits would be required."

They seem to be saying that AES-CBC-MAC with a block size of 128 bits only offers a security level of 62 bits.

Though I believe this message covers my fundamental concerns, I'm still very interested in the rest of the details of RdRand. Putting RdRand to work in cryptological applications is actually the main focus of my work right now. I'm absolutely relying on my understanding of the RdRand implementation to deliver results. Hopefully that will motivate the customers to go out and buy a lot of the new Intel chips which implement RdRand - instead of just sitting on their old hardware because "it's still good enough"!

Cheers, Patrick

Questions about the Digital Random Number Generator (aka Bull Mountain)