We had a game of Monopoly at the weekend and someone had lost the dice, so I used the dice at
https://random.bentasker.co.uk
However, I got doubles a lot (which was handy, as I kept landing on "Go to jail").
The probability of rolling doubles with two dice is 1/6 (~0.167), so it may be nothing, but it felt oddly frequent (it also only seemed to happen to me, which lent a feeling that they might be coming in some kind of cycle).
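For the avoidance of doubt, that figure comes from there being 6 doubles among the 36 equally likely outcomes (6/36 = 1/6). A quick illustrative check using Lua's stock PRNG (nothing to do with the site's generator):

```lua
-- Estimate the doubles probability empirically with Lua's built-in PRNG
math.randomseed(os.time())
local doubles, trials = 0, 1000000
for _ = 1, trials do
    if math.random(6) == math.random(6) then
        doubles = doubles + 1
    end
end
print(doubles / trials) -- should hover around 1/6 (~0.1667)
```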
I thought it might be interesting to dig into as it's a while since I've touched that codebase.
Random output is generated by a hash-chain based RNG I created in Lua (originally, but no longer, reading in seeds from my ChaCha20 RNG -
https://www.bentasker.co.uk/blog/software-development/689-writing-a-chacha20-based-csprng)
Activity
2021-06-03 18:22:02
Rolling 2 dice twenty times
Does seem more common than might be expected.
Can we get triples?
No, but there is a sequence in there if we squash rolls together
Could be chance though.
Another run yields two identical rolls one after the other
Another run gets us triples
Although unlikely, it is possible.
It's only gut feel, but it seems like something might be going on. Because we're fetching over a network (and other stuff may also be fetching), there's a hell of a lot we can't control when trying to test this.
2021-06-03 18:46:37
That last output is written into
Variable
Note:
I'll KISS for now and not delve too far into that.
The table we populated in our iteration is concat'd to stretch to 480 bytes, creating the string
In theory the iteration and concat process should prevent repeats (because we're mixing in random data) - again, I'll KISS for now and move on.
We call
Type
There's nothing immensely special about the
It reads the digest, regexing out alphanumeric and punctuation chars, then converting each char to its ASCII char code and inserting it.
It's possible the fairly limited range of char codes (0-127) could lead to certain integers being more common - if that's the case then you'd expect to see a heavier weighting on those numbers:
- 1 is probably an obvious example, because every ASCII character code from 100 up starts with a 1
- 2 is also likely to be more common, because char codes only run up to 127 - it picks up 8 extra entries (120-127) that the digits 3 onwards don't get
That's something we should be able to check for reasonably easily - can come back to this.
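As a rough illustration of that weighting, we can count how often each decimal digit appears when the char codes 0-127 are written out in decimal:

```lua
-- Count how often each decimal digit appears across char codes 0-127
local freq = {}
for d = 0, 9 do freq[d] = 0 end
for code = 0, 127 do
    for digit in tostring(code):gmatch("%d") do
        freq[tonumber(digit)] = freq[tonumber(digit)] + 1
    end
end
-- 1 and 2 come out well ahead of 3-9, for exactly the reasons above
for d = 0, 9 do print(d, freq[d]) end
```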
Getting back to the processing flow, the response from
The response is then processed within the
For reasons I can't quite remember, the string is read backwards.
As soon as we find a number that could be on a dice, we return it to give the value of a single dice.
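A minimal sketch of that selection step (the function name is mine, not the codebase's):

```lua
-- Walk the digit string in reverse and return the first digit that could
-- appear on a die face (name is illustrative, not from the codebase)
local function pick_face(digit_string)
    for d in digit_string:reverse():gmatch("%d") do
        local n = tonumber(d)
        if n >= 1 and n <= 6 then
            return n
        end
    end
    return nil -- no die-friendly digit in this string
end

print(pick_face("126")) -- reversed to "621", so this yields 6
```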
That's then sent to the user with
Finally, the value is written to SHM for use by the next request (mixing with some random bytes to provide some back-tracking protection)
There's a potential race there, then: another request could come in (potentially from the same user) between sending them the response and writing into SHM
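For context, the write-back is along these lines (a sketch only - the dict and key names are placeholders, and ngx.hmac_sha1 stands in for the real mixing):

```lua
-- Sketch of the SHM write-back (dict/key names are placeholders, and
-- ngx.hmac_sha1 stands in for the actual mixing step)
local function store_digest(digest, fresh_random_bytes)
    local shm = ngx.shared.rng_state
    -- mix before storing, so the next request never sees the raw digest
    shm:set("last_digest", ngx.hmac_sha1(fresh_random_bytes, digest))
end
```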
The page at random.bentasker.co.uk places two requests to https://random.bentasker.co.uk/dice giving us two values
2021-06-03 18:51:20
- Any possibility for a race fetching last roll from the SHM should be removed by the mixing/stretching done at the beginning
- That mixing should also prevent erroneous doubles being served up if 2 workers simultaneously process requests
- The mixing should also deal with the race where a result isn't written to SHM until after it's been returned (though that should be easy to fix)
- Digits selection may introduce bias towards the numbers
Will look at this more tomorrow
2021-06-04 10:34:32
We'll roll 100 dice and then check the distribution of the results - the expectation (if the above is true) being that
The distribution of most digits fluctuates between tests (as might be expected of a random result set) - it feels like 6 is disproportionately common, but will examine that in a bit.
Range of variation is
-
-
-
-
-
-
Presumably,
Which gives
-
-
-
-
-
-
Good.
OK, let's look at how often each number appeared - each set of 100 is already a percentage, so we'll work out:
- the average for each result in a set of 500 rolls
- the average of those averages (to roughly approximate the chance of getting any given number)
The expectation is that the first will vary between the two test sets, the latter should sit at roughly 1/6th (16.6667%) - if it doesn't then I've screwed the maths somewhere (because the sum of the first set should always be 100) so it's more a check than a test.
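For reference, the calculation is along these lines (the numbers here are made up for illustration):

```lua
-- Sketch of the averaging check; each inner table is the percentage
-- showing of faces 1-6 in one set of 100 rolls (numbers are illustrative)
local sets = {
    { 18, 15, 17, 16, 17, 17 },
    { 16, 17, 15, 18, 16, 18 },
    -- ...one entry per set of 100 rolls
}
local per_face = {}
for face = 1, 6 do
    local sum = 0
    for _, set in ipairs(sets) do
        sum = sum + set[face]
    end
    per_face[face] = sum / #sets
end
local total = 0
for face = 1, 6 do
    total = total + per_face[face]
end
print(total / 6) -- must come out at ~16.6667 if each set sums to 100
```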
Test run 1:
-
-
-
-
-
-
- Aggregate: 16.6667
Test run 2:
-
-
-
-
-
-
- Aggregate: 16.6667
Side note:
This doesn't appear to be the case
Can't really conclude from just 2 tests that it's truly random, but from the distribution of results the signs are there
- The %age showing of any result differs between each test run
- Groups of tests show different average showings per result
- A bias towards
2021-06-04 12:01:12
Verifying we've a full set
What does
So
- Entropy per char is low (not a good sign)
- Chi Square is < 1%, suggesting numbers aren't random
- Arithmetic mean value is quite a long way from 127.5, but then our range of available values is much smaller than in "normal" random data. Values available to us are 49-54 (the ASCII codes for the chars '1'-'6'), so 51.5 is smack in the middle - this is a good sign
- Monte Carlo isn't great (we're aiming for Pi there really)
- The serial correlation coefficient looks very good
The negative signs there, though, are probably a result of using the wrong tool for the job -
If we test with
Wrong tools for the job really.
From the
So, in that set,
Let's run another set of rolls and see how the distribution shifts there
Observations
-
- Arithmetic mean falls in the middle of our range again (this is good)
- Serial correlation looks very good
Running a third set of rolls to look at the incidence of
2021-06-04 12:06:59
IOW, let's re-test this
Wrote a quick LUA script to wrap the logic used by the dice roll generation
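The script isn't reproduced here, but the shape of it is roughly this (runnable under OpenResty's `resty` CLI so the ngx.* helpers exist; the key, seed and chaining step are placeholders rather than the production values):

```lua
-- Hedged sketch of a wrapper around the dice-roll logic: HMAC a chained
-- value, pull out printable chars, convert to ASCII codes, reverse, and
-- take the first die-friendly digit
local function roll(prev)
    local digest = ngx.hmac_sha1("placeholder-key", prev) -- stand-in chain step
    local codes = {}
    for c in digest:gmatch("[%w%p]") do
        codes[#codes + 1] = tostring(string.byte(c)) -- char -> ASCII code
    end
    local digits = table.concat(codes):reverse()
    for d in digits:gmatch("%d") do
        local n = tonumber(d)
        if n >= 1 and n <= 6 then
            return n, digest
        end
    end
    return nil, digest -- no die-friendly digit in this digest
end

local counts, prev = { 0, 0, 0, 0, 0, 0 }, "seed"
for _ = 1, 100000 do
    local face
    face, prev = roll(prev)
    if face then
        counts[face] = counts[face] + 1
    end
end
for face = 1, 6 do
    print(face, counts[face])
end
```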
Running
So I've not gone nuts, there should be no bias towards
To put it another way, if we have
- 126
- 134
- 51
The number
- 621
- 431
- 15
Yet,
It's not a product of
It's going to take quite a while to complete, but let's set a series of rolls going
We can then see if that tendency towards 1 continues
2021-06-04 14:07:24
We can write the digest to SHM sooner (so that the next request doesn't re-use the one we've just used), but there are a couple of options here
- Do we do it immediately?
- Do we wait until we've performed our calculations, but before sending back to the client?
Neither is perfect. If we write it out immediately, then there's potential for a different race:
In the flow above, the digest that Client b gets would be derived from the digest that Client a's response is derived from. It won't be directly the same, because of our earlier mixing, but as they're pseudo-random bytes there's still potential for a fuck-up.
On the other hand, if we wait until we've performed our calculations, then the race we have is the same as now - it's just that the window is shortened by not having to wait until the response is sent to the client.
The difference between the two though, is that in the first, Client B could conceivably get random bytes before they've been served to Client A - he'd have preknowledge of what's going out (assuming he could overcome the mixing).
I guess the truth of the matter is that we have to trust the mixing rounds - there's one when writing the value to SHM, and the 3 after loading it.
So actually, the races have different levels of mixing
- Current: 3 rounds (load time)
- Preknowledge: 1 round (when writing to SHM) + 3 rounds (load time)
Although the storage mix wasn't added for that reason:
It would seem better to have the protection of the additional round.
2021-06-04 14:11:53
View Commit | View Changes
2021-06-04 14:16:16
The Dice at https://random.bentasker.co.uk are populated by placing 2 XHR requests to
If we look at the response headers
There is no
Devtools says it's not cached though, and there are 2 requests showing in the nginx access log, so it's not that.
All the same, it'd probably be wise to stick a cache-control header in to make sure we're being explicit (other browsers might behave differently to Firefox).
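From the Lua side that's a one-liner (the value shown is a suggestion, not necessarily the final header):

```lua
-- Be explicit that dice responses must never be cached or reused
ngx.header["Cache-Control"] = "no-store, no-cache, must-revalidate"
```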
2021-06-04 14:21:52
View Commit | View Changes
2021-06-04 14:52:10
Let's start by looking at the distribution of the number
That's... odd.
Putting them all together:
There does seem to be a consistent bias towards
So, I think we need to dig a bit deeper than the high-level KISS skim we've done so far and look a little closer at our source (
2021-06-04 15:39:59
- Generate an HMAC (from several rounds of mixing)
- Regex printable chars out of it
- Convert those chars into ASCII character codes (to get a string of integers)
- Select an integer from that string
We've ascertained (I think) that there's no race - the rounds of mixing are effective.
But, is there perhaps an introduced bias from the conversion process? For any given random string, there must be a fairly limited number of ASCII printable chars
There are a few there where we don't get any at all. Probably wise to start by looking at how we handle that.
We should get an empty value back. But, despite quite easily failing to find matches with
Ah, that's my fault -
Have added some debug headers so that we can look at what digits are being fed into the dice selection, ignoring output unless I got a
There is actually a slight potential for bias towards
There are three ways a result can get selected
- They're in the units column (i.e. the last number in the digit string prior to reversing)
- They're in the tens column (i.e. the penultimate number before reversing) and the number in units is >= 7 or is 0
- They're in the hundreds column, and the numbers in both tens and units are >= 7 or 0
Which gives a couple of observations
- results >= 3 lose the advantage of the second selection mechanism after we've reached 100 (because the pool of integers is 1 - 127)
- Only result
Taking
Whereas
Number
In total, there should be 111 entries - there are < 127 because there are certain numbers that can't be used at all (0, 7, 8, 9, 70, 77, 78, 79, 80, 87, 88, 89, 90, 97, 98, 99)
Those entries (separated by result) are
So we can now work out the percentage chance of a number being selected vs any other result
-
-
-
-
-
-
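Those percentages can be reproduced mechanically by enumerating the pool and applying the reversed-digit-string rule described above:

```lua
-- Tally which face each integer in the pool would select under the
-- reversed-digit-string rule
local counts = { 0, 0, 0, 0, 0, 0 }
for n = 1, 127 do
    for d in tostring(n):reverse():gmatch("%d") do
        local face = tonumber(d)
        if face >= 1 and face <= 6 then
            counts[face] = counts[face] + 1
            break -- first usable digit wins
        end
    end
end
for face = 1, 6 do
    print(face, counts[face]) -- 1 comes out well ahead, with 2 second
end
```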
Bias doesn't apply linearly, so the results we were getting earlier now make a lot more sense (although
We now know why I got snake-eyes so frequently in our Monopoly game.
Next thing is working out how to fix it.
2021-06-04 15:56:17
There are two obvious options
- Shuffle/randomise the digit string rather than simply reversing it (effectively picking a random position in the digit string)
- When selecting digits, constrain to char codes 1-96 (which should give an equal number of entries to each)
The problem with shuffling is that we'd probably actually worsen the bias towards
Currently there are mitigations (reversing the string) in place to deal with its advantage of appearing in the 100s column; doing shuffle/random select would undermine that, giving
So, in addition to its current entries
It's a (slightly) bigger code change, but constraining the digit string to values between 1 and 96 should be a better fit. It'd constrain the entries to
It would, however, increase the chance of getting an empty digit string back (so I need to look closer at how we handle that too)
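Re-running the earlier enumeration with the pool capped at 96 backs that up - by my count, every face ends up with the same number of usable entries:

```lua
-- Same enumeration as before, but with the pool capped at 96
local counts = { 0, 0, 0, 0, 0, 0 }
for n = 1, 96 do
    for d in tostring(n):reverse():gmatch("%d") do
        local face = tonumber(d)
        if face >= 1 and face <= 6 then
            counts[face] = counts[face] + 1
            break
        end
    end
end
for face = 1, 6 do
    print(face, counts[face]) -- 14 apiece (10 units entries + 4 tens entries)
end
```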
2021-06-04 16:11:52
View Commit | View Changes
2021-06-04 16:15:52
View Commit | View Changes
2021-06-04 16:21:53
View Commit | View Changes
2021-06-04 16:51:52
View Commit | View Changes
2021-06-04 17:06:26
That'll be courtesy of the pattern match
Matches alphanumerics and punctuation. But the lower end of the ASCII table (0-32) consists of chars that won't match.
Turns out we needed the hundreds to give
So, we need to redefine the rules a bit:
- Char code should be between 32 (the start of the printable range) and 126 (the end of it)
- Unusable codes should be omitted (to avoid bias toward
So, we'd need to omit 107,108,109 - that'd still leave us imbalanced though (
Giving
-
-
-
-
-
-
Exclusion list is therefore 40,50,60,107,108,109.
Rather than a bunch of
Ah, but, if we're reintroducing hundreds, we also need to reintroduce reversing the string otherwise
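Whatever form the exclusion check takes, the idiomatic Lua shape for it is a set-style table rather than chained comparisons (a sketch; the real names will differ):

```lua
-- Exclusion list as a set-style table: one lookup per char code rather
-- than a chain of comparisons (names are illustrative)
local excluded = {
    [40] = true, [50] = true, [60] = true,
    [107] = true, [108] = true, [109] = true,
}

local function usable(code)
    return not excluded[code]
end

print(usable(41), usable(107)) -- true   false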
2021-06-04 20:15:17
View Commit | View Changes
2021-06-04 20:16:13
2021-06-05 10:57:40
That's much better, but it still feels like there might be a bias there, was
Ah, maybe it's that our exclusion list isn't quite comprehensive enough -
-
-
-
-
-
-
Let's adjust so that numbers ending in 7,8,9 are excluded - that brings us back to 10 entries per result.
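As a sketch, that adjustment is just a check on the final digit, sitting alongside the range limits already described (shown in isolation here):

```lua
-- Drop any char code whose final digit is 7, 8 or 9 (this sits alongside
-- the existing range checks, which are omitted for brevity)
local function last_digit_ok(code)
    local last = code % 10
    return last ~= 7 and last ~= 8 and last ~= 9
end

print(last_digit_ok(106), last_digit_ok(107)) -- true   false
```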
Distribution between 5 sets of rolls looks much better
2021-06-05 11:01:52
View Commit | View Changes
2021-06-05 11:01:53
View Commit | View Changes
2021-06-05 11:19:29
So, as this ticket's now quite long, I'll attempt to summarise
- There are possible race conditions stemming from the way hashes in the chain are passed between requests
- all appear to be well mitigated by the mixing rounds
- There was a bias towards result
- Result
- This was sufficient to gain an advantage, but not enough that it's immediately obvious in a small number of rolls
- Low levels of bias require large datasets to spot, so testing for them is hard and time consuming
- There was no bias towards doubles in general, but the bias towards
- Human memory sucks: I was aware I'd got a lot of doubles, but statistically it's likely that a lot of them were snake eyes - something I hadn't consciously registered (too excited at escaping jail I guess)
- Human perception of randomness really sucks - we often see randomness where there's bias, and bias in properly random sets
On that last point, I've written in the past about the difficulties involved in assessing randomness (https://www.bentasker.co.uk/documentation/security/287-understanding-the-difficulty-of-assessing-entropy), so it's not solely a human problem - machines suck at it too.
There is a certain irony in the dice having been biased. The codebase was originally created to consume seeds from a ChaCha20 generator I wrote and backdoored (https://www.bentasker.co.uk/blog/software-development/689-writing-a-chacha20-based-csprng), so there was an early intention to make inputs predictable.
It, along with some of my ham-fisted fixes above, shows just how easy it is to accidentally introduce a new bias - if it's a subtle enough advantage, you may not register it until you're looking at large datasets. Even then, there's a lot of cross-checking and doubt involved, because the brain will tend to see patterns where none exist, and miss patterns that do exist.
Abusing this
It'd be remiss of me not to look at how we weaponise this
That we can so easily tilt rolls in favour of
If we invert the range (so 1 becomes
I've added a new "Rigged Dice" option to https://random.bentasker.co.uk/ - it should show a subtle bias towards
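The inversion itself is trivial - mapping a roll r to 7 - r swaps each face with its opposite, so a surplus on one face re-appears on the opposite face without touching the generator:

```lua
-- Invert a roll: 7 - r swaps each face with its opposite (1<->6, 2<->5,
-- 3<->4), so a bias towards 1 re-appears as a bias towards 6
local function invert(roll)
    return 7 - roll
end

print(invert(1), invert(6)) -- 6   1
```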
2021-06-05 11:31:53
View Commit | View Changes
2021-06-05 11:37:07
2021-06-05 11:37:12