How To Generate Truly Random Bits
Good random numbers are surprisingle hard to obtain, especially if they have to account for cryptographic demands.
On this page we explain how to generate random numbers in an average home situation.
OpenFortress provides a free online service from which you can download entropy generated according to these principles.
The primary audience for this document is Unix users (and that includes MacOS X users).
To achieve an optimum of accuracy and portability, the examples are given from a command line.
Properties of good random numbers
The requirements of a random bit sequence are just few, but they turn out rather hard to do well:
- Fairness: the chance of a 1 bit is 50%, the chance of a 0 bit is 50%,
- High entropy: no patterns between the bits are allowed,
- Non-replay: nobody should be able to reconstruct a bit sequence at a later time.
Many things that come up when you think of generating random numbers fail on at least one of those requirements:
- Cats walking on keyboards generate unpredictable patterns (although it would be in their nature to dance a cha-cha when you least want them to) but entropy is not optimal.
There are patterns due to the order of lifting legs, and there is a range of possible keys hit by a paw caused by its limited length.
A reliable source also informed me that cats have a more-than-average tendency to step on the power-button, followed by the enter key to establish a system shutdown.
- Flipping a coin yields high entropy, replay is hard (if no notes of the sequence are kept) but the fairness is not guaranteed.
Some care must be taken to get rid of any bias, possibly like we do below.
- The output from /dev/random devices is usually based on hard disk seek timing and such.
That is nice, but not short after a system boot, because during that time the system behaves fairly predictably.
- Pseudo-random number generators internally store a `seed' which is used to keep the sequence of numbers going.
The output of these generators is cyclic, with a maximum cycle length determined by the number of different seed values.
The use of prime numbers in these algorithms is to achieve the maximum, but it cannot advance beyond it.
Prime numbers are in the end just numbers, not ultimate magic.
- CD-ROMs sold online or otherwise, and filled with random numbers, fail on account of their reproducibility.
Some online services provide random numbers based on a diversity of principles.
These are definitely worth checking out, but all of them suffer from the uncertainty on the issue of replay.
- random.org contains a lot of information about random numbers.
Their approach is based on samples of radio noise, which is filtered to ensure entropy and fairness.
- HotBits at Fourmi Labs is an intruiging story of someone who had a source of radio-activity lying around and used the random isotopic decay intervals to generate random bits.
They filter explicitly to remove most bias and thereby achieve a pragmatic level of fairness.
You never know if the providers of these services will actually discard the output of their processes, or whether they will resend it to others.
If you are truly paranoid, you probably want to use one of these principles, but under your own control.
This is precisely what follows below -- and since most people won't have a source of radio-activity lying around, the radio approach will be followed.
How to set up your hardware
Do you have a radio?
Tune it in between stations.
Music is highly reproducible, and although DJ's may blabber at random they may be recorded (so reproducible) and any radio station in general is duplicated with care in the whole area of broadcast.
Do not use the FM tuner of your radio, because FM is based on a beacon signal with the sounds wrapped around it -- in other words, the radio can recognise absence of a station and refuse to tune in, or adapt tuning to a nearby station.
You want to use the AM tuner, which does not have these problems.
Is it a problem if there is a recognisable sound mixed with the noise?
Not really, but you probably want the noise to be the loudest.
Since the two signals are added, the total is unpredictable anyway.
And we will sample only a single bit at a time, to avoid any effects that the slopes and such of the recognisable sound may have.
Should you worry if Fourier analysis shows one, or a few, peaks in the signal?
Not really. For the same reason.
Should you worry if you distinctly hear a tone in the signal?
That's the same question as the previous, only worded in more popular terms.
So, no need to worry about that either.
Connect your radio's output to the input of the sound device on your computer, making sure that the signal levels match properly.
Sample the signal for some time and see if it occupies a sufficiently large range of sample values, say 8 bits or more (assuming 16 bit sampling).
Make sure that the sound does not `clip'.
This happens if it exceeds the range of the sound device, and it usually results in excessive amounts of maximum and minimum valued samples.
In a graphical display, you would see horizontal lines for clipping.
Do not use a radio signal that was spread in digital form (because it is simple to reproduce) and also do not use a CD as a binary or analog data stream (for the same reason).
Finally, you would do best to disconnect your computer from the Internet from
This means physically unplugging it, not just bringing down the network
And if you have a battery-powered machine, why not disconnect it from its wall
plug, just to shield off another possible channel of information leaking?
Stay disconnected until the random numbers have been thoroughly removed from
All this is intended to reduce the risk of replay of your random numbers.
Sampling the truly random bits
Now sample your input signal.
Use any program you like for that, as long as it does not display the random bits in any form, either graphical or numerical.
The more bits you collect, the better it is, usually.
To generate symmetric keys and DSA keys, you need less than to generate RSA keys.
Longer keys take (a lot) more effort to find, and so a lot more bits are useful.
I found a nice program named
rec on Linux to sample my sound device.
It was more flexible than taking the sound device's character stream directly, because this little command allows influencing of sample rate and such.
rec -c 1 -d /dev/dsp -r 8000 -t raw -s w -
will do nicely -- sampling at 8kHz, where every audio system should produce sufficient sound variations, sampling mono (stereo would add a second signal, but it would depend on the first), and output as 16-bit words.
^C to stop recording bits (otherwise it goes on forever).
Or something in that vicinity -- but this is what random.org used also.
Not all these bits are usable.
Notably, the higher bits tend to change at a slower pace because they reveal a lot of the ground tones of your noise.
Better pick a pretty-low bit and leave it to that.
To evade most bias (that would be easiest noticed in bit.0) you could prefer bit.1 over bit.0 however.
Even if the choice for bit.1 probably evades a lot of bias, it is still necessary to compensate for any lack of fairness.
So, you take two samples at a time, and compare them.
Depending on the result, your discard it, or generate a fair bit:
Do not reuse the samples if the result was ignored -- this is almost certain to
lead to alternating sequences like 010101010101... which fails to meet the requirement of high entropy.
|Sample #1||Sample #2||Result|
I've written a small C program to perform this task.
Of course there is a signature on this file.
It functions as a filter, meaning that it reads from the standard input and writes to the standard output.
Oh, and it assumes that the input comprises of a mono .wav file with 16 bit samples, so the header is ignored and the samples are read 16 bits at a time.
You compile a file like this with the
gcc C-compiler with
gcc -o noise-filter noise-filter.c
which creates an executable file named
noise-filter that you can now use (as
./noise-filter if the current directory is not in your path).
The combination of the latter two commands output what you need:
rec -c 1 -d /dev/dsp -r 8000 -t wav -s w - | ./noise-filter >bits
Run this until you have sufficient bits to suit your taste.
It will not terminate before
rec, which means you need to press
Test the file length (in bytes) with the following command in another shell:
ls -l bits
When terminating, the noise filter reports how many bits 0 and 1 it has produced -- be sure to check if they are not too far off.
Keep in mind, though that the numbers are not likely to be the same, as this is statistics, looking at concrete values afterwards, and not chance theory, where you predict data before having it in your hand.
Concrete data is usually different from chances (you never met the average American, for the same reason).
Validating the random numbers
Okay, we now have random numbers.
But are they any good?
Is there sufficient entropy, do they spread evenly, don't they cling together around the extremes?
A program to answer such questions is
ent, which can be obtained from Fourmi Labs.
This program does a lot of statistical analysis on the data.
Entropy is being tested, and for 100 kB it should be simple to obtain 7.98 bits of `freshness' per byte.
There is no absolution here, just very good approximations.
Chi-square is a very raw test -- it should not be too near to 0%, nor to 100%.
These results would imply that the values either `cling together' or `spread aggressively' to the extremes.
Arithmetic mean should be around 127.5, especially for long series of bits.
Monte Carlo measures whether the values spread evenly.
If so, the value should be near to Pi (3.141592653589...) but be prepared for quite a bit off of the ideal, as this test propagates only slowly towards Pi.
It may be nice to keep the statistical results for these tests, even after the random numbers are destroyed.
These values are so indirectly derived from the random data, that there should be no real danger due to this.
Improving the random data
It is not uncommon to apply a secure hashing algorithm such as MD5 or SHA1 to blocks of the generated random numerals; this has the effect of decoupling the random data from the physical effects that cause them.
If you prefer to be totally independent of external influences, you may want to look into random number generators based on PN-junctions in semiconductors (such as transistors). A nice reference to such circuits is here.
Using the random data
You now have an utterly secret list of good random numbers.
Good, now put them to good use...
One thing you can do is chop them in 128-bit chunks and use them directly as symmetrical keys to encrypt messages.
But more likely, you will want to generate a private/public keypair with it.
To generate a RSA key, you would run
openssl genrsa -rand bits -out privkey.pem -des3
This writes out a private key and encrypts it with a password that you must enter.
Now that you have used the random data, you should remove it from your system, because you want to avoid replay.
Simply removing the file with the
rm bits command is good for those who are not overly paranoid.
If you are paranoid, you may want to use a package that overwrites the hard disk blocks that now hold the random bits.
You may do so with a command like
This one has been designed with constraints like this one in mind.
Only now, with no traces of the random bits left, is replay avoided.
So this is the moment you may go online again, as far as random number generation is concerned.
About this document
This document is ©2002-2007 OpenFortress, Rick van Rein.
This document and included software are part of the Open Source outlet of OpenFortress.
You are free to use this document and programs for any legal purpose, but always at your own risk.
Please refer others to this site and refrain from spreading copies of this document -- because copies cannot be updated.
OpenFortress provides a free online service from which you can download entropy generated according to these principles.