A Lesson in Statistics Using Random Sequences
Statistics may come naturally to some people, but it is a difficult subject for many of us. Using simulations helps to remove some of the mystery of statistics. Luckily, it is trivial to produce a pseudo-random Gaussian or uniform sequence: a single command in Matlab (or Python) does it.
In this article, I try to explain a few concepts using Gaussian and uniform random sequences generated in Matlab. First, we’ll generate a Gaussian sequence, calculate its variance, and plot a histogram and probability density function (PDF). Next, we’ll simulate the roll of a single die to illustrate a uniform distribution. Then we’ll simulate rolling two dice and finally, as an illustration of the Central Limit Theorem, we’ll sum the total when rolling several dice to obtain an approximately Gaussian distribution.
Example 1. Generating a Gaussian Sequence in Matlab
There are many methods for generating pseudorandom Gaussian (or “normal”) sequences. A survey of methods is provided in [1]. Gaussian sequences are characterized by a mean μ and standard deviation σ. Matlab has a function randn that generates a standard Gaussian sequence; i.e., a sequence with μ = 0 and σ = 1. The “n” in randn stands for “normal” distribution. Here is Matlab code to generate a Gaussian sequence with μ = 0 and σ = 1:
rng(0) % set random number generator seed
N = 10000; % number of samples in sequence
sigma = 1.0; % standard deviation
x = sigma*randn(1,N); % Gaussian sequence
The random number generator seed may be any nonnegative integer. The sequence x is plotted in Figure 1, where we see that most samples are within ±3σ. In fact, about 99.7% of the samples fall within these bounds.
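The 99.7% figure is easy to check by simulation. Here is a Python/NumPy sketch (Python standing in for the Matlab listing above; the variable names mirror it):

```python
import numpy as np

rng = np.random.default_rng(0)       # seed the generator (like Matlab's rng(0))
N = 10000                            # number of samples in sequence
sigma = 1.0                          # standard deviation
x = sigma * rng.standard_normal(N)   # Gaussian sequence, mu = 0

frac = np.mean(np.abs(x) <= 3 * sigma)   # fraction of samples within +/- 3 sigma
print(frac)                              # close to 0.997
```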
Given any sequence, we can compute its mean and variance. The mean μ is given by:
$$ \mu = \frac{1}{N} \sum_{n=0}^{N-1} x(n) \qquad (1) $$
The variance is given by [2]:
$$ variance = \sigma^2 = \frac{1}{N} \sum_{n=0}^{N-1} [x(n)-\mu]^2 \; , \quad N > 100 \qquad (2) $$
where we define the standard deviation σ as the square root of the variance. (For N less than 100 or so, the denominator is typically replaced by N – 1, which yields the unbiased estimate of the variance [2].) For our sequence, we used σ = 1.0, so the variance should be approximately σ² = 1² = 1. Noting that μ = 0, here is the calculation:
variance = sum(x.^2)/N % variance of sequence
variance = 0.9829
The calculated variance of the sequence is not exactly 1.0; that is, it is not exactly equal to the variance of the underlying Gaussian process. This is a feature of real-world sequences called statistical noise [3]. For a longer sequence, the variance will be closer to 1. Note that the calculated variance will change slightly, depending on the seed used. Should we worry about the pseudorandom sequence repeating? Not really. The period of Matlab’s “Mersenne Twister” random number generator is claimed to be 2¹⁹⁹³⁷ – 1 [4].
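The biased (divide by N) and unbiased (divide by N – 1) estimates differ negligibly at this sequence length. A Python/NumPy sketch, standing in for the Matlab calculation:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10000
x = rng.standard_normal(N)        # zero-mean Gaussian sequence, sigma = 1

var_biased = np.var(x)            # divides by N, as in Equation 2
var_unbiased = np.var(x, ddof=1)  # divides by N - 1 (unbiased estimate)
# Both are near, but not exactly, 1.0 (statistical noise again),
# and they agree with each other to roughly four decimal places.
```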
We can give a physical interpretation to variance if we let x have units of volts. Letting μ = 0, we have:
$$ \sigma^2 = \frac{1}{N} \sum_{n=0}^{N-1} x(n)^2 \qquad (3) $$
Assuming a resistance of one ohm, x(n)² is then the instantaneous power of sample n and σ² is the average power of the zero-mean sequence, in watts. This interpretation can be used to create discrete-time signals with a given signal-to-noise ratio (SNR) for simulations [5, 6].
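As a sketch of that idea, the following Python/NumPy code (the variable names and the example signal are mine, not from the references) scales a noise sequence so that a sine wave plus noise has a target SNR:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10000
n = np.arange(N)
signal = np.sin(2 * np.pi * 0.05 * n)    # unit-amplitude sine; average power 0.5

snr_db = 20.0                            # target SNR in dB
p_signal = np.mean(signal**2)            # average signal power (Equation 3 form)
p_noise = p_signal / 10**(snr_db / 10)   # noise power needed for the target SNR
noise = np.sqrt(p_noise) * rng.standard_normal(N)  # Gaussian noise, sigma^2 = p_noise

y = signal + noise                       # noisy signal for simulation
snr_est = 10 * np.log10(p_signal / np.mean(noise**2))  # should be near snr_db
```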
Figure 1. Gaussian Sequence x of length 10,000.
Histogram of a Sequence
Figure 2 (left) shows the sequence x again. Figure 2 (right) shows the same data plotted as a histogram, which has the same vertical axis as the left plot. Horizontal bars show the number of samples that fall within amplitude ranges or “bins.” These numbers are called bin counts or simply counts. We see that the largest bin count occurs at x = 0, and the count is small for |x| > 3.
Figure 2. Gaussian random sequence x (left) and its histogram (right).
Example 2. Histogram of a Gaussian Sequence
Let’s create a histogram for a Gaussian sequence with length N = 100E3. This histogram is plotted in Figure 3, where we have assigned x to the horizontal axis and counts to the vertical axis. The sequence x has continuous values, which we divide into bins of width 0.25. For example, there is a bin centered at 0, with a bin count of 9949. Using narrower bins increases horizontal resolution, but causes more unwanted variation of the bin counts.
Figure 3 was created using the following Matlab code. The short Matlab function pdf_plot graphs histograms and PDFs. The function is listed in Appendix A.
rng(0) % set random number generator seed
N = 100E3; % number of samples in sequence
sigma = 1.0; % standard deviation
x = sigma*randn(1,N); % Gaussian sequence
pdf_plot(x) % plot histogram and pdf
Figure 3. Histogram of Gaussian sequence, σ = 1, N = 100E3.
Probability Density Function (PDF) of a Sequence
In Figure 3, the sum of all the bin counts is equal to the number of samples N:
$$ \sum_{i=1}^{nbins} counts(i) = N \qquad (4) $$
where i is the bin number and nbins is the total number of bins. Equation 4 requires that the defined bins include all samples of x. For example, if the bins extended from x = -2 to 2, some samples would be excluded and the sum would be less than N. We can also write:
$$ \frac{1}{N}\sum_{i=1}^{nbins} counts(i) = 1 \qquad (5) $$
Now define the estimated probability density function (PDF) of our sequence as [7]:
$$ pdf(i) = \frac{counts(i)}{N*w} \qquad (6) $$
where w is the width of the histogram bin. Then
$$ \sum_{i=1}^{nbins} pdf(i)*w = 1 \qquad (7) $$
pdf(i)*w is the estimated probability of bin i, and the sum of these probabilities is 1, as shown.
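Equations 6 and 7 are easy to verify numerically. The Python/NumPy sketch below (standing in for the Matlab code) builds the PDF from histogram counts and checks that its total area is 1:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.standard_normal(N)

w = 0.25                                     # histogram bin width
edges = np.arange(np.floor(x.min()), np.ceil(x.max()) + w, w)  # bins cover all samples
counts, _ = np.histogram(x, bins=edges)

pdf = counts / (N * w)          # Equation 6
total = np.sum(pdf * w)         # Equation 7: sums to 1
```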
Example 3. PDF of Gaussian Sequences
The Matlab code of Example 2 also plots the PDF of the Gaussian sequence, as shown in Figure 4. In this plot, the area of each bar is w*pdf(i), and the areas of all the bars sum to 1, as shown in Equation 7.
How well does our sequence’s PDF agree with the underlying Gaussian process? The Gaussian PDF is given by:
$$ p(x) = \frac{1}{\sigma \sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2} \qquad (8) $$
Equation 8 with σ = 1 and μ = 0 is plotted as the red curve in Figure 5. We see that, for N = 100E3, the PDF plot appears to agree well with equation 8, although the accuracy of the distribution’s tails is not visible. (To see the tails clearly, we could plot using a semi-log y axis).
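As a quick numeric check of that agreement: Equation 8 with σ = 1 and μ = 0 gives p(0) = 1/√(2π) ≈ 0.3989, while the Example 2 bin centered at x = 0 (count 9949, bin width 0.25, N = 100E3) gives an estimated density of 9949/(100000 × 0.25) ≈ 0.398. In Python:

```python
import numpy as np

p0 = 1 / np.sqrt(2 * np.pi)      # Equation 8 at x = 0 (sigma = 1, mu = 0)
pdf0 = 9949 / (100_000 * 0.25)   # Equation 6 for the bin centered at 0 (Example 2)
# p0 and pdf0 agree to about two decimal places
```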
Figure 4. PDF of Gaussian sequence, σ = 1, N = 100E3.

Figure 5. PDF of Gaussian sequence, σ = 1, N = 100E3. Red line plots Equation 8.
So far, all our sequences have had σ = 1. If we instead use σ = 0.5, the amplitude of x(n) is decreased by a factor of 2. With N = 100E3, we get the PDF shown in Figure 6. The smaller amplitude shows up as a narrower PDF, given that the x axis represents amplitude.
Finally, suppose our Gaussian sequence has σ = 1, but N = 1000, much shorter than the N = 100E3 used for Figure 4. The resulting PDF is shown in Figure 7. We see increased statistical noise in the PDF, which is quite significant in the tails of the distribution. Note the empty bins centered at x = -2.75 and x = 3.25.
Figure 6. PDF of Gaussian sequence, σ = 0.5, N = 100E3.
Figure 7. PDF of Gaussian sequence, σ = 1, N = 1000.
Example 4. A Uniform Probability Distribution: Die Roll
The sequence obtained by rolling a die multiple times has a uniform probability distribution, which we’ll illustrate in this example. The examples so far have used continuous values of x; a die roll has discrete values, i.e., x(i) = 1, 2, …, 6. The probability of a given value in a sequence of N rolls is approximately the number of occurrences (counts) of that value divided by N. We define the estimated probability mass function (PMF) of a discrete-value sequence as:
$$ pmf(i) = \frac{counts(i)}{N} \qquad (9) $$
Given that the sum of the counts must equal N, we have:
$$ \sum_{i=1}^{nbins} pmf(i)=1 \qquad(10) $$
The following Matlab code simulates the roll of a single die and plots the PMF. The Matlab function randi generates a sequence of uniform random integers over [1 6]. We find histogram counts for the sequence using the Matlab function histcounts and compute the PMF using Equation 9.
rng(0) % set random number generator seed
N = 10000; % number of die rolls
die = randi(6,1,N); % vector of integers over [1 6]
edges = 0.5:1:6.5; % edges of histogram bins
counts = histcounts(die,edges); % vector of histogram counts
pmf = counts/N; % pmf vector from Equation 9
% plot PMF
x = 1:6; % horizontal axis of pmf
bar(x,pmf,.05)
xlabel('die value'),ylabel('probability')
The variable die contains the values of the die rolls. The first 16 values of die are:
5 6 1 6 4 1 2 4 6 6 1 6 6 3 5 1
The PMF of die is plotted in Figure 8, where each outcome has probability of about 1/6 = 0.1667.
Figure 8. Uniform PMF for rolling a single die, N = 10,000.
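For later reference, the exact mean and variance of a single die follow from applying Equations 1 and 2 to the six equally likely outcomes. A Python check (standing in for Matlab):

```python
import numpy as np

faces = np.arange(1, 7)              # outcomes 1..6
pmf = np.full(6, 1 / 6)              # uniform PMF; satisfies Equation 10

mu = np.sum(faces * pmf)             # exact mean: 3.5
var = np.sum((faces - mu)**2 * pmf)  # exact variance: 35/12, about 2.917
```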
Example 5. Addition of Sequences – Rolling Two Dice
The result of rolling two dice is the sum of the values of the dice. If we roll two dice multiple times, the resulting random sequence is the sum of the two random sequences of each die value. Here is Matlab code to compute the result of rolling two dice (See Appendix B for the complete code). Note that the sequences die1 and die2 are independent.
rng(0) % set random number generator seed
N = 10000; % number of dice rolls
die1 = randi(6,1,N); % vector of die 1 integer values
die2 = randi(6,1,N); % vector of die 2 integer values
y = die1 + die2; % sum of the dice values
The first 16 values of y range from 2 to 12:
6 12 7 9 7 2 8 5 8 8 2 7 11 6 8 3
The PMF of y is plotted in Figure 9. The probabilities fall between 1/36 (roll a 2 or a 12) and 6/36 (roll a 7). The mean and variance of y, from Equations 1 and 2 are:
mu = 7.0018
variance = 5.7422
The standard deviation is the square root of the variance: σ = 2.3963.
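Because the two dice are independent, the exact PMF of their sum is the convolution of the single-die PMF with itself. The Python/NumPy sketch below reproduces the 1/36 to 6/36 probabilities and shows that the simulated mean and variance are close to the exact values 7 and 35/6 ≈ 5.833:

```python
import numpy as np

pmf1 = np.full(6, 1 / 6)               # single-die PMF over faces 1..6
pmf2 = np.convolve(pmf1, pmf1)         # PMF of the sum, over values 2..12

values = np.arange(2, 13)
mu = np.sum(values * pmf2)             # exact mean: 7
var = np.sum((values - mu)**2 * pmf2)  # exact variance: 35/6
```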
Figure 9. PMF of rolling two dice, N = 10,000.
Example 6. Illustrating The Central Limit Theorem
What happens if we roll five dice and sum the values of the dice? The Matlab code for this case is listed in Appendix B. The sum varies from 5 to 30. The PMF is plotted in Figure 10, and it looks suspiciously Gaussian. Calculating the mean and variance of this PMF yields:
mu = 17.5020
variance = 14.4949
If we calculate a Gaussian PDF (Equation 8) using these values, we get the solid curve shown in Figure 10. This Gaussian PDF evaluated at integer values of x yields a pretty good fit to the simulation, except for the tail values of 5, 6, 29, and 30. Also, the dice-roll PMF’s tails are truncated at 5 and 30, while the Gaussian PDF has infinite tails. The dice-roll PMF gets closer to Gaussian when there are more dice.
So, we have the remarkable result that adding several independent uniform random sequences results in an approximately Gaussian sequence. This is an example of the Central Limit Theorem, which, in simple terms, states that the sum of independent random sequences having arbitrary probability distributions is approximately Gaussian. Equivalently, if you sample a single sequence randomly to create M new sequences, then the sum of those sequences is approximately Gaussian [8]. One can start to see why the Gaussian distribution is ubiquitous.
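A Python/NumPy sketch of the five-dice experiment (standing in for the Matlab code of Appendix B): summing independent dice gives a mean near 5 × 3.5 = 17.5 and a variance near 5 × 35/12 ≈ 14.58, consistent with the simulated values above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
ndice = 5
rolls = rng.integers(1, 7, size=(ndice, N))  # five independent die-roll sequences
y = rolls.sum(axis=0)                        # sums range from 5 to 30

mu = y.mean()        # near 5 * 3.5 = 17.5
var = y.var()        # near 5 * 35/12, about 14.58
```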
Figure 10. PMF of rolling five dice, N = 100E3. Red line is a Gaussian PDF.
See the PDF version for Appendices.
References
1. Thomas, David, et al., “Gaussian Random Number Generators,” ACM Computing Surveys, Vol. 39, No. 4, October 2007, https://dl.acm.org/doi/10.1145/1287620.1287622
2. Lyons, Richard G., Understanding Digital Signal Processing, 3rd Ed., Pearson, 2011, Appendix D.7
3. Smith, Steven, The Scientist and Engineer’s Guide to Digital Signal Processing, California Technical Publishing, 1997, pp. 17-18, https://www.dspguide.com/
4. The Mathworks website, “Creating and Controlling a Random Number Stream,” https://www.mathworks.com/help/matlab/math/creating-and-controlling-a-random-number-stream.html
5. Robertson, Neil, “Setting Carrier to Noise Ratio in Simulations,” DSPRelated.com, April 2021, https://www.dsprelated.com/showarticle/1398.php
6. Lyons, Appendix D.5
7. The Mathworks website, “histcounts,” normalization table, https://www.mathworks.com/help/matlab/ref/double.histcounts.html#buo1m5a-5
8. Navidi, William, Statistics for Engineers and Scientists, 3rd Ed., McGraw Hill, 2011, Section 4.11
March 2026 Neil Robertson
Comments
"but it is a difficult subject for many of us."
Preach, my friend! I still remember me and my engineering friends calling it "sadistics" in undergrad.
Excellent review!

And Thermodynamics was Thermodamnamics, as I recall.

Fun article! If you ever need to generate Gaussian numbers without a library function, then the Box-Muller transform is a good way to go. It takes two uniformly distributed random numbers in the interval (0,1) and outputs two Gaussian distributed numbers.
https://en.wikipedia.org/wiki/Box%E2%80%93Muller_t...
I previously had fun implementing it in Awk:
https://remcycles.net/blog/awk_noise.html
Ref. 1 is a good reference for the Box-Muller transform too. I hadn't seen it before your post.
Since you posted this on Pi Day, I have to link to the Tau Manifesto, which has a couple paragraphs about why $2\pi$ appears in the Gaussian PDF (eq. 8):
https://www.tauday.com/tau-manifesto#sec-the_pi_ma...
...the factor of $2\pi$ comes from squaring the unnormalized Gaussian distribution and switching to polar coordinates, which leads to a factor of $1$ from the radial integral and a $2\pi$ from the angular integral.
This post shows the derivation in more detail, with the two integrals:
https://math.stackexchange.com/a/385427/1097005
The Box-Muller transform almost does that in reverse. From Ref. 1, section 2.2.1:
The magnitude of the corresponding vector is obtained by transforming a uniform random number; a random phase is then generated by scaling a second uniform random number by $2\pi$. Projections onto the coordinate axes are then performed to give the Gaussian numbers.
Connecting these dots I now have some more intuition about how the Box-Muller transform works and why it works on pairs! Thanks!

Hi Remington,
I'm a little embarrassed that I did not realize I published this post on pi day, but, since pi day is "wrong", I guess I shouldn't be. Thanks for the links, particularly the one for the Tau Manifesto.
-- Neil

Nice Remington! I hope you are doing well.
Related, we can generate samples from a Gaussian random process by adding up N independent random numbers if they come from an identically distributed random process with finite mean and variance (Central Limit Theorem). If you care about far-out tails, I think Box-Muller is the way to go, since you would need a large N to get tail accuracy. For example, with N = 15 the simple sum of uniforms has about 10% error at 3 sigma, but it is much cheaper to compute (if we're concerned about the log, square root, etc. in Box-Muller).

Nice summary Neil! This is good stuff when you are doing signal processing where signal to noise ratios are critical design parameters (which is basically all the signal processing I do).

Thanks Dan.