Week 2 Lecture Notes

This week we will see how moment generating functions (MGFs) lead to the Central Limit Theorem. For more detail, read Evans and Rosenthal, Chapter 4.

Monday, January 20

Last week we had a theorem that said that a random variable’s distribution is completely determined by its MGF. Another similar fact is the following:

Theorem (Converging MGFs)

If $X_n$ is a sequence of random variables whose MGFs converge, for every $t$ in an open interval around 0, to the MGF of a random variable $Y$, then the probability distribution of $X_n$ converges to the probability distribution of $Y$.

We worked out the following examples in class.

Example 1: Poisson Distribution is a Limit of Binomials

  1. What is the MGF for a Binom(n,p) random variable?

  2. Suppose that $X_n$ has a $\text{Binom}(n,\frac{\lambda}{n})$ distribution for some fixed $\lambda$. What is the MGF for $X_n$?

  3. Use the fact that $\lim_{n \rightarrow \infty} \left( 1 + \frac{r}{n} \right)^n = e^r$ to find the limit of the MGFs from #2.

  4. What probability distribution corresponds to the MGF in #3?
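The convergence in Example 1 can be checked numerically. The sketch below assumes the standard closed forms $(1 - p + pe^t)^n$ for the binomial MGF and $e^{\lambda(e^t - 1)}$ for the Poisson MGF; the values of `lam` and `t` are arbitrary illustrative choices, not taken from class.

```python
import math

def binom_mgf(t, n, p):
    """MGF of a Binom(n, p) random variable: (1 - p + p e^t)^n."""
    return (1.0 - p + p * math.exp(t)) ** n

def poisson_mgf(t, lam):
    """MGF of a Poisson(lam) random variable: exp(lam (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1.0))

# Illustrative values (assumptions, not from the notes).
lam, t = 3.0, 0.5

for n in (10, 100, 1000, 10000):
    gap = abs(binom_mgf(t, n, lam / n) - poisson_mgf(t, lam))
    print(f"n = {n:>5}: |binomial MGF - Poisson MGF| = {gap:.6f}")
```

The printed gap should shrink roughly like $1/n$, which is consistent with the limit in step 3.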

Example 2: Sums of Independent Identically Distributed (i.i.d.) Random Variables

  1. Suppose that $X$ is any random variable with mean $\mu = 0$ and variance $\sigma^2 = 1$. What are the first three moments $E(X^0)$, $E(X^1)$, and $E(X^2)$?

  2. Find the first 3 terms of the Maclaurin series for the MGF $m_X$.

  3. If $X_1, X_2, \ldots, X_n$ are i.i.d. RVs with mean 0 and variance 1, then what are the mean and variance of $X_1+X_2+\ldots+X_n$?

  4. Let $Y_n = \frac{X_1+X_2+ \ldots + X_n}{\sqrt{n}}$. Use homework 1, problem 5, to find the MGF for $Y_n$ in terms of the shared MGF of the $X_i$’s.

  5. Use Taylor’s Theorem to show that on any bounded interval $I$ containing $0$, there is a constant $C$ such that $1+\frac{t^2}{2} - C |t|^3 \le m_X(t) \le 1+\frac{t^2}{2} + C |t|^3$ for all $t \in I$.
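As a concrete instance of Example 2, take $X$ to be $+1$ or $-1$ with probability $1/2$ each. This choice is an assumption made only for illustration; it has mean 0, variance 1, and MGF $\cosh(t)$. The sketch estimates a constant $C$ for the cubic bound on $[-1, 1]$ and evaluates the MGF of $Y_n$ as $m_X(t/\sqrt{n})^n$, using the standard facts that the MGF of a sum of independent variables is the product of their MGFs and that $m_{aX}(t) = m_X(at)$.

```python
import math

def mgf_x(t):
    # Assumed example: X = +1 or -1 with probability 1/2 each, so m_X(t) = cosh(t).
    return math.cosh(t)

def cubic_ratio(t):
    # |m_X(t) - (1 + t^2/2)| / |t|^3, defined for t != 0.
    return abs(mgf_x(t) - (1.0 + t * t / 2.0)) / abs(t) ** 3

# Grid over [-1, 1], skipping t = 0 where the ratio is 0/0.
grid = [k / 1000.0 for k in range(-1000, 1001) if k != 0]
C = max(cubic_ratio(t) for t in grid)
print(f"a constant C that works on this grid: {C:.4f}")

def mgf_y(t, n):
    # MGF of Y_n = (X_1 + ... + X_n) / sqrt(n) for i.i.d. copies of X.
    return mgf_x(t / math.sqrt(n)) ** n

for n in (1, 10, 100, 1000):
    print(f"n = {n:>4}: m_Y(1) = {mgf_y(1.0, n):.6f}   e^(1/2) = {math.exp(0.5):.6f}")
```

A grid maximum is only evidence that a suitable $C$ exists, not a proof; the proof is the Taylor's Theorem argument in step 5.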


Wednesday, January 22

Last time we showed that if $X$ is a random variable with mean 0 and variance 1, then on any bounded interval containing 0 there is a constant $C$ such that $1+\frac{t^2}{2} - C |t|^3 \le m_X(t) \le 1+\frac{t^2}{2} + C |t|^3$.

We can simplify the notation for this by using Big-O Notation.

Definition (Big-O Notation)

If $f$, $g$, and $h$ are functions defined on an interval $I$ and there is a constant $C$ such that $|f(x) - g(x)| \le C |h(x)|$ for all $x \in I$, then we say that $f(x) = g(x) + O(h(x))$.

In many applications, Big-O notation is used on intervals of the form $(a,\infty)$ to deal with the asymptotic behavior of functions near infinity. For our purposes, the interval $I$ will typically be a neighborhood around 0, that is, an interval of the form $(-a,a)$, so Big-O notation helps understand how a function behaves near zero.

Theorem (Taylor’s Theorem Using Big-O)

If $f(x)$ has continuous derivatives up to order $n+1$ on an interval around $x=0$, then $f(x) = f(0) + f'(0) x + \frac{f''(0)}{2!}x^2 + \ldots + \frac{f^{(n)}(0)}{n!}x^n + O(x^{n+1})$ in an interval around 0.
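To see what the $O(x^{n+1})$ term means in practice, the sketch below uses $f(x) = e^x$ (an illustrative choice, not one from class) and estimates, for $n = 1, 2, 3$, the largest value of $|f(x) - P_n(x)| / |x|^{n+1}$ on a grid in $(-1, 1)$, where $P_n$ is the degree-$n$ Maclaurin polynomial. A bounded ratio is exactly what the Big-O statement asserts.

```python
import math

def taylor_poly_exp(x, n):
    """Degree-n Maclaurin polynomial of e^x: sum of x^k / k! for k = 0..n."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

def big_o_constant(n, grid):
    """Largest value of |e^x - P_n(x)| / |x|^(n+1) over the grid (x != 0)."""
    return max(
        abs(math.exp(x) - taylor_poly_exp(x, n)) / abs(x) ** (n + 1)
        for x in grid
    )

# Grid in (-1, 1) with step 0.01, skipping x = 0 where the ratio is 0/0.
grid = [k / 100.0 for k in range(-99, 100) if k != 0]

for n in (1, 2, 3):
    print(f"n = {n}: remainder / |x|^{n + 1} is at most {big_o_constant(n, grid):.4f} on the grid")
```

The grid stays away from very small $|x|$ on purpose: there the remainder is close to floating-point rounding error and the computed ratio becomes unreliable.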

In-Class Exercises

  1. Re-write the inequality for $m_X(t)$ above using Big-O notation.

  2. Write the MGF for $X/\sqrt{n}$ using Big-O notation to collect all the terms of order 3 or higher.

  3. Suppose that $X_1, \ldots, X_n$ are i.i.d. random variables with mean 0 and variance 1. Show that the MGF for $\frac{X_1+\ldots+X_n}{\sqrt{n}}$ converges to the MGF for a Norm(0,1) random variable as $n \rightarrow \infty$.

In order to do #3, you will need to calculate limits of the form $\lim_{n \rightarrow \infty} \left(1+ \frac{t^2}{2n} + \frac{Ct^3}{n^{3/2}} \right)^n$. One approach is to take the natural logarithm of the limit, and then use L’Hospital’s rule. Do this, and show that the limit is $e^{t^2/2}$, no matter what $C$ is.
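A numerical sanity check of this limit is sketched below. It is not a substitute for the L’Hospital argument. The values of `t` and `C` are arbitrary illustrative choices, and the logarithm is computed with `math.log1p` to keep rounding error small when the quantity inside the power is very close to 1.

```python
import math

def log_of_power(t, C, n):
    """n * ln(1 + t^2/(2n) + C t^3 / n^(3/2)): the log of the n-th power."""
    u = t * t / (2.0 * n) + C * t ** 3 / n ** 1.5
    return n * math.log1p(u)

t = 1.0  # illustrative value
print(f"target e^(t^2/2) = {math.exp(t * t / 2.0):.6f}")

for C in (-5.0, 0.0, 5.0):
    for n in (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8):
        value = math.exp(log_of_power(t, C, n))
        print(f"C = {C:+.1f}, n = 10^{round(math.log10(n))}: {value:.6f}")
```

The convergence is slow when $C \ne 0$ because the extra term contributes about $Ct^3/\sqrt{n}$ to the logarithm, which only vanishes at rate $1/\sqrt{n}$.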

This argument leads to:

Theorem (Central Limit Theorem)

If $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, then the probability distribution of $\frac{X_1+X_2+\ldots+X_n - n\mu}{\sqrt{n}}$ converges to the $\text{Norm}(0,\sigma)$ distribution as $n\rightarrow \infty$.
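A small simulation can illustrate the theorem. The sketch below assumes Uniform(0, 1) summands (mean $1/2$, variance $1/12$), which is an illustrative choice, and works with the fully standardized sum $\frac{X_1+\ldots+X_n - n\mu}{\sigma\sqrt{n}}$, which should be approximately Norm(0,1). It compares the fraction of simulated values in $[-1.96, 1.96]$ with the corresponding standard normal probability.

```python
import math
import random

def standardized_sum(n, rng):
    # Uniform(0, 1) summands: mean 1/2 and variance 1/12 (assumed example).
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    total = sum(rng.random() for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))

def fraction_within(n, trials, bound, seed=0):
    # Fraction of simulated standardized sums that land in [-bound, bound].
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if abs(standardized_sum(n, rng)) <= bound)
    return hits / trials

def standard_normal_prob_within(bound):
    # P(-bound <= Z <= bound) for Z ~ Norm(0, 1), via the error function.
    return math.erf(bound / math.sqrt(2.0))

print("simulated:", fraction_within(n=30, trials=20000, bound=1.96))
print("normal   :", round(standard_normal_prob_within(1.96), 4))
```

The two numbers should be close but not identical, since the simulated fraction carries sampling error and $n = 30$ is finite.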

We finished the class by mentioning that MGFs are not always defined. An alternative proof of the central limit theorem uses characteristic functions instead of MGFs. These are similar to MGFs, but are defined using the imaginary unit $i$ as $E(e^{itX})$. The characteristic function is the Fourier transform of the probability density function and, unlike the MGF, it is defined for every probability distribution.


Friday, January 24

We finished our discussion of MGFs today by looking at two examples.

Definition (Laplace Distribution)

The Laplace distribution has probability density function: $f(x) = \frac{1}{2}e^{-|x|}$

Using the definition of a MGF, the MGF for a Laplace distribution is:

$$\frac{1}{2} \int_{-\infty}^{\infty} e^{tx} e^{-|x|} \, dx$$

Depending on the value of $t$, that integral might be finite, or infinite, as we saw in the graphs below:

Desmos Link

In particular, if $t \ge 1$, then the right side of $e^{tx}e^{-|x|}$ does not decay, so the integral is infinite. The same problem happens on the left side when $t \le -1$.

In-Class Exercises

  1. Calculate the integral to find the MGF of the Laplace distribution.

  2. Write the MGF of the Laplace distribution as a Maclaurin series. What are the 1st through 6th moments?
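One way to check an answer to exercise 1 is to approximate the integral numerically. The sketch below truncates the integral to $[-L, L]$ and applies a midpoint rule; the function name, the truncation length `L`, and the step count are all illustrative assumptions. For $|t| < 1$ the truncated values settle down as `L` grows, while at $t = 1$ they keep growing, which matches the discussion of the tails above.

```python
import math

def truncated_laplace_mgf(t, L, steps=100000):
    """Midpoint-rule approximation of (1/2) * integral over [-L, L] of e^{tx} e^{-|x|} dx.

    steps is even, so x = 0 (where the integrand has a kink) is a cell boundary.
    """
    h = 2.0 * L / steps
    total = 0.0
    for k in range(steps):
        x = -L + (k + 0.5) * h
        total += math.exp(t * x - abs(x))
    return 0.5 * total * h

# For |t| < 1 the truncated integral stabilizes; compare with your closed form.
for t in (0.0, 0.5, 0.9):
    print(f"t = {t}: approx MGF = {truncated_laplace_mgf(t, L=200.0):.5f}")

# At t = 1 the truncated integral grows without bound as L increases.
for L in (50.0, 100.0, 200.0):
    print(f"t = 1.0, L = {L:>5}: truncated integral = {truncated_laplace_mgf(1.0, L):.3f}")
```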

The reason that the MGF of the Laplace distribution is not defined when $|t|$ is too big is that the tails of the distribution do not decay fast enough. You can get even worse behavior if you choose a distribution whose tails decay even more slowly.

Definition (Cauchy Distribution)

The Cauchy distribution has probability density function: $f(x) = \frac{1}{\pi} \frac{1}{1+x^2}$.

  1. Show that the MGF for the Cauchy distribution is not defined for any $t \ne 0$.

  2. Change the Desmos graph above for the Laplace MGF to a graph showing the function that you would have to integrate to get the Cauchy MGF. Does the graph make it clear (if you zoom out far enough) why the integral won’t converge to a finite area?
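The same numerical experiment can be run for the Cauchy distribution. The sketch below truncates both the MGF integral and the real part of the characteristic function $E(e^{itX})$ to $[-L, L]$; the helper names and the values of `t` and `L` are illustrative assumptions. The truncated MGF integral blows up as `L` grows, even for a small $t$, while the characteristic function settles near $e^{-|t|}$, the known characteristic function of the Cauchy distribution.

```python
import math

def midpoint_integral(f, a, b, steps=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def truncated_mgf(t, L):
    # Integral of e^{tx} f(x) over [-L, L]; diverges as L grows when t != 0.
    return midpoint_integral(lambda x: math.exp(t * x) * cauchy_pdf(x), -L, L)

def truncated_cf_real(t, L):
    # Real part of E(e^{itX}) truncated to [-L, L]; the imaginary part
    # vanishes because the density is symmetric about 0.
    return midpoint_integral(lambda x: math.cos(t * x) * cauchy_pdf(x), -L, L)

for L in (50.0, 100.0, 200.0, 400.0):
    print(f"L = {L:>5}: MGF integrand area (t = 0.1) = {truncated_mgf(0.1, L):.4g}, "
          f"characteristic function (t = 1) = {truncated_cf_real(1.0, L):.5f}")
print(f"e^(-1) = {math.exp(-1.0):.5f}")
```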