Week 2 Lecture Notes

This week we will see how moment generating functions (MGFs) lead to the Central Limit Theorem. For more detail, read Evans and Rosenthal, Chapter 4.

Monday, January 20

Last week we had a theorem that said that a random variable’s distribution is completely determined by its MGF. Another similar fact is the following:

Theorem (Converging MGFs)

If $X_n$ is a sequence of random variables whose MGFs converge, for every $t$ in some open interval around $0$, to the MGF of a random variable $Y$ (finite on that interval), then the probability distribution of $X_n$ converges to the probability distribution of $Y$.

We worked out the following examples in class.

Example 1: Poisson Distribution is a Limit of Binomials

1. What is the MGF for a $\text{Binom}(n,p)$ random variable?
2. Suppose that $X_n$ has a $\text{Binom}(n,\frac{\lambda}{n})$ distribution for some fixed $\lambda$. What is the MGF for $X_n$?
3. Use the fact that $\lim_{n \rightarrow \infty} \left( 1 + \frac{r}{n} \right)^n = e^r$ to find the limit of the MGFs from #2.
4. What probability distribution corresponds to the MGF in #3?
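
The convergence in this example can be checked numerically. The sketch below assumes the standard closed forms $(1-p+pe^t)^n$ for the $\text{Binom}(n,p)$ MGF and $e^{\lambda(e^t-1)}$ for the Poisson MGF; the function names and the values of $\lambda$ and $t$ are illustrative choices, not part of the lecture.

```python
import math

def binomial_mgf(t, n, p):
    """MGF of Binom(n, p), assuming the closed form (1 - p + p*e^t)^n."""
    return (1 - p + p * math.exp(t)) ** n

def poisson_mgf(t, lam):
    """MGF of a Poisson(lam) random variable: exp(lam * (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1))

# Illustrative values (assumptions, not from the notes).
lam, t = 3.0, 0.5

# The gap between the Binom(n, lam/n) MGF and the Poisson MGF at t
# should shrink as n grows.
for n in (10, 100, 1000, 10000):
    gap = abs(binomial_mgf(t, n, lam / n) - poisson_mgf(t, lam))
    print(n, gap)
```

Checking a single value of $t$ is only a sanity check; the theorem needs convergence for every $t$ in an interval around $0$.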

Example 2: Sums of Independent Identically Distributed (i.i.d.) Random Variables

1. Suppose that $X$ is any random variable with mean $\mu = 0$ and variance $\sigma^2 = 1$. What are the first three moments $E(X^0)$, $E(X^1)$, and $E(X^2)$?
2. Find the first 3 terms of the Maclaurin series for the MGF $m_X$.
3. If $X_1, X_2, \ldots, X_n$ are i.i.d. RVs with mean 0 and variance 1, then what are the mean and variance of $X_1+X_2+\ldots+X_n$?
4. Let $Y_n = \frac{X_1+X_2+ \ldots + X_n}{\sqrt{n}}$. Use homework 1, problem 5, to find the MGF for $Y_n$ in terms of the shared MGF of the $X_i$’s.
5. Use Taylor’s Theorem to show that on any bounded interval $I$ containing $0$, there is a constant $C$ such that $1+\frac{t^2}{2} - C |t|^3 \le m_X(t) \le 1+\frac{t^2}{2} + C |t|^3$ for all $t \in I$.
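
To see the cubic bound on a concrete case, here is a small numerical sketch. It assumes the particular random variable $X = E - 1$ with $E \sim \text{Exp}(1)$, which has mean 0, variance 1, and MGF $e^{-t}/(1-t)$ for $t < 1$; this choice of $X$, the interval $[-0.5, 0.5]$, and the helper names are illustrative assumptions. The largest value of $|m_X(t) - (1 + t^2/2)| / |t|^3$ on a grid gives a candidate for the constant $C$ on that interval.

```python
import math

def shifted_exp_mgf(t):
    """MGF of X = E - 1 with E ~ Exp(1); only valid for t < 1."""
    return math.exp(-t) / (1 - t)

def worst_ratio(a, steps=2000):
    """Largest |m_X(t) - (1 + t^2/2)| / |t|^3 over a grid of nonzero t in [-a, a]."""
    worst = 0.0
    for k in range(1, steps + 1):
        for t in (a * k / steps, -a * k / steps):
            remainder = shifted_exp_mgf(t) - (1 + t * t / 2)
            worst = max(worst, abs(remainder) / abs(t) ** 3)
    return worst

# A finite value here is consistent with a bound of the form C * |t|^3 on [-0.5, 0.5].
print(worst_ratio(0.5))
```

A grid search only suggests a value of $C$; the proof that such a constant exists still comes from Taylor’s Theorem.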

Wednesday, January 22

Last time we showed that if $X$ is a random variable with mean 0 and variance 1, then on any bounded interval containing 0 there is a constant $C$ such that $1+\frac{t^2}{2} - C |t|^3 \le m_X(t) \le 1+\frac{t^2}{2} + C |t|^3.$

We can simplify the notation for this by using Big-O Notation.

Definition (Big-O Notation)

If $f, g,$ and $h$ are functions defined on an interval $I$ and there is a constant $C$ such that $|f(x) - g(x)| \le C |h(x)|$ for all $x \in I$, then we say that $f(x) = g(x) + O(h(x))$ on $I$.

In many applications, Big-O notation is used on intervals of the form $(a,\infty)$ to deal with the asymptotic behavior of functions near infinity. For our purposes, the interval $I$ will typically be a neighborhood around 0, that is, an interval of the form $(-a,a)$, so Big-O notation helps us describe how a function behaves near zero.

Theorem (Taylor’s Theorem Using Big-O)

If $f(x)$ has continuous derivatives up to order $n+1$ in an interval around $x=0$, then $f(x) = f(0) + f'(0) x + \frac{f''(0)}{2!}x^2 + \ldots + \frac{f^{(n)}(0)}{n!}x^n + O(x^{n+1})$ on any closed bounded interval around 0 contained in that interval.

In-Class Exercises

1. Re-write the inequality for $m_X(t)$ above using Big-O notation.
2. Write the MGF for $X/\sqrt{n}$ using Big-O notation to collect all the terms of order 3 or higher.
3. Suppose that $X_1, \ldots, X_n$ are i.i.d. random variables with mean 0 and variance 1. Show that the MGF for $\frac{X_1+\ldots+X_n}{\sqrt{n}}$ converges to the MGF for a $\text{Norm}(0,1)$ random variable as $n \rightarrow \infty$.

In order to do #3, you will need to calculate limits of the form $\lim_{n \rightarrow \infty} \left(1+ \frac{t^2}{2n} + \frac{Ct^3}{n^{3/2}} \right)^n.$ One approach is to take the natural logarithm of the expression inside the limit, and then use L’Hospital’s rule. Do this, and show that the limit is $e^{t^2/2}$, no matter what $C$ is.
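
Before doing the calculation by hand, it can help to watch the limit numerically. The sketch below evaluates the expression for a fixed $t$ and a few values of $C$; the function name and the specific values of $t$, $C$, and $n$ are illustrative assumptions.

```python
import math

def clt_limit_term(t, n, C):
    """Evaluate (1 + t^2/(2n) + C*t^3/n^(3/2))^n for given t, n, C."""
    return (1 + t * t / (2 * n) + C * t ** 3 / n ** 1.5) ** n

t = 1.0
target = math.exp(t * t / 2)  # the claimed limit e^{t^2/2}

# The difference from the target should shrink as n grows, for each C.
for C in (-5.0, 0.0, 5.0):
    for n in (10**2, 10**4, 10**6, 10**8):
        print(C, n, clt_limit_term(t, n, C) - target)
```

When $C \ne 0$ the error shrinks only on the order of $1/\sqrt{n}$, so large values of $n$ are needed before the limit is visible.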

This argument leads to:

Theorem (Central Limit Theorem)

If $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2 > 0$, then the probability distribution of $\frac{X_1+X_2+\ldots+X_n - n\mu}{\sigma\sqrt{n}}$ converges to the $\text{Norm}(0,1)$ distribution as $n\rightarrow \infty$.
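
A simulation sketch of the theorem in standardized form: subtract $n\mu$ from the sum and divide by $\sigma\sqrt{n}$, so that the limiting distribution is $\text{Norm}(0,1)$. The sketch assumes $\text{Uniform}(0,1)$ summands (mean $1/2$, variance $1/12$); that distribution, the sample sizes, the seed, and the function names are all illustrative choices. It compares the empirical CDF of the standardized sums with the standard normal CDF at a few points.

```python
import math
import random

def standardized_sum(rng, n, mu, sigma):
    """(X_1 + ... + X_n - n*mu) / (sigma*sqrt(n)) for Uniform(0,1) draws."""
    total = sum(rng.random() for _ in range(n))
    return (total - n * mu) / (sigma * math.sqrt(n))

def normal_cdf(x):
    """CDF of the Norm(0,1) distribution, written using the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def max_cdf_gap(n=30, trials=5000, seed=0, points=(-1.0, 0.0, 1.0)):
    """Largest gap between the empirical CDF and the normal CDF at the given points."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)  # mean and sd of Uniform(0,1)
    samples = [standardized_sum(rng, n, mu, sigma) for _ in range(trials)]
    gaps = []
    for x in points:
        empirical = sum(s <= x for s in samples) / trials
        gaps.append(abs(empirical - normal_cdf(x)))
    return max(gaps)

print(max_cdf_gap())
```

The gap combines sampling noise from the finite number of trials with the genuine difference between the distribution of the standardized sum and the normal limit, so it will not reach zero for any fixed number of trials.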

We finished the class by mentioning that MGFs are not always defined. An alternative proof of the Central Limit Theorem uses characteristic functions instead of MGFs. These are similar to MGFs, but are defined using the imaginary unit $i$ as $E(e^{itX})$. When $X$ has a probability density function, its characteristic function is the Fourier transform of that density (up to sign convention). Unlike MGFs, characteristic functions are defined for every probability distribution, because $|e^{itX}| = 1$.

Friday, January 24

We finished our discussion of MGFs today by looking at two examples.

Definition (Laplace Distribution)

The Laplace distribution has probability density function: $f(x) = \frac{1}{2}e^{-|x|}$

Using the definition of a MGF, the MGF for a Laplace distribution is:

$\frac{1}{2} \int_{-\infty}^{\infty} e^{tx} e^{-|x|} \, dx$

Depending on the value of $t$, that integral might be finite or infinite, as we saw in the graphs linked below:

Desmos Link

In particular, if $t \ge 1$, then $e^{tx}e^{-|x|}$ does not decay as $x \rightarrow \infty$, so the integral is infinite. The same problem happens as $x \rightarrow -\infty$ when $t \le -1$.

In-Class Exercises

1. Calculate the integral to find the MGF of the Laplace distribution.
2. Write the MGF of the Laplace distribution as a Maclaurin series. What are the 1st through 6th moments?
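
A numerical check can confirm a hand calculation of the integral. The sketch below approximates the Laplace MGF integral with Simpson's rule on a truncated interval and prints it next to $\frac{1}{1-t^2}$, which is the value the integral works out to for $|t| < 1$. The cutoff, the step count, and the function names are assumptions made for illustration.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def laplace_mgf_numeric(t, cutoff=80.0):
    """Approximate (1/2) * integral of e^{tx} e^{-|x|} over [-cutoff, cutoff]."""
    integrand = lambda x: 0.5 * math.exp(t * x - abs(x))
    # Split at 0, where |x| has a kink, so each piece is smooth.
    return simpson(integrand, -cutoff, 0.0) + simpson(integrand, 0.0, cutoff)

# Inside |t| < 1 the truncated integral should agree with 1/(1 - t^2).
for t in (0.0, 0.5, -0.8):
    print(t, laplace_mgf_numeric(t), 1 / (1 - t * t))

# At t = 1 the truncated integral keeps growing as the cutoff grows.
for cutoff in (20.0, 40.0, 80.0):
    print(cutoff, laplace_mgf_numeric(1.0, cutoff))
```

The second loop illustrates the divergence for $t \ge 1$: the right half of the integrand is constant, so the truncated integral grows linearly with the cutoff.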

The reason that the MGF of the Laplace distribution is not defined when $|t|$ is too big is that the tails of the distribution do not decay fast enough. You can get even worse behavior if you choose a distribution with tails that decay even more slowly.

Definition (Cauchy Distribution)

The Cauchy distribution has probability density function: $f(x) = \frac{1}{\pi} \frac{1}{1+x^2}.$

1. Show that the MGF for the Cauchy distribution is not defined for any $t \ne 0$.
2. Change the Desmos graph above for the Laplace MGF to a graph showing the function that you would have to integrate to get the Cauchy MGF. Does the graph make it clear (if you zoom out far enough) why the integral won’t converge to a finite area?
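
The same kind of numerical experiment shows the divergence for the Cauchy distribution. The sketch below integrates $e^{tx} \cdot \frac{1}{\pi(1+x^2)}$ over $[0, L]$ for growing $L$ with a small positive $t$; the value $t = 0.1$, the upper limits, and the function names are illustrative assumptions. Only the right half of the integral is computed, which is enough to show it is infinite when $t > 0$.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def cauchy_partial_integral(t, upper, n=20000):
    """Integral of e^{tx} / (pi * (1 + x^2)) over [0, upper]."""
    integrand = lambda x: math.exp(t * x) / (math.pi * (1 + x * x))
    return simpson(integrand, 0.0, upper, n)

# With t = 0 the partial integrals settle near 1/2 (half of the total probability).
print(cauchy_partial_integral(0.0, 400))

# With even a small t > 0 they grow without bound as the upper limit grows.
for upper in (50, 100, 200, 400):
    print(upper, cauchy_partial_integral(0.1, upper))
```

The exponential factor $e^{tx}$ eventually outgrows the polynomial decay $\frac{1}{1+x^2}$ for every $t > 0$, and the mirror-image argument on the left half handles $t < 0$.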