Probability Notes

Math 421 - Fall 2023

Jump to week:

Week 1 Notes

Day Sections Topic
Mon, Aug 21 1.1 - 1.4 Counting
Wed, Aug 23 1.5 - 1.6 Story proofs & definition of probability
Fri, Aug 25 1.5 Simulation with Python

Monday, August 21

Today we review the basic rules for counting.

We looked at how these rules can help us solve probability problems using the naive definition of probability.

  1. What is the probability that a hand of 5 cards from a shuffled deck of 52 playing cards is a full house (3 of one rank and 2 of another)?

  2. What is the probability that a group of 30 random people will all have different birthdays?

  3. How many different ways are there to rearrange the letters of MISSISSIPPI?

  4. What if you randomly permuted the letters of MISSISSIPPI? What is the probability that they would still spell MISSISSIPPI?

The functions nCk (combinations) and nPk (permutations) are both in the Python standard math library. You can use the following code to calculate them:

import math

n, k = 5, 3
print(math.comb(n, k))  # number of ways to choose k of n items (order doesn't matter)
print(math.perm(n, k))  # number of ways to arrange k of n items (order matters)
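
For example, here is one way (not something we wrote in class) to check the full house answer from problem 1 with math.comb:

import math

# choose the rank and 3 suits for the triple, then the rank and 2 suits for the pair
full_houses = 13 * math.comb(4, 3) * 12 * math.comb(4, 2)
total_hands = math.comb(52, 5)
print(full_houses / total_hands)  # about 0.00144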

Wednesday, August 23

Today we introduced story proofs with these examples:

  1. Explain why it makes sense that \binom{n}{k} = \binom{n}{n-k} in terms of choosing subsets of a set with n elements.

  2. Prove that \displaystyle n\binom{n-1}{k-1} = k \binom{n}{k}. Hint: there are the same number of ways to choose a team captain who then picks the rest of the team as there are ways to pick the team first, and then select the captain from within the team.

  3. Prove Vandermonde’s identity \binom{m+n}{k} = \sum_{j = 0}^k \binom{m}{j} \binom{n}{k-j}.

After that, we introduced the general definition of probability:

Definition. A probability space consists of a set S of possible outcomes called the sample space, and a probability function P : 2^S \rightarrow [0,1] which satisfies two axioms: P(\varnothing) = 0 and P(S) = 1, and if A_1, A_2, \ldots are disjoint events, then P\big(\bigcup_{j=1}^\infty A_j\big) = \sum_{j=1}^\infty P(A_j).

We looked at the following examples of probability spaces:

  1. Describe the sample space and probability function for rolling a fair six-sided die.

  2. Suppose you flip a fair coin repeatedly until it lands on heads. The sample space is the number of flips it takes. Describe the sample space, the probabilities of the individual outcomes in the sample space, and then calculate the probability that you get an odd number of flips. (We needed geometric series to answer this question!).

  3. What is the probability it takes an even number of flips?

We finished by proving the Complementary events formula P(A^C) = 1 - P(A).

Friday, August 25

Today we did probability simulations in Python using Google Colab.

  1. Estimate the probability that the total of three six-sided dice is 12 by simulating rolling 3 dice many times.

  2. Chevalier de Mere’s problem. The Chevalier de Mere was a French gambler in the 1600s. He knew that the chance of rolling a one on a six-sided die is 1/6 and the chance of rolling two dice and getting both dice to land on ones is 1/36. So he reasoned that the probability of getting a one in four rolls of a single die should be the same as getting a pair of ones in 24 rolls of two dice. Write a simulation in Python to see which is more likely to happen.

Here are my sample solutions in Google Colab.
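
If you want a rough idea of what the de Mere simulation could look like, here is a minimal sketch (not necessarily the same as my Colab solutions):

import random

def win_single(n_trials=100_000):
    # estimate P(at least one 1 in 4 rolls of one die); exact value is 1 - (5/6)^4
    wins = sum(any(random.randint(1, 6) == 1 for _ in range(4)) for _ in range(n_trials))
    return wins / n_trials

def win_double(n_trials=100_000):
    # estimate P(at least one pair of 1s in 24 rolls of two dice); exact value is 1 - (35/36)^24
    wins = sum(any(random.randint(1, 6) == 1 and random.randint(1, 6) == 1 for _ in range(24))
               for _ in range(n_trials))
    return wins / n_trials

print(win_single())   # roughly 0.518
print(win_double())   # roughly 0.491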

Week 2 Notes

Day Sections Topic
Mon, Aug 28 2.1 - 2.3 Bayes rule
Wed, Aug 30 2.4 - 2.5 Independence
Fri, Sep 1 2.6 - 2.7 Conditioning

Monday, August 28

Today we talked about conditional probability: P(A | B) = \frac{P(A \cap B)}{P(B)}.

We did these examples:

  1. Shuffle a deck of cards and draw two cards off the top. Find P(\text{2nd is an Ace} ~|~ \text{1st is an Ace}).

  2. Women in their 40s have a 0.8% chance of having breast cancer. Mammograms are 90% accurate at detecting breast cancer for people who have it. They are also 93% accurate at not detecting breast cancer when someone doesn’t have it.

  1. Find P(\text{Test positive} ~|~ \text{Have cancer}).
  2. Find P(\text{Test positive} ~|~ \text{Don't have cancer}).
  3. Find P(\text{Test positive}).
  4. Find P(\text{Have breast cancer} ~|~ \text{Tested positive}).

We also talked about how to draw weighted tree diagrams to keep track of the probabilities in an example like this, and we reviewed some of the Python we used on Monday. See this Google Colab example.
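
As a sanity check (not part of what we wrote in class), the tree diagram calculation for problem 2 only takes a few lines of Python; the numbers below come straight from the problem statement:

p_cancer = 0.008            # prior for women in their 40s
p_pos_given_cancer = 0.90   # mammogram sensitivity
p_pos_given_healthy = 0.07  # 1 - specificity, since the test is 93% accurate for non-cases

p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy
print(p_pos)                                  # about 0.077
print(p_cancer * p_pos_given_cancer / p_pos)  # P(cancer | positive), about 0.094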

Wednesday, August 30

Today we talked about Bayes formula, both the standard version and the version for calculating posterior odds based on prior odds and the likelihood ratio. We did these examples:

  1. 5% of men are color blind, but only 0.25% of women are. Find the posterior odds that someone is male given that they are color blind.

  2. What is the likelihood ratio for having breast cancer if someone has a positive mammogram test?

  3. (Problem from OpenIntroStats.) Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this problem.

We also looked at other examples of conditional probabilities in two-way tables:

We finished by defining independent events and proving that if A, B are independent, then so are A and B^c.

Friday, September 1

Today we talked about how to use conditioning to solve probability problems. We did two types of examples: ones where you condition on all of the possible outcomes of a family of events that partition the sample space (using the Law of Total Probability) and the other where you condition on the outcome of a single event (using the conditional probability definition or Bayes formula). We did these examples:

  1. In the example from Wednesday where Jose is trying to find parking on campus, find P(\text{garage is full}).

  2. If I have a bag with five dice, one four-sided, one six-sided, one eight-sided, one twelve-sided, and one twenty-sided, and I randomly select one die from the bag and roll it, what is P(\text{result} = 5)?

  3. What if you didn’t know which die I rolled, but you knew the result was 5? Find P(\text{die is twenty-sided} | \text{result} = 5).

  4. Prove that for any two events A and B, P(A | A \cup B) \ge P(A | B). Hint: Condition on whether or not B occurs.

  5. 5% of men are color blind and 0.25% of women are. 25% of men are taller than 6 feet tall, while only 2% of women are. Find the conditional probability that a random person is male if you know they are both color blind and taller than 6 ft.

    1. Notice that you have to assume that color blindness is independent of height for both men & women.
    2. Notice also that color blindness and height definitely are not independent for all adults. Why not?
    3. You can condition on both color blindness and height simultaneously, or you can condition on just one and then condition on the other. Both approaches will work and give the same answer:

P(A | B \cap C) = \frac{P(A \cap B | C)}{P(B | C)} = \frac{P(A \cap B \cap C)}{P(B \cap C)}.

Week 3 Notes

Day Sections Topic
Wed, Sep 6 2.8 - 2.9 Conditioning - cont’d
Fri, Sep 8 3.1 - 3.2 Discrete random variables

Wednesday, Sep 6

Today we talked about counter-intuitive probability examples.

  1. The Monty Hall problem.

  2. The prosecutor’s fallacy. In 1998, Sally Clark was tried for murder after two of her sons died suddenly after birth. The prosecutor argued that the probability of one child dying of Sudden Infant Death Syndrome (SIDS) is 1/8500, so P(\text{two children died} | \text{not guilty}) = \frac{1}{8500} \cdot \frac{1}{8500} = \frac{1}{72{,}250{,}000}. Given how small that number was, the prosecutor argued that this proved that Sally Clark was guilty beyond a reasonable doubt. What assumptions was the prosecutor making?

  3. p-values in Statistics. We looked at an example where I used my psychic powers to guess 10 out of 25 Zener cards correctly. The conditional probability that I would get so many correct if I were just guessing is very low. The actual answer uses the binomial distribution which we will talk about next week. But is that strong evidence that I am psychic?

Friday, Sep 8

Today we introduced random variables. We defined discrete random variables and their probability mass functions (PMFs). We looked at these examples:

  1. Flip two coins. Let X = the number of heads. What is the PMF for X?

  2. Roll two six-sided dice and let T = the total. What is the PMF for T?

  3. Flip a coin until it lands on heads. Let Z = the number of flips. What is the PMF for Z?

We also simulated the PMF of the random variable X = # of correct guesses on 25 tries with Zener cards.
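
Here is a minimal version of that simulation (a sketch, assuming each guess is correct with probability 1/5 since there are 5 Zener symbols):

import random
from collections import Counter

def simulate_zener(n_trials=100_000, n_cards=25, p=1/5):
    # tally how often each number of correct guesses occurs
    counts = Counter(sum(random.random() < p for _ in range(n_cards)) for _ in range(n_trials))
    return {k: counts[k] / n_trials for k in sorted(counts)}

for k, prob in simulate_zener().items():
    print(k, round(prob, 4))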

Week 4 Notes

Day Sections Topic
Mon, Sep 11 3.3 - 3.4 Bernoulli, Binomial, Hypergeometric
Wed, Sep 13 3.3 - 3.6 Hypergeometric, CDFs
Fri, Sep 15 3.7 - 3.8 Functions of random variables

Monday, September 11

Today we introduced the binomial distribution and the hypergeometric distribution. We derived the probability mass functions for both distributions and spent some time proving the PMF formula for the binomial distribution. We did the following in-class exercises:

  1. Graph the PMFs for Bin(2,0.5) and Bin(3,0.5).

  2. One step in the proof of the binomial PMF formula was to give a story proof that \binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}.

  3. Suppose a large town is 47% Republican, 35% Democrat, and 18% independent. A political poll asks a sample of 100 residents their political affiliation. Let X be the number of Republicans. What is the probability distribution for X?

  4. What is the probability of getting exactly 3 aces in a five card poker hand?

  5. What is the distribution of the number of aces in a five card poker hand?

Wednesday, September 13

Today we introduced the cumulative distribution function (CDF) of a random variable. We used the binomial distribution CDF to calculate the following:

  1. In roulette, if you bet on a number like 7, you have a 1/38 probability of winning. If you bet $1 and you win, then you get $36 dollars. If you play 100 games of roulette and bet $1 on 7 every time, what is the probability that you lose money?

  2. What is the probability that someone who is just guessing would get 10 or more Zener cards correct out of 25?

There is also a CDF function for the hypergeometric distribution, and we used it to find the following:

  1. There is an urn with 10 balls, 7 red and 3 black. You take a random sample of 5 balls out of the urn. Let X be the number of red balls in your sample. Find P(X \le 4).

We also talked about how to get access to CDF functions in Python and R. Here is an example of how to use Python with the scipy.stats module to work with binomial and hypergeometric random variables:

from scipy.stats import binom, hypergeom

X = binom(25, 1/5)       # number of correct Zener card guesses out of 25
print(1 - X.cdf(9))      # P(X >= 10)

Y = hypergeom(10, 7, 5)  # 10 balls total, 7 red, sample of size 5
print(Y.cdf(4))          # P(Y <= 4)
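
The roulette question (problem 1 above) can be answered with the same binomial CDF. This sketch assumes the $36 payout includes your original $1 bet, so over 100 games you lose money exactly when you win 2 or fewer times (3 wins pays 3 × 36 = 108 > 100, but 2 wins pays only 72):

from scipy.stats import binom

W = binom(100, 1/38)  # number of wins in 100 games
print(W.cdf(2))       # P(lose money) = P(W <= 2), roughly 0.5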

Then we talked about functions of random variables.

  1. Suppose a particle starts at position 0 on a number line, and then randomly moves left or right by 1 unit (with equal probabilities) every second. Express the location of the particle on the number line as a function of a binomial random variable.

You can also define functions of more than one random variable. For example:

  1. Roll two six-sided dice and let X be the result of the first roll and Y be the result of the second. Then f(X,Y) = X+Y is a function of two random variables. It is also a new R.V. in its own right. How would you find the PMF for this new R.V.?

  2. For the random variables X and Y above, what is the difference between 2X and X+Y?

Friday, September 15

Today we finished chapter 3 in the textbook. We defined what it means for two or more random variables to be independent. We observed that if X \sim \operatorname{Bin}(n,p), then X = X_1 + X_2 + \ldots + X_n where X_k \sim \operatorname{Bin}(1,p) are i.i.d. (independent, identically distributed) Bernoulli random variables.

  1. Use this idea to show that if X \sim \operatorname{Bin}(n,p) and Y \sim \operatorname{Bin}(m,p) are independent, then X+Y \sim \operatorname{Bin}(m+n,p).

  2. A casino has 5 roulette wheels. One of the wheels is improperly balanced, so it lands on a 7 twice as often as it should (p = 1/19). Bob likes to play roulette and he always bets on 7. Suppose that Bob plays 100 games of roulette at the casino. If he wins 5 games, find the probability that Bob was at the unbalanced wheel. What if he wins 10 times?

Week 5 Notes

Day Sections Topic
Mon, Sep 18 Review
Wed, Sep 20 Midterm 1
Fri, Sep 22 4.1 - 4.2 Expectation

Monday, September 18

Today we looked at the following questions:

  1. Explain why the identity below makes sense (i.e., give a story proof). \binom{n}{2} = \binom{k}{2} + k(n-k) + \binom{n-k}{2}, \text{ for all } 1 \le k \le n-1.

  2. Alice and Bob are both asked to pick their favorite three movies from the same list of 10 choices. Assume their choices are completely independent and are essentially random with each movie equally likely to be picked. Let M be the number of movies that they both pick.

  1. Is M a sample space, an event, a probability, a random variable, or a probability distribution?

  2. What is P(M = 3)?

  3. Bob needs to get surgery on his knee. The doctor warns him that there is a 10% chance that the surgery will not fix the problem. There is also a 4% chance he could get an infection, and there is a 3% chance that he will both get an infection and the surgery will fail to fix the problem.
  1. What is the probability that the surgery succeeds without infection?

  2. Are the events that Bob gets an infection and the surgery fails independent?

  3. If Bob gets an infection, what is the conditional probability that the surgery will fail to fix his knee?

  4. A large company plans to start using a drug test to screen potential employees. Suppose the test is 99% accurate at detecting drug users, but has a 2% false positive rate for non-users. If 5% of job applicants use drugs, then
  1. What percent of applicants will test positive?

  2. Find P( uses drugs | tests positive).

Friday, September 22

Today we defined expected value for discrete random variables. We calculated the expected value for the following examples:

  1. A six-sided die.

  2. A single game of Roulette betting $1 on 7.

Then we used the linearity property of expectation to find the expected value for binomial and hypergeometric random variables. Then we introduced the geometric distribution. We derived the PMF for the geometric distribution and wrote down the definition of the expected value for a geometric random variable.

Week 6 Notes

Day Sections Topic
Mon, Sep 25 4.3 - 4.4 Geometric and negative binomial
Wed, Sep 27 4.5 - 4.6 Variance
Fri, Sep 29 4.7 - 4.8 Poisson distribution

Monday, September 25

Today I handed out the Discrete Probability Distributions cheat sheet in class. You don’t need to memorize these formulas, but make sure you are familiar with them and can recognize them when they come up in problems. We also did the following:

  1. We derived the formula for the expected value of a geometric random variable.

  2. McDonald’s happy meals come with a toy. One month they give out Teenage Mutant Ninja Turtle toys. Each of the four turtles is equally likely. How many happy meals would you need to buy on average to get all four toys? Key idea: treat this as a sum of random variables T_1, T_2, T_3, and T_4 which represent the meals needed to get the next turtle. (A simulation sketch appears after this list.)

  3. We found the expected value in The St. Petersburg Paradox.

  4. We introduced the negative binomial distribution and derived the formula for its expected value.

  5. We also gave a brief introduction to variance and standard deviation and we found the variance of rolling a six-sided die.
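
Here is the simulation sketch mentioned in problem 2 (not from class); the average should come out close to the theoretical answer 4/4 + 4/3 + 4/2 + 4/1 = 25/3 ≈ 8.33:

import random

def meals_to_collect_all(n_toys=4):
    # buy happy meals until every toy has appeared at least once
    seen = set()
    meals = 0
    while len(seen) < n_toys:
        seen.add(random.randrange(n_toys))
        meals += 1
    return meals

n_trials = 100_000
print(sum(meals_to_collect_all() for _ in range(n_trials)) / n_trials)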

Wednesday, September 27

Today we talked more about variance. We talked about how to use the Law of the Unconscious Statistician (LOTUS) to find variance. We also proved this alternate formula for variance: \operatorname{Var}(X) = E(X^2) - E(X)^2. We also looked at the properties of variance. If X, Y are independent random variables and c is any constant, then

  1. \operatorname{Var}(c) = 0

  2. \operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)

  3. \operatorname{Var}(cX) = c^2 \operatorname{Var}(X)

We used these ideas to do the following exercises.

  1. Flip a coin that lands on heads with probability p and let X be 1 if you get heads and 0 otherwise. Find \operatorname{Var}(X).

  2. Find the variance and standard deviation of Y \sim \operatorname{Binom}(100, 0.5).

  3. Let X and Y be the results from rolling two different fair six-sided dice. Find \operatorname{Var}(X+Y) and compare it with \operatorname{Var}(X+X). Which random variable has a larger variance, X+Y or X+X? Why does the answer make sense?

  4. The variance for a \operatorname{Geom}(p) random variable is \frac{1-p}{p^2}. Use that formula to find the variance of a \operatorname{NegBinom}(n,p) random variable.

Friday, September 29

Today we talked about the Poisson distribution. We started with the fact that Virginia gets about 0.3 earthquakes per year (counting only earthquakes of magnitude greater than 4). This is a good example of a Poisson process, which is any situation where the events you are looking for are rare, but they occur independently of each other and at a predictable rate. We modeled this example with a Binomial distribution where every day there is a very small chance of an earthquake.

  1. How small would p have to be for \lambda = np to be 0.3 with n = 365 days?

  2. What if you used n = (365)(24) = 8760 hours instead?

  3. Use the Poisson distribution PMF to compute the probability that VA gets an earthquake next year. How does it compare to the two Binomial approximations above?

  4. What is the probability that VA gets an earthquake in the next 4 months?

  5. Write down an infinite series for the expected value of a Pois(λ) random variable.
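
To see how close the two binomial models and the Poisson model are, here is a quick comparison with scipy (a sketch, not from class):

from scipy.stats import binom, poisson

lam = 0.3
for n in (365, 365 * 24):
    p = lam / n
    print(n, 1 - binom(n, p).pmf(0))   # P(at least one earthquake) under the binomial model
print(1 - poisson(lam).pmf(0))         # Poisson answer: 1 - e^{-0.3}, about 0.2592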

Theorem. If X \sim \operatorname{Pois}(\lambda_1) and Y \sim \operatorname{Pois}(\lambda_2) are independent, then X+Y \sim \operatorname{Pois}(\lambda_1 + \lambda_2).

  1. A store typically gets 10 female customers and 3 male customers per hour. Find the probability distribution for the total number of customers per hour. If the store is open from 10 AM to 6 PM, what is the probability that they get at least 100 customers in one day?

  2. If the store gets 15 customers in one hour, what is the probability that 4 of them are men?

Week 7 Notes

Day Sections Topic
Mon, Oct 2 5.1 - 5.3 Continuous random variables
Wed, Oct 4 5.4 - 5.5 Normal and exponential distributions
Fri, Oct 6 5.6 Poisson processes

Monday, October 2

Today we introduced continuous random variables. I gave the following slightly different definition than the one in the book (they are equivalent, however):

Definition. A random variable X is continuous if it has a piecewise continuous probability density function (PDF) f(x) such that P(a \le X \le b) = \int_a^b f(x)\, dx for all a \le b in [-\infty, \infty]. The cumulative distribution function (CDF) for X is P(X \le k) = \int_{-\infty}^k f(x)\, dx. Note that the PDF is always the derivative of the CDF.

We looked at these examples:

  1. The uniform distribution \operatorname{Unif}(a,b).

  2. The Cauchy distribution with PDF f(x) = \frac{1}{\pi(x^2 + 1)}.

  3. The distribution with PDF e^{-x} on [0, \infty).

We also defined the expected value of a continuous random variable to be E(X) = \int_{-\infty}^\infty x f(x)\, dx.

  1. Find the expected value of the random variable with PDF e^{-x} on [0, \infty).

  2. What is E(U) when U \sim \operatorname{Unif}(a,b)?

Wednesday, October 4

Today we introduced the normal and the exponential distributions. We did the following problems.

  1. Heights of men are approximately \operatorname{Norm}(70 \text{ inches}, 3 \text{ inches}). Heights of women are approximately \operatorname{Norm}(64.5 \text{ inches}, 2.5 \text{ inches}). If you randomly pick an adult, find the conditional probability that they are a woman if they are over 70 inches tall.
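
Here is one way (not from class) to compute this with scipy. It assumes a random adult is equally likely to be a man or a woman, which the problem doesn’t actually specify:

from scipy.stats import norm

p_tall_given_man = norm(70, 3).sf(70)        # exactly 0.5
p_tall_given_woman = norm(64.5, 2.5).sf(70)  # about 0.014

p_woman_given_tall = (0.5 * p_tall_given_woman) / (0.5 * p_tall_given_woman + 0.5 * p_tall_given_man)
print(p_woman_given_tall)  # roughly 0.027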

Theorem. If X \sim \operatorname{Norm}(\mu_X, \sigma_X) and Y \sim \operatorname{Norm}(\mu_Y, \sigma_Y) are independent random variables, then X+Y \sim \operatorname{Norm}(\mu_X + \mu_Y, \sqrt{\sigma_X^2 + \sigma_Y^2}).

  1. Let M be the height of a random man and W be the height of a random woman. Use the theorem above to find P(M > W). Hint: First find the probability distribution for M - W.

  2. If X \sim \operatorname{Exp}(\lambda), find the expected value and variance for X. Explain why the expected value formula makes intuitive sense.

We finished by briefly discussing the fact that exponential random variables are memoryless, that is P(X > s + t \,|\, X > s) = P(X > t).

  1. Prove that X \sim \operatorname{Exp}(\lambda) is memoryless.

Friday, October 6

Today we started with the Blissville vs. Blotchville example from Section 5.5 in the book.

Then we introduced the Gamma distribution. This is the last of the continuous probability distributions on the Continuous Distributions cheat sheet. We use the parameters n and \lambda, but a lot of software implementations of Gamma distributions use parameters \alpha (which is the same as n, except it can be a decimal) and \beta for the scale, which is the reciprocal of the rate \lambda. See for example this Gamma distribution app.

We finished today by talking about inverse CDFs. They convert percentiles into the value (or location) of the random variable at that percentile.

  1. If X is a random variable with CDF F(x) = 1 - e^{-x^2/2}, x > 0, then what is the inverse CDF F^{-1}(p)?

Week 8 Notes

Day Sections Topic
Mon, Oct 9 6.1 - 6.2 Measures of center & moments
Wed, Oct 11 6.3 Sampling moments
Fri, Oct 13 6.7 Probability generating functions

Monday, October 9

Today we talked about the mean and the median of a random variable. We started with the following example:

  1. The time a person has to wait for a bus to arrive in Blotchville is exponentially distributed with mean 10 minutes (so λ=0.1/minute). What is the median wait time? (In the homework, you’ll invert the CDF for the exponential distribution explicitly. But in class we just used the app).

  2. Why does it make sense that the median is less than the average wait time?
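
If you want to check the median with software instead of the app, scipy’s inverse CDF (ppf) does it directly; note that scipy parameterizes the exponential distribution by the scale 1/\lambda:

from scipy.stats import expon

wait = expon(scale=10)  # mean 10 minutes, so lambda = 0.1
print(wait.ppf(0.5))    # median = 10 ln(2), about 6.93 minutes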

Then we introduced moments. We also defined central moments and standardized moments. The 3rd standardized moment is called the skewness and the 4th is called the kurtosis (our book uses a slightly different definition of kurtosis, but I’ll stick with the simpler definition).

We talked about how skewness measures how asymmetric a distribution is, which led to a discussion of symmetric distributions.

Wednesday, October 11

We started by talking about how the first moment (the expected value) corresponds to the center of mass and the second central moment (the variance) corresponds to the moment of inertia (it’s proportional to the kinetic energy of rotation if you spun the PMF/PDF around its center of mass).

After that we talked about sample moments M_k = \frac{1}{n}\sum_{j=1}^n X_j^k when X_1, X_2, \ldots, X_n is an i.i.d. sample of n random variables.

Definition. A random variable Y is an unbiased estimator for a parameter \theta if E(Y) = \theta.

  1. Show that the sample mean \bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n} is an unbiased estimator for the population mean \mu.

  2. Show that the k-th sample moment M_k is an unbiased estimator for the k-th moment for any probability distribution.

Then we talked about the problem of how to find a useful unbiased estimator for the variance \sigma^2. The best unbiased estimator ends up being the sample variance s^2 = \frac{\sum_{j=1}^n (X_j - \bar{X})^2}{n-1}. Why do we divide by n-1? We started but didn’t finish the explanation in class. First we defined two random vectors, which are vectors with random variable entries:

V = \begin{bmatrix} X_1 - \bar{X} \\ X_2 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{bmatrix}, \quad W = \begin{bmatrix} X_1 - \mu \\ X_2 - \mu \\ \vdots \\ X_n - \mu \end{bmatrix}

Recall that the length of a vector u = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} is \|u\| = \sqrt{u_1^2 + \ldots + u_n^2}.

  1. Show that E(\|W\|^2) = n \sigma^2 where \sigma^2 is the variance of each of the RVs X_j.

  2. Show that V is orthogonal to V-W, i.e., show that the dot product V \cdot (V-W) = 0.

We didn’t finish, but we ended with this last question:

  1. Find the variance of \bar{X}.

With all of those pieces, you can calculate the expected value E(\|V\|^2) = E(\|W\|^2) - E(\|W-V\|^2).
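
A quick numerical check of this fact (not something we did in class) is to simulate many small samples and compare the averages of the two candidate estimators; dividing by n-1 should come out unbiased:

import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 100_000
samples = rng.exponential(scale=1.0, size=(trials, n))  # true variance is 1

print(samples.var(axis=1, ddof=1).mean())  # divide by n-1: close to 1
print(samples.var(axis=1, ddof=0).mean())  # divide by n:   close to (n-1)/n = 0.8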

Friday, October 13

Today we started by reviewing moments and the Law of the Unconscious Statistician (LOTUS). I gave out this handout about random variables.

We also finished the calculation from last time to show that \frac{\sum_{j=1}^n (X_j - \bar{X})^2}{n-1} is an unbiased estimator for the variance.

Then we started a discussion of probability generating functions (PGFs) which are covered in Section 6.7 of the book. We looked at these examples:

  1. The PGF for a six-sided die is: f(t) = \tfrac{1}{6}t + \tfrac{1}{6}t^2 + \tfrac{1}{6}t^3 + \tfrac{1}{6}t^4 + \tfrac{1}{6}t^5 + \tfrac{1}{6}t^6.

In general, the PGF of a discrete random variable is a function of a variable t in which the coefficient of each term is a probability and the corresponding power of t is the outcome in the probability model.

Theorem. If X and Y are independent discrete RVs with PGFs f_X(t) and f_Y(t), then X+Y has PGF f_X(t) \cdot f_Y(t).

  1. Find the PGF for rolling 6 six-sided dice, and use it to find the probability that the total is 20. (A code sketch appears after this list.)

  2. Find the PGF for flipping a coin once. What about 10 times?
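
Since multiplying PGFs means convolving their coefficient lists, problem 1 can be checked numerically with numpy (a sketch, not from class):

import numpy as np

die = np.array([0] + [1/6] * 6)  # index k holds P(one die = k)

total = np.array([1.0])
for _ in range(6):
    total = np.convolve(total, die)  # multiply the PGF of one die six times

print(total[20])  # P(total of 6 dice is 20), about 0.09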

Week 9 Notes

Day Sections Topic
Wed, Oct 18 6.7 Probability generating functions - cont’d
Fri, Oct 20 6.4 - 6.5 Moment generating functions

Wednesday, October 18

Today we discussed probability generating functions in more detail. We also talked about using generating functions for counting. We did this in-class workshop:

For people who finished the workshop early, I suggested this extra problem:

Friday, October 20

Today we introduced moment generating functions (MGFs). We calculated several examples:

  1. Find the MGFs for a Bernoulli(p) random variable and for a six-sided die.

  2. Find the MGF for a Unif(a,b) random variable.

  3. Find the MGF for X \sim \operatorname{Exp}(1).

Then we talked about these important theorems:

Theorem. (MGFs completely determine the distribution) If two R.V.s have the same MGF on an open interval around 0, then they have the same probability distributions.

Theorem. (MGFs for sums of independent R.V.s) If X, Y are independent R.V.s with MGFs m_X(t) and m_Y(t), then X+Y has MGF m_X(t) \cdot m_Y(t).

We didn’t prove these theorems, but we did talk a little about why the second one is true.

  1. If X, Y \sim \operatorname{Exp}(1) are i.i.d., then X+Y \sim \operatorname{Gamma}(2,1). Find the MGF for X+Y.

Finally we explained why they are called moment generating functions by proving:

Theorem. If X is a R.V. with MGF m_X(t), then the k-th moment of X is the k-th derivative of m_X(t) at t = 0: E(X^k) = m_X^{(k)}(0).

We also pointed out that not every R.V. has a moment generating function, because the integral (or sum) that defines the MGF might not converge when t \ne 0.

Week 10 Notes

Day Sections Topic
Mon, Oct 23 6.6 Sums of independent r.v.s.
Wed, Oct 25 Review
Fri, Oct 27 no class

Monday, October 23

Today we started with two exercises:

  1. Let X \sim \operatorname{Pois}(\lambda), so the PMF for X is e^{-\lambda}\lambda^k/k!. Find the MGF M_X(t).

  2. Let Z \sim \operatorname{Norm}(0,1). Find the MGF for Z. Hint: You’ll have to complete the square: tx - x^2/2 = -\frac{x^2 - 2tx}{2} = -\frac{(x - t)^2}{2} + \frac{t^2}{2}.

  3. If X is any random variable with MGF M_X(t), what is the MGF for Y = aX + b? Hint: use the definition.

  4. Use your answer to the previous problem to find the MGF for \mu + \sigma Z.

After we did these exercises, we looked at this Table of Moment Generating Functions. Then we proved these two theorems:

Theorem 1. Let X \sim \operatorname{Pois}(\lambda) and Y \sim \operatorname{Pois}(\mu) be independent. Then X+Y \sim \operatorname{Pois}(\lambda + \mu).

Theorem 2. Let X \sim \operatorname{Norm}(\mu_X, \sigma_X) and Y \sim \operatorname{Norm}(\mu_Y, \sigma_Y) be independent. Then X+Y \sim \operatorname{Norm}(\mu_X + \mu_Y, \sqrt{\sigma_X^2 + \sigma_Y^2}).

Wednesday, October 25

Today we looked at some examples similar to what might be on the midterm on Monday.

  1. Let X be a RV with PDF f(x) = x e^{-x} for x > 0.
    1. Find the CDF for X.
    2. Find P(X > 2).
  2. If X \sim \operatorname{Unif}(-1,1), then X has MGF m_X(t) = \dfrac{e^t - e^{-t}}{2t}.
    1. Find a Maclaurin series for m_X(t).
    2. Use the Maclaurin series to find the first four moments of X.
  3. Let X \sim \operatorname{Bernoulli}(p).
    1. Find the MGF for X.
    2. Find a Maclaurin series for the MGF.
    3. Use the Maclaurin series to find E(X) and E(X^2).
    4. Find the variance of X.

Week 11 Notes

Day Sections Topic
Mon, Oct 30 Midterm 2
Wed, Nov 1 7.1 Joint distributions
Fri, Nov 3 7.1 Marginal & conditional distributions

Wednesday, November 1

Today we introduced joint distributions. First we defined joint PMFs for discrete random variables:

Definition. A function f(x,y) is a joint probability mass function for two discrete r.v.s X and Y if P(X = x \text{ and } Y = y) = f(x,y) for all pairs (x,y) in the support of X and Y.

We briefly looked at example 7.1.5 from the book before moving on to joint PDFs for continuous random variables:

Definition. A function f(x,y) is a joint probability density function for two continuous r.v.s X and Y if P(a \le X \le b \text{ and } c \le Y \le d) = \int_c^d \int_a^b f(x,y)\, dx\, dy.

We did the example of a uniform distribution on a circle, then we did this workshop:

If you need to review double integrals, I recommend trying these videos & examples on Khan Academy.

Friday, November 3

Today we talked about marginal and conditional distributions for jointly distributed random variables. If X, Y are jointly distributed with joint PDF (or PMF) f(x,y), then the marginal distribution of X is f_X(x) = \int_{-\infty}^\infty f(x,y)\, dy (a sum in the discrete case), and the conditional distribution of Y given X = x is f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)}.

The book has a good picture (Figure 7.5 on page 325) that shows how the shape of the conditional PDF comes from the shape of the joint PDF after you renormalize.

We did these examples.

  1. Let X and Y be the coordinates of a point chosen randomly and uniformly from inside the unit circle. Find
    1. P(X^2 + Y^2 < \tfrac{1}{9})
    2. The conditional PDF f_{Y\,|\,X}(y \,|\, X = \tfrac{1}{2})
    3. The marginal PDF f_X(x).
  2. Roll two 4-sided dice. Let X be the total and let Y be the maximum of the two dice.
    1. Find the marginal PMFs f_X(x) and f_Y(y).
    2. Find the conditional PMF f_{X\,|\,Y}(x \,|\, Y = 4).

Week 12 Notes

Day Sections Topic
Mon, Nov 6 7.2 2D LOTUS
Wed, Nov 8 7.3 Covariance & correlation
Fri, Nov 10 7.5 Multivariate normal distribution

Monday, November 6

Today we introduced the 2-dimensional version of the Law of the Unconscious Statistician (2D LOTUS). For random variables X, Y with joint PDF or PMF f(x,y), E(g(X,Y)) = \int_{-\infty}^\infty \int_{-\infty}^\infty g(x,y) f(x,y)\, dx\, dy \text{ or } \sum_x \sum_y g(x,y) f(x,y).

  1. Suppose X, Y \sim \operatorname{Unif}(0,1) are i.i.d. RVs. Find E(|X - Y|).

  2. Let X, Y be any independent random variables with joint distribution f(x,y). Then prove that E(XY) = E(X)E(Y).

After these examples, we defined the covariance of two random variables: \operatorname{Cov}(X,Y) = E((X - \mu_X)(Y - \mu_Y)).

  1. Show that \operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y).

  2. Show that \operatorname{Cov}(X,Y) = 0 if X, Y are independent.

We also discussed the following additional properties of covariance:

  1. Symmetry: \operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)
  2. Linearity: \operatorname{Cov}(aX + bY, Z) = a \operatorname{Cov}(X,Z) + b \operatorname{Cov}(Y,Z)
  3. Relation with variance: \operatorname{Var}(X) = \operatorname{Cov}(X,X)
  4. Constants: \operatorname{Cov}(X,c) = 0.

Finally, we defined the correlation between two random variables: \rho(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}.

Wednesday, November 8

Today we introduced multivariate normal distributions. Suppose that X_1, \ldots, X_n \overset{i.i.d.}{\sim} \operatorname{Norm}(0,1). We can arrange the values of X_1, \ldots, X_n into a vector X = [X_1, \ldots, X_n]^T in \mathbb{R}^n which we call a random vector. Then for any m-by-n matrix A \in \mathbb{R}^{m \times n} and any vector b \in \mathbb{R}^m, the random vector Y = AX + b has a multivariate normal distribution.

  1. On the midterm we had an example where rainfall during the wet season was W \sim \operatorname{Norm}(50,12) while the rainfall in the dry season is D \sim \operatorname{Norm}(10,5). We could express this information using random vectors as: \begin{bmatrix} W \\ D \end{bmatrix} = \begin{bmatrix} 12 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} + \begin{bmatrix} 50 \\ 10 \end{bmatrix} where X_1, X_2 \overset{i.i.d.}{\sim} \operatorname{Norm}(0,1), that is X = [X_1, X_2]^T is a standard normal random vector. Suppose we wanted to record just the total rainfall T = W + D and the wet season rainfall W. Then we could apply a linear transformation like so: \begin{bmatrix} T \\ W \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} W \\ D \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \left( \begin{bmatrix} 12 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} + \begin{bmatrix} 50 \\ 10 \end{bmatrix} \right) so \begin{bmatrix} T \\ W \end{bmatrix} = \begin{bmatrix} 12 & 5 \\ 12 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} + \begin{bmatrix} 60 \\ 50 \end{bmatrix}.

For any multivariate normal random vector Y = AX + b where X is a standard normal random vector, the covariance matrix for Y is \Sigma = AA^T. The entries of the covariance matrix are \Sigma = \begin{bmatrix} \operatorname{Cov}(Y_1,Y_1) & \operatorname{Cov}(Y_1,Y_2) & \ldots & \operatorname{Cov}(Y_1,Y_m) \\ \operatorname{Cov}(Y_2,Y_1) & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ \operatorname{Cov}(Y_m,Y_1) & \ldots & \ldots & \operatorname{Cov}(Y_m,Y_m) \end{bmatrix}.
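
As a quick check of the rainfall example above (not from class), numpy reproduces the covariance matrix directly from A:

import numpy as np

A = np.array([[12.0, 5.0],
              [12.0, 0.0]])
print(A @ A.T)  # [[169, 144], [144, 144]]: Var(T) = 169, Var(W) = 144, Cov(T, W) = 144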

  1. The heights of fathers and their adult sons have a moderately strong correlation \rho = 0.5. Both fathers’ and sons’ heights (measured in standard deviations from the mean) have a standard normal distribution \operatorname{Norm}(0,1). But they are not independent. Instead, find the covariance matrix for this situation.

  2. You can think of the height of the son Y as a linear combination of his father’s height X and an additional independent random component Z that has a standard normal distribution. Find coefficients for aX + bZ such that the resulting normal distribution has standard deviation 1 and \operatorname{Cov}(aX + bZ, X) = 0.5.

  3. Compute AA^T where A = \begin{bmatrix} 1 & 0 \\ a & b \end{bmatrix}. Do you get the correct covariance matrix for the heights of fathers and sons?

Friday, November 10

Today we started with some examples that applied the following theorem about multivariate normal distributions.

Theorem. If X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} is a random vector with a multivariate normal distribution, and if A \in \mathbb{R}^{m \times n} is a matrix, then Y = AX has a multivariate normal distribution. In the special case where A has only one row (i.e., m = 1), Y has a normal distribution.

We used this theorem to help solve the following problems.

  1. Last time we looked at the height of a random father F and his adult son S, both of which have a \operatorname{Norm}(70,3) distribution, but with correlation \rho = 0.5. Together the vector \begin{bmatrix} F \\ S \end{bmatrix} is MVN. Consider how much taller a son might be than the father. What is the distribution of S - F?
    1. What is E(S-F)?
    2. What is \operatorname{Var}(S-F)?
    3. Find P(S-F \ge 1).
  2. Suppose one region gets W \sim \operatorname{Norm}(50,12) inches of rain during the wet season and D \sim \operatorname{Norm}(10,5) inches of rain in the dry season. Suppose that the total rain W+D has a normal distribution and D, W have a correlation \rho = 0.225.
    1. Find E(W+D).
    2. Find \operatorname{Var}(W+D).
    3. Find P(W+D \ge 80).

After that, we introduced the log-normal distribution. This is the distribution of Y = e^X if X \sim \operatorname{Norm}(0,1).

  1. Find P(Y \le y) using \Phi to represent the standard normal CDF (which doesn’t have a nice formula).

  2. Differentiate the CDF for Y to find the PDF for Y.

  3. Find a formula for the moments E(Y^n). Hint: E(Y^n) = E(e^{nX}), which looks a lot like the MGF m_X(t) = E(e^{tX}). So you can use the MGF for a standard normal X to find the moments of Y.

Week 13 Notes

Day Sections Topic
Mon, Nov 13 8.1 Change of variables
Wed, Nov 15 8.2 Convolutions
Fri, Nov 17 Review
Mon, Nov 20 Midterm 3

Monday, November 13

Today we did more examples of change of variables for random variables. We focused on one dimensional examples.

  1. Let U \sim \operatorname{Unif}(0,1). Find the PDF for U^2.

  2. Here is how you can generate an exponentially distributed r.v. X. Start by randomly generating U \sim \operatorname{Unif}(0,1). Then apply the function f(x) = -\ln x to U. Prove that X = -\ln U has the \operatorname{Exp}(1) distribution.

  3. Let X be any r.v. For any monotone (i.e., either always increasing or always decreasing) differentiable function g defined on the support of X, let Y = g(X). Prove that the PDF of Y is f_Y(y) = f_X(x) \left| \dfrac{dx}{dy} \right|.

  4. Find the PDF for X^3 where X \sim \operatorname{Norm}(0,1).

We also defined the \chi^2-distribution, which is the distribution of a sum of i.i.d. random variables X_1^2 + \ldots + X_n^2 where each X_i \sim \operatorname{Norm}(0,1). When n = 1, we were able to find the PDF for the \chi^2(1) distribution even though the function g(x) = x^2 is not monotone increasing on the whole real line.

Wednesday, November 15

Today we defined the convolution of two functions (f_X \ast f_Y)(t) = \int_{-\infty}^\infty f_Y(t-x) f_X(x)\, dx = \int_{-\infty}^\infty f_X(t-y) f_Y(y)\, dy. If X, Y are independent continuous random variables with PDFs f_X and f_Y, then X+Y has PDF f_X \ast f_Y. You can also define the discrete convolution, which gives the PMF for a sum of two independent discrete random variables: (f_X \ast f_Y)(k) = \sum_x f_Y(k-x) f_X(x) = \sum_y f_X(k-y) f_Y(y).

  1. Find the PMF for the sum of two 6-sided dice.

  2. If X, Y \overset{i.i.d.}{\sim} \operatorname{Unif}(0,1), find the PDF for X+Y. (https://youtu.be/Blg5RIjGwBE)

To help with the notation in this last problem, we introduced indicator functions \mathbf{1}_A. (See https://en.wikipedia.org/wiki/Indicator_function)

  1. Find the PDF for the sum of X, Y \overset{i.i.d.}{\sim} \operatorname{Exp}(\lambda). (https://youtu.be/Glff9dvPVEg)

  2. Last time, we found the distribution for X^2 if X \sim \operatorname{Norm}(0,1). The PDF for X^2 is f_{X^2}(x) = \frac{1}{\sqrt{2\pi x}} e^{-x/2}. This is the \chi^2(1) PDF. Now suppose that we have X, Y \overset{i.i.d.}{\sim} \operatorname{Norm}(0,1) random variables. Set up, but don’t evaluate, a convolution integral for the PDF of X^2 + Y^2.

Friday, November 17

Today we did a review of material that will be on midterm 3. This includes:

We did these examples in class:

  1. Suppose X, Y are discrete r.v.s. each taking values in \{0,1\} with joint PMF f(x,y) given by f(0,0) = \tfrac{1}{2}, ~f(0,1) = \tfrac{1}{3}, ~f(1,0) = \tfrac{1}{6}, ~f(1,1) = 0.

    1. Find the PMFs for X, Y, and X+Y.
    2. Find the conditional PMF for X if Y = 0.
    3. Find E(XY).
    4. Find \operatorname{Cov}(X,Y).
  2. Suppose that (X,Y) are continuous r.v.s. that are jointly uniformly distributed in the unit disk D = \{(x,y) \in \mathbb{R}^2 \,:\, x^2 + y^2 \le 1\}.

    1. What is the joint PDF for X and Y?
    2. Write down, but don’t evaluate, an integral to find the expected distance from (X,Y) to the point (0,1) at the top of the unit disk.
  3. Suppose X, Y are both \operatorname{Norm}(0,1) r.v.s., and the correlation between X and Y is \rho = 0.5. Find P(X + 2Y \ge 3).

  4. Let X \sim \operatorname{Unif}(0,2) and Y \sim \operatorname{Unif}(0,1) be independent r.v.s. Find the PDF for X+Y.

Week 14 Notes

Day Sections Topic
Mon, Nov 27 10.1 - 10.2 Inequalities & Law of large numbers
Wed, Nov 29 10.3 Central limit theorem
Fri, Dec 1 10.3 Applications of LLN & CLT
Mon, Dec 4 Review & recap

Monday, November 27

Today we introduced two important inequalities: the Markov Inequality and Chebyshev’s Inequality.

Theorem (Markov’s Inequality). For any r.v. X and constant a > 0, P(|X| \ge a) \le \frac{E(|X|)}{a}.

We gave a visual proof by looking at the graph of the PDF for X (assuming X is continuous, but the proof is essentially the same for discrete r.v.s.) and comparing the integral that gives P(|X| \ge a) with the one that gives E(\tfrac{1}{a}|X|). Which integral is bigger?

A corollary of Markov’s inequality is this more useful inequality.

Theorem (Chebyshev’s Inequality). Let X be any r.v. with mean \mu and variance \sigma^2. Let a > 0. Then P(|X - \mu| \ge a) \le \frac{\sigma^2}{a^2}.

  1. Prove Chebyshev’s inequality by applying Markov’s inequality to the random variable (X-\mu)^2.

  2. Here on Earth, heights of adults are roughly normally distributed. But if you go to another planet, they might have a totally different probability distribution. Explain how we can be certain that at most 25% of Martians have a height that is 2 or more standard deviations above average.

  3. If X_1, \ldots, X_n are i.i.d. r.v.s. with mean \mu and standard deviation \sigma, then what is the mean and standard deviation of \bar{X} = \frac{X_1 + \ldots + X_n}{n}?

  4. What does Chebyshev’s inequality say about the probability P(|\bar{X} - \mu| \ge a)? What happens as n gets bigger?

Wednesday, November 29

Today we talked about the central limit theorem.

Central Limit Theorem. Let X_1, X_2, \ldots, X_n be i.i.d. r.v.s. with mean \mu and variance \sigma^2. Then the PMF or PDF for the random variable Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} converges to the PDF for a \operatorname{Norm}(0,1) random variable as n \rightarrow \infty.

We proved this theorem in class under the extra assumption that the MGF for X_i exists. We also assumed for simplicity that \mu = 0 and \sigma = 1. Neither assumption is necessary (and it is easy to get rid of the assumption that \mu = 0 and \sigma = 1).

  1. Let M(t) denote the MGF for each of the X_i. How do we know that they all have the same MGF?

  2. Show that the MGF for Z = \sqrt{n}\,\bar{X} is M(t/\sqrt{n})^n.

  3. What are M(0), M'(0), and M''(0)?

  4. Find \lim_{n \rightarrow \infty} M(t/\sqrt{n})^n.

We finished with this exercise:

  1. If you roll 10 six-sided dice, estimate the probability that the total is at least 50.

The key to the last problem is to use these normal approximation facts:

Corollary (Normal Approximations). If X_1, \ldots, X_n are i.i.d. RVs with mean \mu and variance \sigma^2 and n is large, then

  1. The total X_1 + X_2 + \ldots + X_n is approximately \operatorname{Norm}(n\mu, \sqrt{n}\,\sigma).

  2. The sample mean \bar{X} is approximately \operatorname{Norm}(\mu, \tfrac{\sigma}{\sqrt{n}}).
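
Here is one way (a sketch, not from class) to carry out the dice exercise above with a normal approximation and compare it to a simulation; one die has mean 3.5 and variance 35/12, and the 49.5 is a continuity correction:

import numpy as np
from scipy.stats import norm

mu, var, n = 3.5, 35/12, 10
print(norm(n * mu, np.sqrt(n * var)).sf(49.5))   # normal approximation, about 0.004

rolls = np.random.default_rng(1).integers(1, 7, size=(200_000, n))
print((rolls.sum(axis=1) >= 50).mean())          # simulation check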

Friday, December 1

Today we talked about applications of the Central Limit Theorem and the Law of Large Numbers. We started with this corollary of the Central Limit Theorem that we didn’t write down explicitly in class last time:

Corollary (Normal Approximations). If X_1, \ldots, X_n are i.i.d. RVs with mean \mu and variance \sigma^2 and n is large, then

  1. The total X_1 + X_2 + \ldots + X_n is approximately \operatorname{Norm}(n\mu, \sqrt{n}\,\sigma).

  2. The sample mean \bar{X} is approximately \operatorname{Norm}(\mu, \tfrac{\sigma}{\sqrt{n}}).

  1. Adults in the USA have a mean weight of 170 lbs. with a standard deviation of 40 lbs. If a random sample of 100 adult passengers boards an airplane, what is the probability that their total weight exceeds 18,000 lbs?

We also used the corollary to derive the formula for a 95% confidence interval:

95% Confidence Intervals. In a large sample, \bar{X} is within 2 standard deviations of \mu, 95% of the time. So if you know (or have a good estimate for) \sigma, then you can use \bar{X} to estimate \mu:
\bar{x} \pm 2\frac{\sigma}{\sqrt{n}}.

We finished by introducing Monte Carlo Integration.

  1. Find \int_0^1 \sqrt{1-x^2}\, dx using Monte Carlo integration. Write a computer program to randomly generate points uniformly in the square [0,1] \times [0,1], then record 1 if the point is under the curve, or 0 if it is not. (A sketch appears after this list.)

  2. When you randomly generate points in a rectangle and calculate the proportion that hit the region you want, what is the approximate probability distribution for the proportion that hit? What is its mean and standard deviation?
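
Here is the sketch referred to in problem 1 (one possible implementation, not necessarily the one from class); the integral equals \pi/4 \approx 0.7854:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x, y = rng.random(n), rng.random(n)   # uniform points in the unit square
hits = y <= np.sqrt(1 - x**2)         # 1 if the point is under the curve, 0 if not
print(hits.mean())                    # estimates the integral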

Monday, December 4

Today we did a review of some material that might be on the final. We did the following problems in class:

  1. Suppose 10 men and 10 women get in a line in a random order. Find the probability that the 10 men are in front of the 10 women in the line.

  2. Let X be a random variable that is partially determined by flipping a coin. If the coin is heads, then X \sim \operatorname{Exp}(1) and if the coin is tails, then X \sim \operatorname{Unif}(0,1). Find P(\text{head} \,|\, X \ge 0.9).

  3. Let X \sim \operatorname{Norm}(0,1) and Y \sim \operatorname{Exp}(1) be independent random variables. Set up, but do not evaluate, an integral that represents P(Y \ge X).

We also reviewed two problems from midterm 3: the problem about finding the standard deviation of a difference of two correlated normal random variables, and the problem of finding the expected value of a function of a pair of random variables that are jointly uniformly distributed on a triangle.