Probability Notes

Math 421 - Fall 2023

Week 1 Notes

Day	Sections	Topic
Mon, Aug 21	1.1 - 1.4	Counting
Wed, Aug 23	1.5 - 1.6	Story proofs & definition of probability
Fri, Aug 25	1.5	Simulation with Python

Monday, August 21

Today we reviewed the basic rules for counting.

Handout: Counting rules

We looked at how these rules can help us solve probability problems using the naive definition of probability.

What is the probability that a hand of 5 cards from a shuffled deck of 52 playing cards is a full house (3 of one rank and 2 of another)?
What is the probability that a group of 30 random people will all have different birthdays?
How many different ways are there to rearrange the letters of MISSISSIPPI?
What if you randomly permuted the letters of MISSISSIPPI? What is the probability that they would still spell MISSISSIPPI?

The functions $_n C_k$ and $_n P_k$ are both in the Python standard math library. You can use the following code to calculate them:

import math

n, k = 5, 3
print(math.comb(n,k))
print(math.perm(n,k))
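As a rough sketch, the same functions can be applied to the counting problems listed above. The variable names are illustrative, and the birthday calculation assumes 365 equally likely birthdays (no leap years).

```python
import math

# Full house: pick the rank of the triple and 3 of its 4 suits,
# then a different rank for the pair and 2 of its 4 suits.
full_house_hands = 13 * math.comb(4, 3) * 12 * math.comb(4, 2)
p_full_house = full_house_hands / math.comb(52, 5)

# Birthdays: 30 people all different, assuming 365 equally likely days.
p_all_different = math.perm(365, 30) / 365**30

# MISSISSIPPI: 11 letters with M x1, I x4, S x4, P x2.
arrangements = math.factorial(11) // (
    math.factorial(1) * math.factorial(4) * math.factorial(4) * math.factorial(2)
)
# Only one of the distinct arrangements spells the original word.
p_same_word = 1 / arrangements

print(p_full_house, p_all_different, arrangements, p_same_word)
```

Each probability here is a count of favorable outcomes divided by a count of equally likely outcomes, which is exactly the naive definition of probability.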

Wednesday, August 23

Today we introduced story proofs with these examples:

Explain why it makes sense that $\binom{n}{k} = \binom{n}{n-k}$ in terms of choosing subsets of a set with $n$ elements.
Prove that $\displaystyle n\binom{n-1}{k-1} = k \binom{n}{k}$ . Hint: there are the same number of ways to choose a team captain who then picks the rest of the team as there are ways to pick the team first, and then select the captain from within the team.
Prove Vandermonde’s identity $\binom{m+n}{k} = \sum_{j = 0}^k \binom{m}{j} \binom{n}{k-j}.$

After that, we introduced the general definition of probability:

Definition. A probability space consists of a set $S$ of possible outcomes called the sample space, and a probability function $P:2^S \rightarrow [0,1]$ which satisfies two axioms:

Axiom 1. $P(\varnothing) = 0$ and $P(S) = 1$ , and
Axiom 2. For any finite or countably infinite collection of disjoint events $A_1, A_2, \ldots$ , $P(\bigcup_{i = 1}^\infty A_i) = \sum_{i = 1}^\infty P(A_i).$

We looked at the following examples of probability spaces:

Describe the sample space and probability function for rolling a fair six-sided die.
Suppose you flip a fair coin repeatedly until it lands on heads. The outcome is the number of flips it takes. Describe the sample space, the probabilities of the individual outcomes in the sample space, and then calculate the probability that you get an odd number of flips. (We needed geometric series to answer this question!)
What is the probability it takes an even number of flips?
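The geometric series in the coin-flipping example can be checked numerically. This is a minimal sketch using exact fractions; the function name `pmf_flips` is illustrative.

```python
from fractions import Fraction

def pmf_flips(k):
    """P(first head on flip k) for a fair coin: (1/2)^k."""
    return Fraction(1, 2) ** k

# Partial sum over odd k = 1, 3, ..., 59 approaches the series limit.
partial_odd = sum(pmf_flips(k) for k in range(1, 60, 2))

# Closed forms: geometric series with common ratio 1/4.
p_odd = Fraction(1, 2) / (1 - Fraction(1, 4))    # first term 1/2
p_even = Fraction(1, 4) / (1 - Fraction(1, 4))   # first term 1/4

print(float(partial_odd), p_odd, p_even)
```

The two closed forms add up to 1, as they must, since every outcome is either an odd or an even number of flips.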

We finished by proving the Complementary events formula $P(A^C) = 1-P(A)$ .

Friday, August 25

Today we did probability simulations in Python using Google Colab.

Estimate the probability that the total of three six-sided dice is 12 by simulating rolling 3 dice many times.
Chevalier de Mere’s problem. The Chevalier de Mere was a French gambler in the 1600s. He knew that the chance of rolling a one on a six-sided die is 1/6 and the chance of rolling two dice and getting both dice to land on ones is 1/36. So he reasoned that the probability of getting a one in four rolls of a single die should be the same as getting a pair of ones in 24 rolls of two dice. Write a simulation in Python to see which is more likely to happen.

Here are my sample solutions in Google Colab.
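For readers without access to the notebook, here is one possible sketch of both simulations using only the standard library. It is not the notebook's code; the seed, trial count, and names are arbitrary choices.

```python
import random

rng = random.Random(421)  # fixed seed so the run is repeatable
TRIALS = 20_000

def roll():
    """One roll of a fair six-sided die."""
    return rng.randint(1, 6)

# Three dice: how often is the total exactly 12?
hits = sum(1 for _ in range(TRIALS) if roll() + roll() + roll() == 12)
p_total_12 = hits / TRIALS

# De Mere, game 1: at least one 1 in four rolls of a single die.
game1 = sum(1 for _ in range(TRIALS) if any(roll() == 1 for _ in range(4)))
p_game1 = game1 / TRIALS

# De Mere, game 2: at least one pair of 1s in 24 rolls of two dice.
game2 = sum(
    1 for _ in range(TRIALS)
    if any((roll(), roll()) == (1, 1) for _ in range(24))
)
p_game2 = game2 / TRIALS

print(p_total_12, p_game1, p_game2)
```

The estimates can be compared with the exact values $25/216 \approx 0.1157$ , $1 - (5/6)^4 \approx 0.5177$ , and $1 - (35/36)^{24} \approx 0.4914$ .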

Week 2 Notes

Day	Sections	Topic
Mon, Aug 28	2.1 - 2.3	Bayes rule
Wed, Aug 30	2.4 - 2.5	Independence
Fri, Sep 1	2.6 - 2.7	Conditioning

Monday, August 28

Today we talked about conditional probability: $P(A | B) = \frac{P( A \cap B)}{P(B)}.$

We did these examples:

Shuffle a deck of cards and draw two cards off the top. Find $P(\text{2nd is an Ace} ~|~ \text{1st is an Ace} )$ .
Women in their 40’s have a 0.8% chance to have breast cancer. Mammograms are 90% accurate at detecting breast cancer for people who have it. They are also 93% accurate at not detecting breast cancer when someone doesn’t have it.

Find $P(\text{Test Positive} ~|~\text{Have cancer})$ .
Find $P(\text{Test Positive} ~|~\text{Don't have cancer})$ .
Find $P(\text{Test positive})$ .
Find $P(\text{Have breast cancer} ~|~\text{Tested positive})$ .
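The four mammogram questions can be sketched in a few lines of Python. The variable names are illustrative; the numbers are the ones given in the example.

```python
# Numbers from the mammogram example above.
p_cancer = 0.008             # P(have cancer)
p_pos_given_cancer = 0.90    # P(test positive | have cancer)
p_neg_given_healthy = 0.93   # P(test negative | don't have cancer)

# The false positive rate is the complement of the 93% figure.
p_pos_given_healthy = 1 - p_neg_given_healthy

# Add up the two branches of the tree that end in a positive test.
p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy

# Definition of conditional probability: P(cancer and positive) / P(positive).
p_cancer_given_pos = p_cancer * p_pos_given_cancer / p_pos

print(p_pos_given_healthy, p_pos, p_cancer_given_pos)
```

The last value comes out a little over 9%, which is why a single positive test is much weaker evidence than the 90% accuracy figure suggests.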

We also talked about how to draw weighted tree diagrams to keep track of the probabilities in an example like this. And we also reviewed some of the Python we used on Friday. See this Google Colab example.

Wednesday, August 30

Today we talked about Bayes formula, both the standard version and the version for calculating posterior odds based on prior odds and the likelihood ratio. We did these examples:

5% of men are color blind, but only 0.25% of women are. Find the posterior odds that someone is male given that they are color blind.
What is the likelihood ratio for having breast cancer if someone has a positive mammogram test?
(Problem from OpenIntroStats.) Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this problem.
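The tree diagram for the parking garage problem translates directly into a small table of branches. This is a sketch with exact fractions; the dictionary layout is just one convenient way to store the tree.

```python
from fractions import Fraction

# event type: (P(event type), P(garage full | event type))
branches = {
    "academic": (Fraction(35, 100), Fraction(25, 100)),
    "sporting": (Fraction(20, 100), Fraction(70, 100)),
    "none":     (Fraction(45, 100), Fraction(5, 100)),
}

# Multiply along each branch, then add the branches that end in "full".
joint_full = {event: prior * full for event, (prior, full) in branches.items()}
p_full = sum(joint_full.values())

# Bayes: the share of the "full" probability that comes from sporting events.
p_sporting_given_full = joint_full["sporting"] / p_full

print(p_full, p_sporting_given_full)
```

Here the garage is full on 1/4 of evenings, and 14/25 of those evenings have a sporting event.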

We also looked at other examples of conditional probabilities in two-way tables:

Example: Conditional probabilities in two-way tables

We finished by defining independent events and proving that if $A, B$ are independent, then so are $A$ and $B^c$ .

Friday, September 1

Today we talked about how to use conditioning to solve probability problems. We did two types of examples: ones where you condition on all of the possible outcomes of a family of events that partition the sample space (using the Law of Total Probability) and the other where you condition on the outcome of a single event (using the conditional probability definition or Bayes formula). We did these examples:

In the example from Wednesday where Jose is trying to find parking on campus, find $P(\text{garage is full})$ .
If I have a bag with five dice, one four-sided, one six-sided, one eight-sided, one twelve-sided, and one twenty-sided, and I randomly select one die from the bag and roll it, what is $P(\text{result} = 5)$ ?
What if you didn’t know which die I rolled, but you knew the result was 5? Find $P(\text{die is twenty-sided}|\text{result}=5)$ .
Prove that for any two events $A$ and $B$ , $P(A | A \cup B) \ge P(A|B)$ . Hint: Condition on whether or not $B$ occurs.
5% of men are color blind and 0.25% of women are. 25% of men are taller than 6 feet tall, while only 2% of women are. Find the conditional probability that a random person is male if you know they are both color blind and taller than 6 ft.
1. Notice that you have to assume that color blindness is independent of height for both men & women.
2. Notice also that color blindness and height definitely are not independent for all adults. Why not?
3. You can condition on both color blindness and height simultaneously or you can condition on just one, and then condition on the other. Both approaches will work and give the same answer:

$P(A | B \cap C) = \frac{P(A \cap B | C)}{P(B | C)} = \frac{P(A \cap B \cap C)}{P(B \cap C)}.$
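Both approaches can be compared numerically for the color blindness and height example. This sketch assumes a random person is male with probability 1/2 (an assumption not stated in the problem) and that color blindness and height are independent within each sex.

```python
# Assumption (not stated in the problem): P(male) = 1/2.
prior = {"male": 0.5, "female": 0.5}
p_cb = {"male": 0.05, "female": 0.0025}     # P(color blind | sex)
p_tall = {"male": 0.25, "female": 0.02}     # P(taller than 6 ft | sex)

# Approach 1: condition on both facts at once, using independence within each sex.
joint = {s: prior[s] * p_cb[s] * p_tall[s] for s in prior}
p_male_both = joint["male"] / sum(joint.values())

# Approach 2: condition on color blindness first, then update again on height.
after_cb = {s: prior[s] * p_cb[s] for s in prior}
total_cb = sum(after_cb.values())
after_cb = {s: v / total_cb for s, v in after_cb.items()}
after_tall = {s: after_cb[s] * p_tall[s] for s in prior}
p_male_sequential = after_tall["male"] / sum(after_tall.values())

print(p_male_both, p_male_sequential)
```

The two printed values agree up to rounding, which is the content of the identity above.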

Week 3 Notes

Day	Sections	Topic
Wed, Sep 6	2.8 - 2.9	Conditioning - cont’d
Fri, Sep 8	3.1 - 3.2	Discrete random variables

Wednesday, Sep 6

Today we talked about counter-intuitive probability examples.

The Monty Hall problem.
The prosecutor’s fallacy. In 1998, Sally Clark was tried for murder after two of her sons died suddenly after birth. The prosecutor argued that the probability of one child dying of Sudden Infant Death Syndrome (SIDS) is 1/8500, so $P(\text{two children died} | \text{ not guilty} ) = \frac{1}{8500} \cdot \frac{1}{8500} = \frac{1}{72,250,000}.$ Given how small that number was, the prosecutor argued that this proved that Sally Clark was guilty beyond a reasonable doubt. What assumptions was the prosecutor making?
p-values in Statistics. We looked at an example where I used my psychic powers to guess 10 out of 25 Zener cards correctly. The conditional probability that I would get so many correct if I were just guessing is very low. The actual answer uses the binomial distribution which we will talk about next week. But is that strong evidence that I am psychic?
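A quick simulation is one way to convince yourself of the Monty Hall answer. This sketch assumes the usual rules: the host always opens a door that hides no car and is not the contestant's pick, choosing at random when two such doors are available. The seed and names are arbitrary.

```python
import random

rng = random.Random(0)  # fixed seed for repeatability

def play(switch):
    """Play one game; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

TRIALS = 20_000
p_stay = sum(play(switch=False) for _ in range(TRIALS)) / TRIALS
p_switch = sum(play(switch=True) for _ in range(TRIALS)) / TRIALS
print(p_stay, p_switch)
```

The estimates should land near 1/3 for staying and 2/3 for switching.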

Friday, Sep 8

Today we introduced random variables. We defined discrete random variables and their probability mass functions (PMFs). We looked at these examples:

Flip two coins. Let X = the number of heads. What is the PMF for X?
Roll two six-sided dice and let T = the total. What is the PMF for T?
Flip a coin until it lands on heads. Let Z = the number of flips. What is the PMF for Z?
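For a finite sample space with equally likely outcomes, a PMF can be built by listing the outcomes and counting. Here is a sketch for the total of two dice; the name `pmf_T` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each total, then divide by 36.
counts = Counter(a + b for a, b in outcomes)
pmf_T = {t: Fraction(c, len(outcomes)) for t, c in sorted(counts.items())}

for t, p in pmf_T.items():
    print(t, p)
```

The same pattern works for the two-coin example by replacing the dice faces with the outcomes of each coin.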

We also simulated the PMF of the random variable X = # of correct guesses on 25 tries with Zener cards.

Notebook: Simulation of PMF for Zener Card Guesses.

Week 4 Notes

Day	Sections	Topic
Mon, Sep 11	3.3 - 3.4	Bernoulli, Binomial, Hypergeometric
Wed, Sep 13	3.3 - 3.6	Hypergeometric, CDFs
Fri, Sep 15	3.7 - 3.8	Functions of random variables

Monday, September 11

Today we introduced the binomial distribution and the hypergeometric distribution. We derived the probability mass functions for both distributions and spent some time proving the PMF formula for the binomial distribution. We did the following in-class exercises:

Graph the PMFs for Bin(2,0.5) and Bin(3,0.5).
One step in the proof of the binomial PMF formula was to give a story proof that $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}.$
Suppose a large town is 47% Republican, 35% Democrat, and 18% independent. A political poll asks a sample of 100 residents their political affiliation. Let X be the number of Republicans. What is the probability distribution for X?
What is the probability of getting exactly 3 aces in a five card poker hand?
What is the distribution of the number of aces in a five card poker hand?
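The number of aces in a five card hand is hypergeometric, and its PMF can be sketched with `math.comb`. The function name and default arguments below are illustrative.

```python
from fractions import Fraction
from math import comb

def aces_pmf(k, aces=4, others=48, hand=5):
    """P(exactly k aces): choose k of the aces and hand - k of the other cards."""
    return Fraction(comb(aces, k) * comb(others, hand - k), comb(aces + others, hand))

pmf = {k: aces_pmf(k) for k in range(0, 5)}
print(pmf[3], float(pmf[3]))
```

Summing `pmf` over k = 0, ..., 4 gives exactly 1, which is Vandermonde's identity in disguise.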

Wednesday, September 13

Today we introduced the cumulative distribution function (CDF) of a random variable. We used the binomial distribution CDF to calculate the following:

In roulette, if you bet on a number like 7, you have a 1/38 probability of winning. If you bet $1 and you win, then you get $36. If you play 100 games of roulette and bet $1 on 7 every time, what is the probability that you lose money?
What is the probability that someone who is just guessing would get 10 or more Zener cards correct out of 25?

There is also a CDF function for the hypergeometric distribution, and we used it to find the following:

There is an urn with 10 balls, 7 red and 3 black. You take a random sample of 5 balls out of the urn. Let $X$ be the number of red balls in your sample. Find $P(X \le 4)$ .

We also talked about how to get access to CDF functions in Python and R. Here is an example of how to use Python with the scipy.stats module to work with binomial and hypergeometric random variables:

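from scipy.stats import binom, hypergeom

# P(10 or more correct out of 25 Zener cards when guessing) = 1 - P(X <= 9)
X = binom(25,1/5)
print(1-X.cdf(9))

# Urn with 10 balls, 7 of them red, sample of 5: P(at most 4 red)
Y = hypergeom(10,7,5)
print(Y.cdf(4))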
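If scipy is not available, the same CDFs can be sketched by summing the PMFs directly with the standard library. The helper names are hypothetical, and the roulette comment assumes the $36 includes the returned $1 bet (the cutoff of 2 wins is the same if the $36 is pure profit).

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summing the PMF directly."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def hypergeom_cdf(k, total, successes, draws):
    """P(X <= k) successes when drawing `draws` items without replacement."""
    return sum(
        comb(successes, j) * comb(total - successes, draws - j)
        for j in range(k + 1)
    ) / comb(total, draws)

# Zener cards: P(10 or more correct out of 25 when guessing).
p_zener = 1 - binom_cdf(9, 25, 1 / 5)

# Roulette: with w wins in 100 games you collect 36*w dollars for 100 dollars bet,
# so you lose money exactly when w <= 2.
p_lose = binom_cdf(2, 100, 1 / 38)

# Urn: 10 balls, 7 red, sample of 5, P(at most 4 red).
p_urn = hypergeom_cdf(4, 10, 7, 5)

print(p_zener, p_lose, p_urn)
```

The urn answer can be checked by hand: the only way to get more than 4 red is to draw 5 red, so $P(X \le 4) = 1 - \binom{7}{5}/\binom{10}{5} = 11/12$ .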

Then we talked about functions of random variables.

Suppose a particle starts at position 0 on a number line, and then randomly moves left or right by 1 unit (with equal probabilities) every second. Express the location of the particle on the number line as a function of a binomial random variable.

You can also define functions of more than one random variable. For example:

Roll two six-sided dice and let X be the result of the first roll and Y be the result of the second. Then f(X,Y) = X+Y is a function of two random variables. It is also a new R.V. in its own right. How would you find the PMF for this new R.V.?
For the random variables X and Y above, what is the difference between 2X and X+Y?

Friday, September 15

Today we finished chapter 3 in the textbook. We defined what it means for two or more random variables to be independent. We observed that if $X \sim \operatorname{Bin}(n,p)$ , then $X = X_1 + X_2 + \ldots + X_n$ where $X_k \sim \operatorname{Bin}(1,p)$ are i.i.d. (independent, identically distributed) Bernoulli random variables.

Use this idea to show that if $X \sim \operatorname{Bin}(n,p)$ and $Y \sim \operatorname{Bin}(m,p)$ are independent, then $X+Y \sim \operatorname{Bin}(m+n,p)$ .
A casino has 5 roulette wheels. One of the wheels is improperly balanced, so it lands on a 7 twice as often as it should ( $p = 1/19$ ). Bob likes to play roulette and he always bets on 7. Suppose that Bob plays 100 games of roulette at the casino. If he wins 5 games, find the probability that Bob was at the unbalanced wheel. What if he wins 10 times?
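The roulette wheel problem combines Bayes' formula with binomial likelihoods. This is a sketch under two assumptions that the problem leaves implicit: Bob picks one of the 5 wheels at random, and he plays all 100 games at that same wheel. The function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_unbalanced_given_wins(wins, games=100):
    """Posterior probability that Bob played the unbalanced wheel."""
    prior_bad = 1 / 5                           # assumption: wheel chosen at random
    like_bad = binom_pmf(wins, games, 1 / 19)   # likelihood at the unbalanced wheel
    like_fair = binom_pmf(wins, games, 1 / 38)  # likelihood at a fair wheel
    numerator = prior_bad * like_bad
    return numerator / (numerator + (1 - prior_bad) * like_fair)

p_after_5 = p_unbalanced_given_wins(5)
p_after_10 = p_unbalanced_given_wins(10)
print(p_after_5, p_after_10)
```

Five wins leaves the question fairly open, while ten wins points strongly toward the unbalanced wheel.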

Week 5 Notes

Day	Sections	Topic
Mon, Sep 18		Review
Wed, Sep 20		Midterm 1
Fri, Sep 22	4.1 - 4.2	Expectation

Monday, September 18

Today we looked at the following questions:

Explain why the identity below makes sense (i.e., give a story proof). $\binom{n}{2} = \binom{k}{2} + k(n-k) + \binom{n-k}{2}, \text{ for all } 1 \le k \le n-1.$
Alice and Bob are both asked to pick their favorite three movies from the same list of 10 choices. Assume their choices are completely independent and are essentially random with each movie equally likely to be picked. Let $M$ be the number of movies that they both pick.

Is $M$ a sample space, an event, a probability, a random variable, or a probability distribution?
What is $P(M = 3)$ ?

Bob needs to get surgery on his knee. The doctor warns him that there is a 10% chance that the surgery will not fix the problem. There is also a 4% chance he could get an infection, and there is a 3% chance that he will both get an infection and the surgery will fail to fix the problem.

What is the probability that the surgery succeeds without infection?
Are the events that Bob gets an infection and the surgery fails independent?
If Bob gets an infection, what is the conditional probability that the surgery will fail to fix his knee?

A large company plans to start using a drug test to screen potential employees. Suppose the test is 99% accurate at detecting drug users, but has a 2% false positive rate for non-users. If 5% of job applicants use drugs, then

What percent of applicants will test positive?
Find P( uses drugs | tests positive).

Friday, September 22

Today we defined expected value for discrete random variables. We calculated the expected value for the following examples:

A six-sided die.
A single game of Roulette betting $1 on 7.

Then we used the linearity property of expectation to find the expected value for binomial and hypergeometric random variables. Then we introduced the geometric distribution. We derived the PMF for the geometric distribution and wrote down the definition of the expected value for a geometric random variable.

Week 6 Notes

Day	Sections	Topic
Mon, Sep 25	4.3 - 4.4	Geometric and negative binomial
Wed, Sep 27	4.5 - 4.6	Variance
Fri, Sep 29	4.7 - 4.8	Poisson distribution

Monday, September 25

Today I handed out the Discrete Probability Distributions cheat sheet in class. You don’t need to memorize these formulas, but make sure you are familiar with them and can recognize them when they come up in problems. We also did the following:

We derived the formula for the expected value of a geometric random variable.
McDonald’s Happy Meals come with a toy. One month they give out Teenage Mutant Ninja Turtle toys. Each of the four turtles is equally likely. How many Happy Meals would you need to buy on average to get all four toys? Key idea: treat this as a sum of random variables $T_1$ , $T_2$ , $T_3$ , and $T_4$ which represent the meals needed to get the next turtle.
We found the expected value in The St. Petersburg Paradox.
We introduced the negative binomial distribution and derived the formula for its expected value.
We also gave a brief introduction to variance and standard deviation and we found the variance of rolling a six-sided die.
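The key idea in the Happy Meal problem can be written as a short sum. This sketch counts the meals up to and including the one that produces a new toy, so each waiting time has mean 1/p; the function name is illustrative.

```python
from fractions import Fraction

def expected_meals(num_toys):
    """Expected purchases to collect every toy, each equally likely per meal.

    With i toys already collected, a meal gives a new toy with probability
    (num_toys - i) / num_toys, so the expected wait is the reciprocal.
    """
    return sum(Fraction(num_toys, num_toys - i) for i in range(num_toys))

print(expected_meals(4), float(expected_meals(4)))
```

For four turtles this is $1 + \tfrac{4}{3} + 2 + 4 = \tfrac{25}{3}$ , a bit more than 8 meals on average.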

Wednesday, September 27

Today we talked more about variance. We talked about how to use the Law of the Unconscious Statistician (LOTUS) to find variance. We also proved this alternate formula for variance: $\operatorname{Var}(X) = E(X^2) - E(X)^2.$ We also looked at the properties of variance. If $X, Y$ are independent random variables and $c$ is any constant, then

$\operatorname{Var}(c) = 0$
$\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$
$\operatorname{Var}(cX) = c^2 \operatorname{Var}(X)$

We used these ideas to do the following exercises.

Flip a coin that lands on heads with probability $p$ and let $X$ be 1 if you get heads and 0 otherwise. Find $\operatorname{Var}(X)$ .
Find the variance and standard deviation of $Y \sim \operatorname{Binom}(100,0.5)$ .
Let $X$ and $Y$ be the results from rolling two different fair six-sided dice. Find $\operatorname{Var}(X+Y)$ and compare it with $\operatorname{Var}(X+X)$ . Which random variable has a larger variance, $X+Y$ or $X+X$ ? Why does the answer make sense?
The variance for a $\operatorname{Geom}(p)$ random variable is $\frac{1-p}{p^2}$ . Use that formula to find the variance of a $\operatorname{NegBinom}(n,p)$ random variable.
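The comparison between $X+Y$ and $X+X$ for two dice can be checked by brute force. This sketch uses the formula $\operatorname{Var}(X) = E(X^2) - E(X)^2$ over equally likely outcomes; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(values):
    """Var = E(X^2) - E(X)^2 for a list of equally likely values."""
    values = list(values)
    n = len(values)
    mean = Fraction(sum(values), n)
    mean_sq = Fraction(sum(v * v for v in values), n)
    return mean_sq - mean**2

var_X = variance(faces)                                               # one die
var_X_plus_Y = variance(a + b for a, b in product(faces, repeat=2))   # two dice
var_X_plus_X = variance(2 * a for a in faces)                         # one die doubled

print(var_X, var_X_plus_Y, var_X_plus_X)
```

The results are 35/12, 35/6, and 35/3: adding an independent die doubles the variance, while doubling the same die multiplies it by 4.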

Friday, September 29

Today we talked about the Poisson distribution. We started with the fact that Virginia gets about 0.3 earthquakes per year (at least earthquakes of magnitude greater than 4). This is a good example of a Poisson process which is any situation where the events you are looking for are rare, but they occur independently of each other and at a predictable rate. We modeled this example with a Binomial distribution where every day there is a very small chance of an earthquake.

How small would p have to be for $\lambda = np$ to be 0.3 for $n = 365$ days?
What if you used $n =(365)(24) = 8760$ hours instead?
Use the Poisson distribution PMF to compute the probability that VA gets an earthquake next year. How does it compare to the two Binomial approximations above?
What is the probability that VA gets an earthquake in the next 4 months?
Write down an infinite series for the expected value of a Pois(λ) random variable.
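The earthquake questions can be compared side by side in a few lines. This is a sketch using the rate of 0.3 per year from the example; the function name is illustrative.

```python
from math import exp

rate = 0.3  # earthquakes per year, from the example above

def p_at_least_one_binomial(n, lam):
    """P(at least one event) when each of n periods has chance lam / n."""
    p = lam / n
    return 1 - (1 - p) ** n

p_days = p_at_least_one_binomial(365, rate)
p_hours = p_at_least_one_binomial(365 * 24, rate)
p_poisson = 1 - exp(-rate)          # 1 - P(X = 0) for X ~ Pois(0.3)

# Four months is a third of a year, so the rate scales down to 0.1.
p_four_months = 1 - exp(-rate / 3)

print(p_days, p_hours, p_poisson, p_four_months)
```

The hourly binomial model is closer to the Poisson answer than the daily one, which illustrates the Poisson distribution as the limit of $\operatorname{Bin}(n, \lambda/n)$ as $n$ grows.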

Theorem. If $X \sim \operatorname{Pois}(\lambda_1)$ and $Y \sim \operatorname{Pois}(\lambda_2)$ are independent, then $X+Y \sim \operatorname{Pois}(\lambda_1 + \lambda_2).$

A store typically gets 10 female customers and 3 male customers per hour. Find the probability distribution for the total number of customers per hour. If the store is open from 10 AM to 6 PM, what is the probability that they get at least 100 customers in one day?
If the store gets 15 customers in one hour, what is the probability that 4 of them are men?

Week 7 Notes

Day	Sections	Topic
Mon, Oct 2	5.1 - 5.3	Continuous random variables
Wed, Oct 4	5.4 - 5.5	Normal and exponential distributions
Fri, Oct 6	5.6	Poisson processes

Monday, October 2

Today we introduced continuous random variables. I gave the following slightly different definition than the one in the book (they are equivalent, however):

Definition. A random variable $X$ is continuous if it has a piecewise continuous probability density function (PDF) $f(x)$ such that $P(a \le X \le b) = \int_a^b f(x)\, dx$ for all $a \le b$ in $[-\infty, \infty]$ . The cumulative distribution function (CDF) for $X$ is $P(X \le k) = \int_{-\infty}^k f(x) \, dx.$ Note that the PDF is always the derivative of the CDF.

We looked at these examples:

The uniform distribution $\operatorname{Unif}(a,b)$ .
The Cauchy distribution with PDF $\displaystyle f(x) = \frac{1}{\pi(x^2 + 1)}$ .
The distribution with PDF $e^{-x}$ on $[0, \infty)$ .

We also defined the expected value of a continuous random variable to be $E(X) = \int_{-\infty}^\infty x f(x) \, dx.$

Find the expected value of the random variable with PDF $e^{-x}$ on $[0,\infty)$ .
What is $E(U)$ when $U \sim \operatorname{Unif}(a,b)$ ?

Wednesday, October 4

Today we introduced the normal and the exponential distributions. We did the following problems.

Heights of men are approximately $\operatorname{Norm}(70 \text{ inches}, 3 \text{ inches})$ . Heights of women are approximately $\operatorname{Norm}(64.5 \text{ inches}, 2.5 \text{ inches})$ . If you randomly pick an adult, find the conditional probability that they are a woman if they are over 70 inches tall.

Theorem. If $X \sim \operatorname{Norm}(\mu_X, \sigma_X)$ and $Y \sim \operatorname{Norm}(\mu_Y, \sigma_Y)$ are independent random variables, then $X+Y \sim \operatorname{Norm}(\mu_X + \mu_Y, \sqrt{\sigma_X^2 + \sigma_Y^2})$ .

Let $M$ be the height of a random man and $W$ be the height of a random woman. Use the theorem above to find $P(M > W)$ . Hint: First find the probability distribution for $M - W$ .
If $X \sim \operatorname{Exp}(\lambda)$ , find the expected value and variance for $X$ . Explain why the expected value formula makes intuitive sense.
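The two height problems above can be sketched with `statistics.NormalDist` from the Python standard library. This assumes half of all adults are women (not stated in the problem) and that the two heights in the second problem are independent.

```python
from math import sqrt
from statistics import NormalDist

men = NormalDist(mu=70, sigma=3)
women = NormalDist(mu=64.5, sigma=2.5)

# P(woman | taller than 70 inches), assuming half of adults are women.
p_woman = 0.5
tall_w = p_woman * (1 - women.cdf(70))
tall_m = (1 - p_woman) * (1 - men.cdf(70))
p_woman_given_tall = tall_w / (tall_w + tall_m)

# M - W is normal with mean 70 - 64.5 and SD sqrt(3^2 + 2.5^2).
diff = NormalDist(mu=men.mean - women.mean,
                  sigma=sqrt(men.stdev**2 + women.stdev**2))
p_man_taller = 1 - diff.cdf(0)

print(p_woman_given_tall, p_man_taller)
```

Note that the standard deviations add in quadrature even though the random variables are subtracted, because $-W$ has the same variance as $W$ .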

We finished by briefly discussing the fact that exponential random variables are memoryless, that is $P(X > s + t | X > s) = P(X > t).$

Prove that $X \sim \operatorname{Exp}(\lambda)$ is memoryless.

Friday, October 6

Today we started with the Blissville vs. Blotchville example from Section 5.5 in the book.

Then we introduced the Gamma distribution. This is the last of the continuous probability distributions on the Continuous Distributions cheat sheet. We use the parameters $n$ and $\lambda$ , but a lot of software implementations of Gamma distributions use parameters $\alpha$ (which is the same as $n$ , except it can be a decimal) and $\beta$ for the scale which is the reciprocal of the rate $\lambda$ . See for example this Gamma distribution app.

We finished today by talking about inverse CDFs. They convert percentiles into the value (or location) of the random variable at that percentile.

If $X$ is a random variable with CDF $F(x) = 1 - e^{-x^2/2}$ , $x > 0$ , then what is the inverse CDF $F^{-1}(p)$ ?
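One way to check a candidate inverse CDF is a round trip: plug the inverse into the CDF and confirm that the percentile comes back. The sketch below does this for the formula obtained by solving $p = 1 - e^{-x^2/2}$ for $x$ ; the function names are illustrative.

```python
from math import exp, log, sqrt

def cdf(x):
    """F(x) = 1 - exp(-x^2 / 2) for x > 0."""
    return 1 - exp(-x * x / 2)

def inverse_cdf(p):
    """Solve p = 1 - exp(-x^2 / 2) for x, giving x = sqrt(-2 ln(1 - p))."""
    return sqrt(-2 * log(1 - p))

for p in (0.25, 0.5, 0.9):
    x = inverse_cdf(p)
    print(p, x, cdf(x))   # cdf(x) should reproduce p, up to rounding
```

The value at p = 0.5 is the median of the distribution.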

Week 8 Notes

Day	Sections	Topic
Mon, Oct 9	6.1 - 6.2	Measures of center & moments
Wed, Oct 11	6.3	Sampling moments
Fri, Oct 13	6.7	Probability generating functions

Monday, October 9

Today we talked about the mean and the median of a random variable. We started with the following example:

The time a person has to wait for a bus to arrive in Blotchville is exponentially distributed with mean 10 minutes (so λ=0.1/minute). What is the median wait time? (In the homework, you’ll invert the CDF for the exponential distribution explicitly. But in class we just used the app).
Why does it make sense that the median is less than the average wait time?

Then we introduced moments. We also defined central moments and standardized moments. The 3rd standardized moment is called the skewness and the 4th is called the kurtosis (our book uses a slightly different definition of kurtosis, but I’ll stick with the simpler definition).

We talked about how skewness measures how asymmetric a distribution is, which led to a discussion of symmetric distributions.

Wednesday, October 11

We started by talking about how the first moment (the expected value) corresponds to the center of mass and the second central moment (the variance) corresponds to the moment of inertia (it’s proportional to the kinetic energy of rotation if you spun the PMF/PDF around its center of mass).

After that we talked about sample moments $M_k = \frac{1}{n} \sum_{j = 1}^n X_j^k$ when $X_1, X_2, \ldots, X_n$ is any i.i.d. sample of $n$ random variables.

Definition. A random variable $Y$ is an unbiased estimator for a parameter $\theta$ if $E(Y) = \theta$ .

Show that the sample mean $\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}$ is an unbiased estimator for the population mean $\mu$ .
Show that the $k$ -th sample moment $M_k$ is an unbiased estimator for the $k$ -th moment for any probability distribution.

Then we talked about the problem of how to find a useful unbiased estimator for the variance $\sigma^2$ . The best unbiased estimator ends up being the sample variance $s^2 = \frac{\sum_{j = 1}^n (X_j - \bar{X})^2}{n-1}.$ Why do we divide by $n-1$ ? We started but didn’t finish the explanation in class. First we defined two random vectors which are vectors with random variable entries:

$V = \begin{bmatrix} X_1 - \bar{X} \\ X_2 - \bar{X} \\ \vdots \\ X_n - \bar{X} \end{bmatrix}, ~ W = \begin{bmatrix} X_1 - \mu \\ X_2 - \mu \\ \vdots \\ X_n - \mu \end{bmatrix}$

Recall that the length of a vector $u = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}$ is $\|u\| = \sqrt{u_1^2 + \ldots + u_n^2}$ .

Show that $E(\|W\|^2) = n \sigma^2$ where $\sigma^2$ is the variance of each of the RVs $X_j$ .
Show that $V$ is orthogonal to $V-W$ , i.e., show that the dot product $V \cdot (V-W) = 0$ .

We didn’t finish, but we ended with this last question:

Find the variance of $\bar{X}$ .

With all of those pieces, you can calculate the expected value $E(\|V\|^2) = E(\|W\|^2) - E(\|W-V\|^2)$ .

Friday, October 13

Today we started by reviewing moments and the Law of the Unconscious Statistician (LOTUS). I gave out this handout about random variables.

We also finished the calculation from last time to show that $\frac{\sum_{j = 1}^n (X_j - \bar{X})^2}{n-1}$ is an unbiased estimator for the variance.
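A simulation can illustrate (though not prove) why dividing by $n-1$ is the right choice. This sketch draws many small samples from $\operatorname{Unif}(0,1)$ , whose variance is 1/12, and averages both versions of the estimator. The sample size, trial count, and seed are arbitrary.

```python
import random
from statistics import mean, pvariance, variance

rng = random.Random(1)  # fixed seed for repeatability
n, trials = 5, 20_000

divide_by_n_minus_1 = []
divide_by_n = []
for _ in range(trials):
    sample = [rng.random() for _ in range(n)]      # Unif(0,1), true variance 1/12
    divide_by_n_minus_1.append(variance(sample))   # sample variance, divides by n - 1
    divide_by_n.append(pvariance(sample))          # same sum of squares, divided by n

avg_unbiased = mean(divide_by_n_minus_1)
avg_biased = mean(divide_by_n)
print(avg_unbiased, avg_biased, 1 / 12)
```

The average of the $n-1$ version should sit near 1/12, while the version that divides by $n$ comes out too small by a factor of about $(n-1)/n$ .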

Then we started a discussion of probability generating functions (PGFs) which are covered in Section 6.7 of the book. We looked at these examples:

The PGF for a six-sided die is: $f(t) = \tfrac{1}{6}t +\tfrac{1}{6} t^2 +\tfrac{1}{6} t^3 +\tfrac{1}{6} t^4 +\tfrac{1}{6} t^5 +\tfrac{1}{6} t^6.$

In general the PGF is a function with a variable $t$ such that coefficients on each term are the probabilities and the powers are the outcomes in a probability model for a discrete random variable.

Theorem. If $X$ and $Y$ are independent discrete RVs with PGFs $f_X(t)$ and $f_Y(t)$ , then $X+Y$ has PGF $f_X(t) \cdot f_Y(t).$

Find the PGF for rolling 6 six-sided dice, and use it to find probability that the total is 20.
Find the PGF for flipping a coin once. What about 10 times?
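Multiplying PGFs is just polynomial multiplication, which is easy to sketch in Python by storing each polynomial as a dictionary from powers to coefficients. The helper name `multiply` is illustrative.

```python
from fractions import Fraction

def multiply(f, g):
    """Multiply two PGFs stored as {power: coefficient} dictionaries."""
    product = {}
    for i, a in f.items():
        for j, b in g.items():
            product[i + j] = product.get(i + j, 0) + a * b
    return product

die = {face: Fraction(1, 6) for face in range(1, 7)}   # (t + t^2 + ... + t^6) / 6

# The PGF of the total of 6 independent dice is the 6th power of the die's PGF.
total = {0: Fraction(1)}
for _ in range(6):
    total = multiply(total, die)

print(total[20], float(total[20]))
```

The coefficient on $t^{20}$ is the probability that the six dice total 20, about 9%.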

Week 9 Notes

Day	Sections	Topic
Wed, Oct 18	6.7	Probability generating functions - cont’d
Fri, Oct 20	6.4 - 6.5	Moment generating functions

Wednesday, October 18

Today we discussed probability generating functions in more detail. We also talked about using generating functions for counting too. We did this in-class workshop:

Workshop: Generating functions workshop

For people who finished the workshop early, I suggested this extra problem:

How many different subsets of $\{1,\ldots, 10\}$ have elements that sum to 25?
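One possible approach to the extra problem is to expand the generating function $(1+t)(1+t^2)\cdots(1+t^{10})$ , where the coefficient on $t^s$ counts the subsets with sum $s$ . The sketch below does this and cross-checks the result by brute force; it is not taken from the workshop.

```python
from itertools import combinations

# Expand (1 + t)(1 + t^2)...(1 + t^10); coeffs[s] counts subsets with sum s.
coeffs = [1] + [0] * 55          # the largest possible sum is 1 + 2 + ... + 10 = 55
for k in range(1, 11):
    # Multiply by (1 + t^k): work from high powers down so k is used at most once.
    for s in range(55, k - 1, -1):
        coeffs[s] += coeffs[s - k]

# Brute-force check over all 2^10 subsets.
brute = sum(
    1
    for r in range(11)
    for subset in combinations(range(1, 11), r)
    if sum(subset) == 25
)

print(coeffs[25], brute)
```

Since the complement of a subset with sum 25 has sum 30, the coefficients on $t^{25}$ and $t^{30}$ must match, which gives another sanity check.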

Friday, October 20

Today we introduced moment generating functions (MGFs). We calculated several examples:

Find the MGFs for a Bernoulli(p) random variable and for a six-sided die.
Find the MGF for a Unif(a,b) random variable.
Find the MGF for $X \sim \operatorname{Exp}(1)$ .

Then we talked about these important theorems:

Theorem. (MGFs completely determine the distribution) If two R.V.s have the same MGF on an open interval around 0, then they have the same probability distributions.

Theorem. (MGFs for sums of independent R.V.s) If $X, Y$ are independent R.V.s with MGFs $m_X(t)$ and $m_Y(t)$ , then $X+Y$ has MGF $m_X(t) \cdot m_Y(t)$ .

We didn’t prove these theorems, but we did talk a little about why the second one is true.

If $X, Y \sim \operatorname{Exp}(1)$ are i.i.d., then $X+Y \sim \operatorname{Gamma}(2,1)$ . Find the MGF for $X+Y$ .

Finally we explained why they are called moment generating functions by proving:

Theorem. If $X$ is a R.V. with MGF $m_X(t)$ , then the k-th moment of $X$ is the k-th derivative of $m_X(t)$ at $t=0$ : $E(X^k) = m_X^{(k)}(0).$

We also pointed out that not every R.V. has a moment generating function, because the integral (or sum) that defines the MGF might not converge when $t \ne 0$ .

Week 10 Notes

Day	Sections	Topic
Mon, Oct 23	6.6	Sums of independent r.v.s.
Wed, Oct 25		Review
Fri, Oct 27		no class

Monday, October 23

Today we started with these exercises:

Let $X \sim \operatorname{Pois}(\lambda)$ , so the PMF for $X$ is: $e^{-\lambda} \lambda^k/ k!$ . Find the MGF $M_X(t)$ .
Let $Z \sim \operatorname{Norm}(0,1)$ . Find the MGF for $Z$ . Hint: You’ll have to complete the square: $tx-x^2/2 = -\frac{x^2 - 2tx}{2} = -\frac{(x - t)^2}{2} + \frac{t^2}{2}.$
If $X$ is any random variable with MGF $M_X(t)$ , what is the MGF for $Y = aX + b$ ? Hint: use the definition.
Use your answer to the previous problem to find the MGF for $\mu + \sigma Z$ .

After we did these exercises, we looked at this Table of Moment Generating Functions. Then we proved these two theorems:

Theorem 1. Let $X \sim \operatorname{Pois}(\lambda)$ and $Y \sim \operatorname{Pois}(\mu)$ be independent. Then $X+Y \sim \operatorname{Pois}(\lambda + \mu)$ .

Theorem 2. Let $X \sim \operatorname{Norm}(\mu_X, \sigma_X)$ and $Y \sim \operatorname{Norm}(\mu_Y, \sigma_Y)$ be independent. Then $X+Y \sim \operatorname{Norm}(\mu_X + \mu_Y, \sqrt{\sigma_X^2 + \sigma_Y^2}).$

Wednesday, October 25

Today we looked at some examples similar to what might be on the midterm on Monday.

Let $X$ be a RV with PDF $f(x) = x e^{-x}$ for $x > 0$ .
1. Find the CDF for $X$ .
2. Find $P(X > 2)$ .
If $X \sim \operatorname{Unif}(-1,1)$ , then $X$ has MGF $m_X(t) = \dfrac{e^t - e^{-t}}{2t}$ .
1. Find a Maclaurin series for $m_X(t)$ .
2. Use the Maclaurin series to find the first four moments of $X$ .
Let $X \sim \operatorname{Bernoulli}(p)$ .
1. Find the MGF for $X$ .
2. Find a Maclaurin series for the MGF.
3. Use the Maclaurin series to find $E(X)$ and $E(X^2)$ .
4. Find the variance of $X$ .
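For the $\operatorname{Unif}(-1,1)$ problem, reading moments off a Maclaurin series can be sketched as follows. The function name is illustrative, and the series used is the one obtained by expanding $e^t$ and $e^{-t}$ term by term.

```python
from fractions import Fraction
from math import factorial

def mgf_coefficient(n):
    """Coefficient of t^n in the Maclaurin series of (e^t - e^(-t)) / (2t).

    e^t - e^(-t) = 2 * (t + t^3/3! + t^5/5! + ...), so dividing by 2t leaves
    1 + t^2/3! + t^4/5! + ...: only even powers survive.
    """
    if n % 2 == 1:
        return Fraction(0)
    return Fraction(1, factorial(n + 1))

# The n-th moment is n! times the coefficient on t^n.
moments = {n: factorial(n) * mgf_coefficient(n) for n in range(1, 5)}
print(moments)
```

The odd moments vanish because the distribution is symmetric about 0, and the even moments agree with computing $\int_{-1}^1 \tfrac{1}{2} x^n \, dx$ directly.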

Week 11 Notes

Day	Sections	Topic
Mon, Oct 30		Midterm 2
Wed, Nov 1	7.1	Joint distributions
Fri, Nov 3	7.1	Marginal & conditional distributions

Wednesday, November 1

Today we introduced joint distributions. First we defined joint PMFs for discrete random variables:

Definition. A function $f(x,y)$ is a joint probability mass function for two discrete r.v.s $X$ and $Y$ if $P(X=x \text{ and } Y = y) = f(x,y)$ for all pairs $(x,y)$ in the support of $X$ and $Y$ .

We briefly looked at example 7.1.5 from the book before moving on to joint PDFs for continuous random variables:

Definition. A function $f(x,y)$ is a joint probability density function for two continuous r.v.s $X$ and $Y$ if $P(a \le X \le b \text{ and } c \le Y \le d) = \int_c^d \int_a^b f(x,y) \, dx dy.$

We did the example of a uniform distribution on a circle, then we did this workshop:

Workshop: Double Integrals.

If you need to review double integrals, I recommend trying these videos & examples on Khan Academy.

Friday, November 3

Today we talked about marginal and conditional distributions for jointly distributed random variables. If $X, Y$ are jointly distributed, then

The marginal PDF/PMFs $f_X(x)$ and $f_Y(y)$ tell you how $X$ and $Y$ are distributed when you don’t care or know about the other random variable.
$f_X(x) = \sum_y f(x,y) \text{ or } \int_{-\infty}^\infty f(x,y) \, dy.$
The conditional PDF/PMFs $f_{X\,|\,Y}(x\,|\,Y=y)$ and $f_{Y\,|\,X}(y\,|\,X=x)$ tell you how $X$ and $Y$ are distributed when you know the value of the other.
$f_{X \, | \, Y}(x \, | \, Y = y) = \frac{f(x,y)}{f_Y(y)}.$

The book has a good picture to understand how the shape of the conditional PDF comes from the shape of the joint PDF after you renormalize: see Figure 7.5 on page 325.

We did these examples.

Let $X$ and $Y$ be the coordinates of a point chosen randomly and uniformly from inside the unit circle. Find
1. $P(X^2 + Y^2 < \tfrac{1}{9})$
2. The conditional PDF $f_{Y\,|\,X}(y\,|\,X=\tfrac{1}{2})$
3. The marginal PDF $f_X(x)$ .
Roll two 4-sided dice. Let $X$ be the total and let $Y$ be the maximum of the two dice.
1. Find the marginal PMFs $f_X(x)$ and $f_Y(y)$ .
2. Find the conditional PMF $f_{X\,|\,Y}(x \,|\, Y = 4)$ .
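For the two 4-sided dice, the joint, marginal, and conditional PMFs can all be tabulated by enumerating the 16 equally likely rolls. This is a sketch; the dictionary names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint PMF of (X, Y) = (total, maximum) over the 16 equally likely rolls.
joint = defaultdict(Fraction)
for a, b in product(range(1, 5), repeat=2):
    joint[(a + b, max(a, b))] += Fraction(1, 16)

# Marginals: sum the joint PMF over the other variable.
f_X, f_Y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    f_X[x] += p
    f_Y[y] += p

# Conditional PMF of X given Y = 4: renormalize the Y = 4 slice of the joint PMF.
f_X_given_Y4 = {x: p / f_Y[4] for (x, y), p in joint.items() if y == 4}

print(dict(f_Y))
print(f_X_given_Y4)
```

Dividing by `f_Y[4]` is the renormalization step: the Y = 4 slice of the joint PMF only sums to 7/16, so it has to be rescaled to sum to 1.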

Week 12 Notes

Day	Sections	Topic
Mon, Nov 6	7.2	2D LOTUS
Wed, Nov 8	7.3	Covariance & correlation
Fri, Nov 10	7.5	Multivariate normal distribution

Monday, November 6

Today we introduced the 2-dimensional version of the Law of the Unconscious Statistician (2D LOTUS). For random variables $X, Y$ with joint PDF or PMF $f(x,y)$ , $E(g(X,Y)) = \int_{-\infty}^\infty \int_{-\infty}^\infty g(x,y) f(x,y) \, dx \, dy \text{ or } \sum_x \sum_y g(x,y) f(x,y).$

Suppose $X, Y \sim \operatorname{Unif}(0,1)$ are i.i.d. RVs. Find $E(|X-Y|)$ .
Let $X, Y$ be any independent random variables with joint distribution $f(x,y)$ . Then prove that $E(XY) = E(X) E(Y).$
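As a numerical sanity check on the first example, the 2D LOTUS integral can be approximated with a midpoint rule on a grid. This is only a sketch: the grid size `n` is an arbitrary choice, and the joint PDF is taken to be $1$ on the unit square because $X, Y$ are i.i.d. $\operatorname{Unif}(0,1)$.

```python
# Midpoint-rule approximation of the 2D LOTUS integral
# E|X - Y| = double integral of |x - y| * f(x, y) over the unit square,
# where f(x, y) = 1 because X, Y are i.i.d. Unif(0, 1).
n = 400                      # grid resolution (an arbitrary choice)
h = 1.0 / n                  # width of each grid cell
midpoints = [(i + 0.5) * h for i in range(n)]

expected_abs_diff = sum(abs(x - y) for x in midpoints for y in midpoints) * h * h
print(expected_abs_diff)     # should be close to the exact answer 1/3
```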

After these examples, we defined the covariance of two random variables: $\operatorname{Cov}(X,Y) = E((X-\mu_X)(Y-\mu_Y)).$

Show that $\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y)$ .
Show that $\operatorname{Cov}(X,Y) = 0$ if $X, Y$ are independent.

We also discussed the following additional properties of covariance:

Symmetry $\operatorname{Cov}(X,Y) = \operatorname{Cov}(Y,X)$
Linearity $\operatorname{Cov}(aX+bY,Z) = a \operatorname{Cov}(X,Z) + b \operatorname{Cov}(Y,Z)$
Relation with Variance $\operatorname{Var}(X) = \operatorname{Cov}(X,X)$
Constants $\operatorname{Cov}(X,c) = 0$ .

Finally, we defined the correlation between two random variables: $\rho(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\sigma_X \sigma_Y}.$
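To see the definitions in action, here is a small sketch that computes the covariance both ways, and then the correlation, for the total $X$ and maximum $Y$ of two fair 4-sided dice (the example from the previous week). The helper `expect` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Equally likely outcomes for two fair 4-sided dice (an illustrative choice).
outcomes = list(product(range(1, 5), repeat=2))
p = Fraction(1, len(outcomes))

def expect(g):
    """Expected value of g(d1, d2) via 2D LOTUS over the 16 outcomes."""
    return sum(g(d1, d2) * p for d1, d2 in outcomes)

total = lambda d1, d2: d1 + d2          # X = total
largest = lambda d1, d2: max(d1, d2)    # Y = maximum

mu_X, mu_Y = expect(total), expect(largest)

# Definition: Cov(X, Y) = E((X - mu_X)(Y - mu_Y)).
cov_def = expect(lambda a, b: (total(a, b) - mu_X) * (largest(a, b) - mu_Y))
# Shortcut: Cov(X, Y) = E(XY) - E(X)E(Y).
cov_short = expect(lambda a, b: total(a, b) * largest(a, b)) - mu_X * mu_Y

var_X = expect(lambda a, b: (total(a, b) - mu_X) ** 2)
var_Y = expect(lambda a, b: (largest(a, b) - mu_Y) ** 2)
rho = float(cov_def) / sqrt(var_X * var_Y)

print(cov_def, cov_short, rho)
```

The two covariance computations agree exactly, which is what the identity $\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y)$ predicts.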

Wednesday, November 8

Today we introduced multivariate normal distributions. Suppose that $X_1, \ldots, X_n \overset{i.i.d.}{\sim} \operatorname{Norm}(0,1)$ . We can arrange the values of $X_1, \ldots X_n$ into a vector $X = [X_1, \ldots, X_n]^T$ in $\mathbb{R}^n$ which we call a random vector. Then for any $m$ -by- $n$ matrix $A \in \mathbb{R}^{m \times n}$ and any vector $b \in \mathbb{R}^m$ , the random vector $Y = AX + b$ has a multivariate normal distribution.

On the midterm we had an example where rainfall during the wet season was $W \sim \operatorname{Norm}(50,12)$ while the rainfall in the dry season was $D \sim \operatorname{Norm}(10,5)$ . We could express this information using random vectors as: $\begin{bmatrix} W \\ D \end{bmatrix} = \begin{bmatrix} 12 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} + \begin{bmatrix} 50 \\ 10 \end{bmatrix}$ where $X_1, X_2 \overset{i.i.d.}{\sim} \operatorname{Norm}(0,1)$ , that is $X = [X_1, X_2]^T$ is a standard normal random vector. Suppose we wanted to record just the total rainfall $T = W+D$ and the wet season rainfall $W$ . Then we could apply a linear transformation like so: $\begin{bmatrix} T \\ W \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} W \\ D \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \left( \begin{bmatrix} 12 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} + \begin{bmatrix} 50 \\ 10 \end{bmatrix} \right)$ so $\begin{bmatrix} T \\ W \end{bmatrix} = \begin{bmatrix} 12 & 5 \\ 12 & 0 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} + \begin{bmatrix} 60 \\ 50 \end{bmatrix}.$

For any multivariate normal random vector $Y = AX + b$ where $X$ is a standard normal random vector, the covariance matrix for $Y$ is $\Sigma = AA^T.$ The entries of the covariance matrix are $\Sigma = \begin{bmatrix} \operatorname{Cov}(Y_1,Y_1) & \operatorname{Cov}(Y_1, Y_2) & \ldots & \operatorname{Cov}(Y_1, Y_m) \\ \operatorname{Cov}(Y_2,Y_1) & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ \operatorname{Cov}(Y_m,Y_1) & \ldots & \ldots & \operatorname{Cov}(Y_m, Y_m) \end{bmatrix}.$

The heights of fathers and their adult sons have a moderately strong correlation $\rho = 0.5$ . Both father’s and son’s heights (measured in standard deviations from the mean) have a standard normal distribution $\operatorname{Norm}(0,1)$ , but they are not independent. Find the covariance matrix for this situation.
You can think of the height of the son $Y$ as a linear combination of his father’s height $X$ and an additional independent random component $Z$ that has a standard normal distribution. Find coefficients for $aX + bZ$ such that the resulting normal distribution has standard deviation $1$ and $\operatorname{Cov}(aX+bZ,X) = 0.5$ .
Compute $AA^T$ where $A = \begin{bmatrix} 1 & 0 \\ a & b \end{bmatrix}.$ Do you get the correct covariance matrix for the heights of fathers and sons?
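The matrix arithmetic in these examples can be checked with a few lines of plain Python. This is a sketch under the assumption that $a = 0.5$ and $b = \sqrt{0.75}$ solve the previous exercise (since $\operatorname{Cov}(aX+bZ, X) = a$ and $\operatorname{Var}(aX+bZ) = a^2 + b^2$); the helpers `matmul` and `transpose` are hypothetical names written for this illustration.

```python
from math import sqrt

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

# Father/son heights in standard units: son = a*X + b*Z with X, Z i.i.d. Norm(0,1).
# Cov(aX + bZ, X) = a, so a = 0.5; Var(aX + bZ) = a^2 + b^2 = 1 gives b = sqrt(0.75).
a = 0.5
b = sqrt(1 - a ** 2)
A = [[1.0, 0.0],
     [a, b]]

Sigma = matmul(A, transpose(A))   # covariance matrix Sigma = A A^T
print(Sigma)                      # expect roughly [[1, 0.5], [0.5, 1]]

# Rainfall example: (T, W) = B (D X + m), so the combined matrix is B D.
B = [[1, 1], [1, 0]]
D = [[12, 0], [0, 5]]
print(matmul(B, D))               # [[12, 5], [12, 0]]
```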

Friday, November 10

Today we started with some examples that applied the following theorem about multivariate normal distributions.

Theorem. If $X = \begin{bmatrix} X_1 \\ \vdots \\ X_n\end{bmatrix}$ is a random vector with a multivariate normal distribution, and if $A \in \mathbb{R}^{m \times n}$ is a matrix, then $Y = AX$ has a multivariate normal distribution. In the special case where $A$ has only one row (i.e., $m=1$ ), $Y$ has a normal distribution.

We used this theorem to help solve the following problems.

Last time we looked at the height of a random father $F$ and his adult son $S$ , both of which have a $\operatorname{Norm}(70,3)$ distribution, but with correlation $\rho = 0.5$ . Together the vector $\begin{bmatrix} F \\ S \end{bmatrix}$ is MVN. Consider how much taller a son might be than the father. What is the distribution of $S-F$ ?
1. What is $E(S-F)$ ?
2. What is $\operatorname{Var}(S-F)$ ?
3. Find $P(S-F \ge 1)$ .
Suppose one region gets $W \sim \operatorname{Norm}(50,12)$ inches of rain during the wet season and $D \sim \operatorname{Norm}(10,5)$ inches of rain in the dry season. Suppose that the total rain $W+D$ has a normal distribution and $D, W$ have a correlation $\rho = 0.225$ .
1. Find $E(W+D)$ .
2. Find $\operatorname{Var}(W+D)$ .
3. Find $P(W+D \ge 80)$ .
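Both problems reduce to finding the mean and standard deviation of a linear combination and then evaluating a normal tail probability. Here is a sketch using `statistics.NormalDist` from the Python standard library; the helper `sd_of_sum` is a hypothetical name, and the second parameter of $\operatorname{Norm}$ is read as a standard deviation, as elsewhere in these notes.

```python
from math import sqrt
from statistics import NormalDist

def sd_of_sum(sd1, sd2, rho, sign=1):
    """SD of X1 + sign*X2 from Var = sd1^2 + sd2^2 + 2*sign*rho*sd1*sd2."""
    return sqrt(sd1 ** 2 + sd2 ** 2 + 2 * sign * rho * sd1 * sd2)

# Son minus father: both Norm(70, 3) with correlation 0.5.
diff = NormalDist(mu=70 - 70, sigma=sd_of_sum(3, 3, 0.5, sign=-1))
p_son_taller_by_1 = 1 - diff.cdf(1)

# Total rainfall: W ~ Norm(50, 12), D ~ Norm(10, 5), correlation 0.225.
total = NormalDist(mu=50 + 10, sigma=sd_of_sum(12, 5, 0.225))
p_total_at_least_80 = 1 - total.cdf(80)

print(diff.stdev, p_son_taller_by_1)
print(total.stdev, p_total_at_least_80)
```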

After that, we introduced the log-normal distribution. This is the distribution of $Y = e^X$ if $X \sim \operatorname{Norm}(0,1)$ .

Find $P(Y \le y)$ using $\Phi$ to represent the standard normal CDF (which doesn’t have a nice formula).
Differentiate the CDF for $Y$ to find the PDF for $Y$ .
Find a formula for the moments $E(Y^n)$ . Hint: $E(Y^n) = E(e^{nX})$ which looks a lot like the MGF $m_X(t) = E(e^{tX})$ . So you can use the MGF for a standard normal $X$ to find the moments of $Y$ .
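The three steps above can be checked numerically. The sketch below assumes the answers $P(Y \le y) = \Phi(\ln y)$ , a PDF of $\varphi(\ln y)/y$ , and $E(Y^n) = e^{n^2/2}$ (from the standard normal MGF $e^{t^2/2}$ ), and compares the moment formula against a midpoint-rule integral. The integration range and step count are arbitrary choices.

```python
from math import exp, log
from statistics import NormalDist

Z = NormalDist(0, 1)

def lognormal_cdf(y):
    """P(Y <= y) = P(X <= ln y) = Phi(ln y) for y > 0."""
    return Z.cdf(log(y))

def lognormal_pdf(y):
    """Derivative of Phi(ln y): the chain rule gives phi(ln y) / y."""
    return Z.pdf(log(y)) / y

def moment(n, steps=20000, lo=-12.0, hi=12.0):
    """Midpoint-rule estimate of E(Y^n) = E(e^{nX}), integrating against phi(x)."""
    h = (hi - lo) / steps
    return sum(exp(n * x) * Z.pdf(x) * h
               for x in (lo + (i + 0.5) * h for i in range(steps)))

for n in (1, 2):
    print(n, moment(n), exp(n ** 2 / 2))   # numeric integral vs. MGF value e^{n^2/2}
```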

Week 13 Notes

Day	Sections	Topic
Mon, Nov 13	8.1	Change of variables
Wed, Nov 15	8.2	Convolutions
Fri, Nov 17		Review
Mon, Nov 20		Midterm 3

Monday, November 13

Today we did more examples of change of variables for random variables. We focused on one dimensional examples.

Let $U \sim \operatorname{Unif}(0,1)$ . Find the PDF for $U^2$ .
Here is how you can generate an exponentially distributed r.v. $X$ . Start by randomly generating $U \sim \operatorname{Unif}(0,1)$ . Then apply the function $f(x) = -\ln x$ to $U$ . Prove that $X = -\ln U$ has the $\operatorname{Exp}(1)$ distribution.
Let $X$ be any r.v. For any monotone (i.e., either always increasing or always decreasing) differentiable function $g$ defined on the support of $X$ , let $Y = g(X)$ . Prove that the PDF of $Y$ is $f_Y(y) = f_X(x) \left| \dfrac{dx}{dy} \right|$ .
Find the PDF for $X^3$ where $X \sim \operatorname{Norm}(0,1)$ .
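The claim that $-\ln U$ is $\operatorname{Exp}(1)$ is easy to test by simulation. This is a minimal sketch; the seed and sample size are arbitrary choices made only so the run is reproducible.

```python
import random
from math import log

random.seed(421)             # arbitrary seed so the run is reproducible
n = 100_000

# random.random() lies in [0, 1), so 1 - U lies in (0, 1] and the log is safe;
# 1 - U is also Unif(0, 1), so -ln(1 - U) has the same distribution as -ln(U).
samples = [-log(1.0 - random.random()) for _ in range(n)]

sample_mean = sum(samples) / n
frac_above_1 = sum(s > 1 for s in samples) / n

print(sample_mean)           # Exp(1) has mean 1
print(frac_above_1)          # Exp(1) has P(X > 1) = e^{-1}, about 0.368
```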

We also defined the $\chi^2$ -distribution, which is the distribution of a sum of i.i.d. random variables $X_1^2 + \ldots + X_n^2$ where each $X_i \sim \operatorname{Norm}(0,1)$ . When $n = 1$ , we were able to find the PDF for the $\chi^2(1)$ distribution even though the function $g(x) = x^2$ is not monotone increasing on the whole real line.

Wednesday, November 15

Today we defined the convolution of two functions $(f_X \ast f_Y)(t) = \int_{-\infty}^\infty f_Y(t-x) f_X(x) \, dx = \int_{-\infty}^\infty f_X(t-y) f_Y(y) \, dy.$ If $X, Y$ are independent continuous random variables with PDFs $f_X$ and $f_Y$ , then $X+Y$ has PDF $f_X \ast f_Y$ . You can also define the discrete convolution, which gives the PMF for a sum of two independent discrete random variables. $(f_X \ast f_Y)(k) = \sum_{x} f_Y(k-x) f_X(x) = \sum_y f_X(k-y) f_Y(y)$

Find the PMF for the sum of two 6-sided dice.
If $X, Y \overset{i.i.d.}{\sim} \operatorname{Unif}(0,1)$ , find the PDF for $X+Y$ . (https://youtu.be/Blg5RIjGwBE)
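The discrete convolution formula translates almost directly into code. Here is a sketch for the two-dice problem, assuming fair dice; `convolve` is a hypothetical helper that stores each PMF as a dictionary from values to probabilities.

```python
from fractions import Fraction

def convolve(f, g):
    """Discrete convolution: PMF of X + Y for independent X ~ f and Y ~ g."""
    h = {}
    for x, px in f.items():
        for y, py in g.items():
            h[x + y] = h.get(x + y, 0) + px * py
    return h

die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)

for total in sorted(two_dice):
    print(total, two_dice[total])
```

Calling `convolve` repeatedly gives the PMF for the total of any number of dice.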

To help with the notation in this last problem, we introduced indicator functions $\mathbf{1}_A$ . (See https://en.wikipedia.org/wiki/Indicator_function)

Find the PDF for the sum of $X, Y \overset{i.i.d.}{\sim} \operatorname{Exp}(\lambda)$ . (https://youtu.be/Glff9dvPVEg)
Last time, we found the distribution for $X^2$ if $X \sim \operatorname{Norm}(0,1)$ . The PDF for $X^2$ is $f_{X^2}(x) = \frac{1}{\sqrt{2 \pi x}} e^{-x/2}$ for $x > 0$ . This is the $\chi^2(1)$ PDF. Now suppose that we have $X, Y \overset{i.i.d.}{\sim} \operatorname{Norm}(0,1)$ random variables. Set up, but don’t evaluate, a convolution integral for the PDF of $X^2 + Y^2$ .

Friday, November 17

Today we did a review of material that will be on midterm 3. This includes:

Joint distributions (including marginal & conditional distributions, and 2D-LOTUS).
Covariance
Multivariate normal distributions (mostly the observation that linear transformations of MVNs are still normal).
Transformations
Convolution

We did these examples in class:

Suppose $X, Y$ are discrete r.v.s. each taking values in $\{0,1\}$ with joint PMF $f(x,y)$ given by $f(0,0) = \tfrac{1}{2}, ~f(0,1) = \tfrac{1}{3}, ~f(1,0) = \tfrac{1}{6}, ~f(1,1) = 0.$
1. Find the PMFs for $X$ , $Y$ , and $X+Y$
2. Find the conditional PMF for $X$ if $Y=0$ .
3. Find $E(XY)$
4. Find $\operatorname{Cov}(X,Y)$
Suppose that $(X,Y)$ are continuous r.v.s. that are jointly uniformly distributed in the unit disk $D = \{(x,y) \in \mathbb{R}^2 \, : \, x^2+y^2 \le 1\}.$
1. What is the joint PDF for $X$ and $Y$ ?
2. Write down, but don’t evaluate, an integral to find the expected distance from $(X,Y)$ to the point $(0,1)$ at the top of the unit disk.
Suppose $X,Y$ are both $\operatorname{Norm}(0,1)$ r.v.s., and the correlation between $X$ and $Y$ is $\rho = 0.5$ . Find $P(X+2Y \ge 3)$ .
Let $X \sim \operatorname{Unif}(0,2)$ and $Y \sim \operatorname{Unif}(0,1)$ be independent r.v.s. Find the PDF for $X+Y$ .

Week 14 Notes

Day	Sections	Topic
Mon, Nov 27	10.1 - 10.2	Inequalities & Law of large numbers
Wed, Nov 29	10.3	Central limit theorem
Fri, Dec 1	10.3	Applications of LLN & CLT
Mon, Dec 4		Review & recap

Monday, November 27

Today we introduced two important inequalities: the Markov Inequality and Chebyshev’s Inequality.

Theorem (Markov’s Inequality). For any r.v. $X$ and constant $a > 0$ , $P(|X| \ge a) \le \frac{E(|X|)}{a}.$

We gave a visual proof by looking at the graph of the PDF for $X$ (assuming $X$ is continuous, but the proof is essentially the same for discrete r.v.s.) and comparing the integral for $P(|X| \ge a)$ with the integral for $E(\tfrac{1}{a} |X|)$ . Which integral is bigger?

A corollary of Markov’s inequality is this more useful inequality.

Theorem (Chebyshev’s Inequality). Let $X$ be any r.v. with mean $\mu$ and variance $\sigma^2$ . Let $a > 0$ . Then $P(|X - \mu | \ge a) \le \frac{\sigma^2}{a^2}.$

Prove Chebyshev’s inequality by applying Markov’s inequality to the random variable $(X-\mu)^2$ .
Here on Earth, heights of adults are roughly normally distributed. But if you go to another planet, they might have a totally different probability distribution. Explain how we can be certain that at most 25% of Martians have a height that is at least 2 standard deviations above average.
If $X_1, \ldots, X_n$ are i.i.d. r.v.s. with mean $\mu$ and standard deviation $\sigma$ , then what are the mean and standard deviation of $\bar{X} = \frac{X_1 + \ldots + X_n}{n}?$
What does Chebyshev’s inequality say about the probability $P(|\bar{X} - \mu| \ge a)$ ? What happens as $n$ gets bigger?
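To get a feel for how conservative Chebyshev’s inequality is, the sketch below compares the bound $1/k^2$ (taking $a = k\sigma$ ) with the exact two-sided tail of a normal distribution, and then evaluates the bound for $\bar{X}$ as $n$ grows. The values of $k$ , $\sigma$ , and $a$ are arbitrary illustrations, and `chebyshev_bound_for_mean` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)

for k in (1.5, 2, 3):
    chebyshev_bound = 1 / k ** 2          # P(|X - mu| >= k*sigma) <= 1/k^2
    normal_tail = 2 * (1 - Z.cdf(k))      # exact two-sided tail if X is normal
    print(k, chebyshev_bound, normal_tail)

def chebyshev_bound_for_mean(sigma, n, a):
    """Chebyshev bound on P(|Xbar - mu| >= a), using Var(Xbar) = sigma^2 / n."""
    return sigma ** 2 / (n * a ** 2)

print([chebyshev_bound_for_mean(1.0, n, 0.1) for n in (100, 1000, 10000)])
```

The bound for $\bar{X}$ shrinks like $1/n$ , which is the idea behind the (weak) law of large numbers.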

Wednesday, November 29

Today we talked about the central limit theorem.

Central Limit Theorem. Let $X_1, X_2, \ldots, X_n$ be i.i.d. r.v.s. with mean $\mu$ and variance $\sigma^2$ . Then the CDF of the random variable $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ converges to the CDF of a $\operatorname{Norm}(0,1)$ random variable as $n \rightarrow \infty$ .

Central Limit Theorem for Dice

We proved this theorem in class under the extra assumption that the MGF for $X_i$ exists. We also assumed for simplicity that $\mu = 0$ and $\sigma=1$ . Neither assumption is necessary (and it is easy to get rid of the assumption that $\mu = 0$ and $\sigma=1$ ).

Let $M(t)$ denote the MGF for each of the $X_i$ . How do we know that they all have the same MGF?
Show that the MGF for $Z = \sqrt{n}\,\bar{X} = \frac{X_1 + \ldots + X_n}{\sqrt{n}}$ is $M(t/\sqrt{n})^n$ .
What are $M(0)$ , $M'(0)$ , and $M''(0)$ ?
Find $\lim_{n \rightarrow \infty} M(t/\sqrt{n})^n$ .
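One way to organize the last two steps is sketched below, under the same simplifying assumptions ( $\mu = 0$ , $\sigma = 1$ , and the MGF exists near $0$ ). Since $M(0) = 1$ , $M'(0) = E(X_i) = 0$ , and $M''(0) = E(X_i^2) = 1$ , a second-order Taylor expansion gives:

```latex
\begin{align*}
M(s) &= 1 + \tfrac{1}{2}s^2 + o(s^2) \quad \text{as } s \to 0, \\
M\!\left(\tfrac{t}{\sqrt{n}}\right)^n
  &= \left(1 + \frac{t^2}{2n} + o\!\left(\tfrac{1}{n}\right)\right)^n
  \longrightarrow e^{t^2/2} \quad \text{as } n \to \infty.
\end{align*}
```

The limit $e^{t^2/2}$ is the MGF of a $\operatorname{Norm}(0,1)$ random variable.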

We finished with this exercise:

If you roll 10 six-sided dice, estimate the probability that the total is at least 50.

The key to the last problem is to use these normal approximation facts:

Corollary (Normal Approximations). If $X_1, \ldots, X_n$ are i.i.d. RVs with mean $\mu$ and variance $\sigma^2$ and $n$ is large, then

The total $X_1 + X_2 + \ldots + X_n$ is approximately $\operatorname{Norm}(n \mu, \sqrt{n} \sigma)$ .
The sample mean $\bar{X}$ is approximately $\operatorname{Norm}(\mu, \tfrac{\sigma}{\sqrt{n}})$ .
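For the dice exercise, the normal approximation can be compared with the exact answer. This is a sketch assuming fair dice; the continuity correction (using 49.5 instead of 50) is an added refinement and not part of the corollary as stated.

```python
from math import sqrt
from statistics import NormalDist

n = 10
mu = 3.5                         # mean of one fair six-sided die
sigma = sqrt(35 / 12)            # SD of one die

# Normal approximation: total is roughly Norm(n*mu, sqrt(n)*sigma).
approx = NormalDist(n * mu, sqrt(n) * sigma)
p_normal = 1 - approx.cdf(50)
p_normal_cc = 1 - approx.cdf(49.5)   # with a continuity correction (an added refinement)

# Exact answer for comparison, by convolving the die PMF with itself n times.
pmf = {0: 1.0}
for _ in range(n):
    new = {}
    for s, p in pmf.items():
        for face in range(1, 7):
            new[s + face] = new.get(s + face, 0.0) + p / 6
    pmf = new
p_exact = sum(p for s, p in pmf.items() if s >= 50)

print(p_normal, p_normal_cc, p_exact)
```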

Friday, December 1

Today we talked about applications of the Central Limit Theorem and the Law of Large Numbers. We started with this corollary of the Central Limit Theorem that we didn’t write down explicitly in class last time:

Corollary (Normal Approximations). If $X_1, \ldots, X_n$ are i.i.d. RVs with mean $\mu$ and variance $\sigma^2$ and $n$ is large, then

The total $X_1 + X_2 + \ldots + X_n$ is approximately $\operatorname{Norm}(n \mu, \sqrt{n} \sigma)$ .
The sample mean $\bar{X}$ is approximately $\operatorname{Norm}(\mu, \tfrac{\sigma}{\sqrt{n}})$ .

Adults in the USA have a mean weight of 170 lbs. with a standard deviation of 40 lbs. If a random sample of 100 adult passengers boards an airplane, what is the probability that their total weight exceeds 18,000 lbs?

We also used the corollary to derive the formula for a 95% confidence interval:

95% Confidence Intervals. In a large sample, $\bar{X}$ is within 2 of its own standard deviations (that is, within $2\sigma/\sqrt{n}$ ) of $\mu$ about 95% of the time. So if you know (or have a good estimate for) $\sigma$ , then you can use $\bar{X}$ to estimate $\mu$ :
$\bar{x} \pm 2 \frac{\sigma}{\sqrt{n}}.$

We finished by introducing Monte Carlo Integration.

Find $\int_0^1 \sqrt{1-x^2} \, dx$ using Monte Carlo integration. Write a computer program to randomly generate points uniformly in the square $[0,1] \times [0,1]$ , then record 1 if the point is under the curve, or 0 if it is not.
When you randomly generate points in a rectangle and calculate the proportion that hit the region you want, what is the approximate probability distribution for the proportion that hit? What are its mean and standard deviation?
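Here is one possible version of the program described above, as a sketch rather than the version written in class. The seed and the number of points are arbitrary choices; the interval at the end uses the 95% confidence interval formula with the standard deviation of a sample proportion, $\sqrt{\hat{p}(1-\hat{p})/n}$ .

```python
import random
from math import pi, sqrt

random.seed(2023)            # arbitrary seed for reproducibility
n = 100_000

# Record 1 when a uniform point in the unit square lands under y = sqrt(1 - x^2).
hits = 0
for _ in range(n):
    x, y = random.random(), random.random()
    if y <= sqrt(1 - x * x):
        hits += 1

p_hat = hits / n                               # estimate of the integral (square has area 1)
std_err = sqrt(p_hat * (1 - p_hat) / n)        # SD of a sample proportion
low, high = p_hat - 2 * std_err, p_hat + 2 * std_err

print(p_hat, (low, high))    # the exact value is pi/4
```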

Monday, December 4

Today we did a review of some material that might be on the final. We did the following problems in class:

Suppose 10 men and 10 women get in a line in a random order. Find the probability that the 10 men are in front of the 10 women in the line.
Let $X$ be a random variable that is partially determined by flipping a coin. If the coin is heads, then $X \sim \operatorname{Exp}(1)$ and if the coin is tails, then $X \sim \operatorname{Unif}(0,1)$ . Find $P( \text{head} \,|\, X \ge 0.9)$ .
Let $X \sim \operatorname{Norm}(0,1)$ and $Y \sim \operatorname{Exp}(1)$ be independent random variables. Set up, but do not evaluate, an integral that represents $P(Y \ge X)$ .
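The first two review problems have short closed-form answers that can be checked numerically. The sketch below assumes the coin is fair, which the problem statement does not say explicitly; the variable names are illustrative.

```python
from math import comb, exp

# 10 men ahead of 10 women: exactly one of the C(20, 10) equally likely
# choices of "which positions hold men" puts all the men in front.
p_men_first = 1 / comb(20, 10)

# Bayes' rule for P(heads | X >= 0.9) with a fair coin (fairness is an assumption).
p_big_given_heads = exp(-0.9)       # P(X >= 0.9) when X ~ Exp(1)
p_big_given_tails = 1 - 0.9         # P(X >= 0.9) when X ~ Unif(0, 1)
p_heads = 0.5
posterior_heads = (p_heads * p_big_given_heads) / (
    p_heads * p_big_given_heads + (1 - p_heads) * p_big_given_tails
)

print(p_men_first, posterior_heads)
```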

We also reviewed two problems from midterm 3: the problem about finding the standard deviation of a difference of two correlated normal random variables, and the problem of finding the expected value of a function of a pair of random variables that are jointly uniformly distributed on a triangle.