The following problems are similar to those that might appear on Midterm 1. If you need to find a probability or a percentile for a normal or other probability distribution, you may write your answer using R commands such as pnorm(), qnorm(), pbinom(), and qbinom().


  1. Relaxing after work. The 2010 General Social Survey asked a random sample of 1,155 Americans the question, “After an average work day, about how many hours do you have to relax or pursue activities that you enjoy?” The average relaxing time was found to be 1.65 hours. Determine whether each of the following is an individual, a variable, a sample statistic, or a population parameter.
     1. An American in the sample.
     2. Number of hours spent relaxing after an average work day.
     3. 1.65.
     4. Average number of hours all Americans spend relaxing after an average work day.

  2. Vitamin supplements. To assess whether taking large doses of vitamin C reduces the duration of the common cold, researchers recruited 400 healthy volunteers from the staff and students at a university. A quarter of the patients were assigned a placebo, and the rest were divided evenly among three treatments: 1 g vitamin C, 3 g vitamin C, or 3 g vitamin C plus additives, to be taken at the onset of a cold and for the following two days. All tablets had identical appearance and packaging. The nurses who handed the prescribed pills to the patients knew which patient received which treatment, but the researchers who assessed the patients when they were sick did not. No significant differences were observed in any measure of cold duration or severity among the four medication groups, and the placebo group had the shortest duration of symptoms.
     1. Was this an experiment or an observational study? Why?
     2. What are the explanatory and response variables in this study?
     3. Were the patients blinded to their treatment?
     4. Was this study double-blind?
     5. Participants are ultimately able to choose whether or not to take the pills prescribed to them, and we might expect that not all of them will adhere to the regimen. Does this introduce a confounding variable to the study? Explain your reasoning.

  3. Stats scores. Below are the final exam scores of twenty introductory statistics students.

     57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94

     1. Create a box plot of the distribution of these scores.
     2. Draw a histogram for the scores.
     3. Describe the distribution of the data.
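One way to check a hand-drawn box plot and histogram is to compute the five-number summary, the 1.5 × IQR outlier fences, and the bin counts. The sketch below uses Python's standard-library `statistics.quantiles` with `method="inclusive"`. That method interpolates the same way as R's default `quantile()` (type 7). The median-of-halves rule often used by hand can give slightly different quartiles (72.5 and 82.5 for these data), but under either rule 57 falls below the lower fence. The 10-point bin width is an arbitrary choice.

```python
from statistics import quantiles

# Final exam scores of the twenty students, as listed in the problem
scores = [57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
          79, 79, 81, 81, 82, 83, 83, 88, 89, 94]

# Quartiles by linear interpolation (same rule as R's default quantile(), type 7)
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Conventional 1.5 * IQR fences for flagging outliers in a box plot
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print("min, Q1, median, Q3, max:", min(scores), q1, q2, q3, max(scores))
print("IQR:", iqr, "fences:", lower_fence, upper_fence, "outliers:", outliers)

# Text histogram with bins [50, 60), [60, 70), ..., [90, 100)
for lo in range(50, 100, 10):
    count = sum(lo <= x < lo + 10 for x in scores)
    print(f"{lo}-{lo + 9}: {'*' * count}")
```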

  4. Distributions. For each of the following, state whether you expect the distribution to be symmetric, right skewed, or left skewed. Also specify whether the mean or median would best represent a typical observation in the data, and whether the variability of observations would be best represented using the standard deviation or IQR. Explain your reasoning.
     1. Number of pets per household.
     2. Distance to work, i.e., the number of miles between work and home.
     3. Heights of adult males.

  5. Facebook friends. Facebook data indicate that 50% of Facebook users have 100 or more friends, and that the average friend count of users is 190. What do these findings suggest about the shape of the distribution of number of friends of Facebook users?

  6. Baggage fees. An airline charges the following baggage fees: $25 for the first bag and $35 for the second. Suppose 54% of passengers have no checked luggage, 34% have one piece of checked luggage, and 12% have two pieces. Assume that a negligible portion of passengers check more than two bags.
     1. Build a probability model for the baggage revenue from one passenger, compute the average revenue per passenger, and compute the corresponding standard deviation.
     2. About how much revenue should the airline expect from a flight of 120 passengers? With what standard deviation? Note any assumptions you make and whether you think they are justified.
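A sketch for checking the arithmetic. The revenue from one passenger is $0, $25, or $60 ($25 + $35), and the mean and standard deviation follow from the usual definitions for a discrete random variable. Scaling to 120 passengers (multiply the mean by n, multiply the variance by n) assumes the passengers' baggage choices are independent. That assumption is questionable when families or groups travel together.

```python
from math import sqrt

# Probability model for baggage revenue from one passenger: dollars -> probability
model = {0: 0.54, 25: 0.34, 25 + 35: 0.12}

# E(X) = sum of x * P(x);  Var(X) = sum of (x - E(X))^2 * P(x)
mean = sum(x * p for x, p in model.items())
var = sum((x - mean) ** 2 * p for x, p in model.items())
sd = sqrt(var)

# For n passengers, ASSUMING independence: the mean scales by n, the variance by n
n = 120
total_mean = n * mean
total_sd = sqrt(n * var)

print(f"per passenger: mean = {mean:.2f}, sd = {sd:.2f}")
print(f"{n} passengers: mean = {total_mean:.2f}, sd = {total_sd:.2f}")
```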

  7. ACT scores. The scores of high school seniors on the ACT college entrance examination in 2003 had mean \(\mu = 20.8\) and standard deviation \(\sigma = 4.8\). The distribution of scores is only roughly Normal.
     1. If we take a simple random sample of 25 students who took the test, what are the theoretical mean, variance, and standard deviation of the sample mean score \(\bar{x}\) of these 25 students?
     2. Estimate the probability \(P(\bar{x} > 23)\).
     3. Even though the scores for individual students are only approximately normally distributed, it is fairly safe to use a normal distribution to answer the question above. Why?
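A sketch of the sampling-distribution calculation. The mean of \(\bar{x}\) is \(\mu\), its standard deviation is \(\sigma/\sqrt{n}\), and the tail probability is one minus a normal CDF. Python's `statistics.NormalDist(mu, sigma).cdf(x)` plays the role of R's `pnorm(x, mu, sigma)`, so in R the last step could be written as `1 - pnorm(23, 20.8, 0.96)`.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 20.8, 4.8, 25

# Sampling distribution of the sample mean x-bar
mean_xbar = mu
var_xbar = sigma ** 2 / n
sd_xbar = sigma / sqrt(n)

# P(x-bar > 23) using a normal model for x-bar
z = (23 - mean_xbar) / sd_xbar
p_above_23 = 1 - NormalDist(mean_xbar, sd_xbar).cdf(23)

print(f"mean = {mean_xbar}, variance = {var_xbar:.4f}, sd = {sd_xbar:.2f}")
print(f"z = {z:.3f}, P(x-bar > 23) = {p_above_23:.4f}")
```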

  8. Model assumptions. Probability models are based on assumptions. Sometimes we use models even when we know that the assumptions are not all 100% true. Here are two situations where it is not a good idea to use the given probability model.
     1. You are interested in attitudes toward drinking among the 75 members of a student organization on your campus. You choose 30 of these members at random to interview. One question is “Have you had five or more drinks at one time during the last week?” Suppose that in fact 30% of the 75 members would say “Yes.” Explain why you cannot safely use the B(30, 0.3) distribution for the count X in your sample who say “Yes.”
     2. The National AIDS Behavioral Surveys found that 0.2% (that’s 0.002 as a decimal fraction) of adult heterosexuals had both received a blood transfusion and had a sexual partner from a group at high risk of AIDS. Suppose that this national proportion holds for your region. Explain why you cannot safely use a Normal approximation to model the distribution of the number of people in this group when you interview a random sample of 1000 adults.
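The two situations can be checked against common rules of thumb, as sketched below. The thresholds are conventions, not exact cutoffs. A binomial model for sampling without replacement is usually considered reasonable only when the sample is at most about 10% of the population. A normal approximation to a binomial count is usually considered reasonable when both \(np\) and \(n(1-p)\) are at least 10. The last line shows one symptom of the skew: a normal curve with the binomial's mean and standard deviation puts noticeable probability on negative counts, which are impossible.

```python
from math import sqrt
from statistics import NormalDist

# Situation 1: a sample of 30 drawn without replacement from 75 members
sample_size, population_size = 30, 75
sample_fraction = sample_size / population_size
print(f"sample is {sample_fraction:.0%} of the population "
      f"(10% rule of thumb {'met' if sample_fraction <= 0.10 else 'violated'})")

# Situation 2: count of people in a rare group among n = 1000 adults
n, p = 1000, 0.002
expected_successes = n * p
expected_failures = n * (1 - p)
normal_ok = expected_successes >= 10 and expected_failures >= 10
print(f"np = {expected_successes:.1f}, n(1-p) = {expected_failures:.1f}, "
      f"normal approximation {'reasonable' if normal_ok else 'not reasonable'}")

# A normal curve matched to the binomial mean and sd puts mass on impossible negative counts
approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
p_negative = approx.cdf(0)
print(f"normal approximation gives P(count < 0) = {p_negative:.3f}")
```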

  9. School absences. Data collected at elementary schools in DeKalb County, GA, suggest that each year roughly 25% of students miss exactly one day of school, 15% miss 2 days, and 28% miss 3 or more days due to sickness.
     1. What is the probability that a student chosen at random doesn’t miss any days of school due to sickness this year?
     2. What is the probability that a student chosen at random misses no more than one day?
     3. What is the probability that a student chosen at random misses at least one day?
     4. If a parent has two kids at a DeKalb County elementary school, what is the probability that neither kid will miss any school? Note any assumption you must make to answer this question.
     5. If a parent has two kids at a DeKalb County elementary school, what is the probability that both kids will miss some school, i.e., at least one day? Note any assumption you make.
     6. If you made an assumption in part 4 or 5, do you think it was reasonable? If you didn’t make any assumptions, double-check your earlier answers.
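A sketch of the arithmetic, using the complement rule for the single-student parts. The two-child calculations multiply probabilities, which assumes the two children's absences are independent. That is the assumption the last part asks you to evaluate, since illnesses often spread within a household. The block reports both the probability that neither child misses school and its complement, the probability that at least one child does.

```python
# Probabilities of sick days from the problem statement
p_one = 0.25
p_two = 0.15
p_three_plus = 0.28

# Complement rule: the remaining probability is for zero sick days
p_zero = 1 - (p_one + p_two + p_three_plus)
p_at_most_one = p_zero + p_one
p_at_least_one = 1 - p_zero

# Two children, ASSUMING their absences are independent
p_neither_misses = p_zero ** 2
p_some_child_misses = 1 - p_neither_misses
p_both_miss = p_at_least_one ** 2

print(f"P(0 days) = {p_zero:.2f}, P(<= 1 day) = {p_at_most_one:.2f}, "
      f"P(>= 1 day) = {p_at_least_one:.2f}")
print(f"P(neither child misses) = {p_neither_misses:.4f}, "
      f"P(at least one child misses) = {p_some_child_misses:.4f}, "
      f"P(both miss) = {p_both_miss:.4f}")
```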

  10. Man or woman? Men in the United States have an average height of 69.9 inches with a standard deviation of 3.1 inches. Women in the United States have an average height of 64.3 inches with a standard deviation of 2.7 inches. Assume that heights in both groups are normally distributed.
      1. What is the probability that a man is over 6 feet (72 inches) tall? What is the probability that a woman is over 6 feet tall?
      2. Given that a randomly selected adult is over 6 feet tall, what is the probability that they are a woman? Note any assumption you make about the proportions of men and women among adults. Hint: make a tree diagram.
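A sketch for the normal tail probabilities and the tree-diagram (Bayes' rule) calculation. The second part needs the proportion of adults who are women. The value 0.5 below is an assumption for illustration, not a figure from the problem. As with R's `pnorm`, `NormalDist(...).cdf` gives the area to the left, so each upper tail is one minus it.

```python
from statistics import NormalDist

six_feet = 72  # inches

men = NormalDist(mu=69.9, sigma=3.1)
women = NormalDist(mu=64.3, sigma=2.7)

# Upper-tail probabilities, like 1 - pnorm(72, 69.9, 3.1) in R
p_tall_given_man = 1 - men.cdf(six_feet)
p_tall_given_woman = 1 - women.cdf(six_feet)

# Bayes' rule along the branches of a tree diagram.
# ASSUMPTION: half of adults are women (not given in the problem).
p_woman = 0.5
p_man = 1 - p_woman
p_tall = p_man * p_tall_given_man + p_woman * p_tall_given_woman
p_woman_given_tall = p_woman * p_tall_given_woman / p_tall

print(f"P(over 6 ft | man) = {p_tall_given_man:.4f}")
print(f"P(over 6 ft | woman) = {p_tall_given_woman:.4f}")
print(f"P(woman | over 6 ft) = {p_woman_given_tall:.4f}")
```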