Chapter 1 Problems

1.23 Medical students. Students who have finished medical school are assigned to residencies in hospitals to receive further training in a medical specialty. Here is part of a hypothetical data base of students seeking residency positions. USMLE is the student's score on Step 1 of the national medical licensing examination.

    Name             Medical school  Sex  Age  USMLE  Specialty sought
    Abrams, Laurie   Florida         F    28   238    Family medicine
    Brown, Gordon    Meharry         M    25   205    Radiology
    Cabrera, Maria   Tufts           F    26   191    Pediatrics
    Ismael, Miranda  Indiana         F    32   245    Internal medicine

  1. What individuals does this data set describe?
  2. In addition to the student's name, how many variables does the data set contain? Which of these variables are categorical and which are quantitative?

1.26 Facebook, Twitter, and LinkedIn users. After years of explosive growth in the number of users of social networking sites in all age ranges and demographics, it is hard to argue that social media haven't changed forever how we interact and connect online. Although Facebook is still the dominant player in social networking, both Twitter and LinkedIn have continued to increase their usage. Here is the age distribution of the users for the three sites in 2013: SOCIALNT

    Age             Facebook users  Twitter users  LinkedIn users
    13 to 17 years  10%             10%             4%
    18 to 24 years  14%             18%            10%
    25 to 34 years  19%             22%            20%
    35 to 44 years  17%             17%            18%
    45 to 54 years  17%             15%            20%
    55 to 64 years  13%             11%            17%
    Over 65 years   10%              7%            11%

  1. Draw a bar graph for the age distribution of Facebook users. The leftmost bar should correspond to "13 to 17", the next bar to "18 to 24", and so on. Do the same for Twitter and LinkedIn, using the same scale for the percent axis.
  2. Describe the most important difference in the age distribution of the audience for these three social networking sites. How does this difference show up in the bar graphs? Do you think it was important to order the bars by age to make the comparison easier?
  3. Explain why it is appropriate to use a pie chart to display any of these distributions. Draw a pie chart for each distribution. Do you think it is easier to compare the three distributions with bar graphs or pie charts? Explain your reasoning.
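
A pie chart only makes sense when the categories are parts of one whole, which for percents means they add to 100 (up to rounding). The following is a minimal Python sketch for checking that and for roughing out the bar graphs as text before drawing them. The percents are copied from the table above; the function names `parts_of_one_whole` and `text_bar_graph` and the 1-point rounding tolerance are assumptions made for illustration.

```python
# Age distribution of users (percent), copied from the table in Exercise 1.26.
AGE_GROUPS = ["13 to 17", "18 to 24", "25 to 34", "35 to 44",
              "45 to 54", "55 to 64", "Over 65"]
PERCENTS = {
    "Facebook": [10, 14, 19, 17, 17, 13, 10],
    "Twitter":  [10, 18, 22, 17, 15, 11, 7],
    "LinkedIn": [4, 10, 20, 18, 20, 17, 11],
}


def parts_of_one_whole(percents, tolerance=1.0):
    """Return True if the percents add to 100 within a rounding tolerance.

    The tolerance of 1 percentage point is an assumed allowance for rounding.
    """
    return abs(sum(percents) - 100) <= tolerance


def text_bar_graph(site):
    """Build a rough text bar graph: one '#' per percentage point."""
    lines = []
    for group, pct in zip(AGE_GROUPS, PERCENTS[site]):
        lines.append(f"{group:>8} | {'#' * pct} {pct}%")
    return "\n".join(lines)


if __name__ == "__main__":
    for site, percents in PERCENTS.items():
        print(site, "adds to", sum(percents),
              "-> parts of one whole:", parts_of_one_whole(percents))
    print(text_bar_graph("Facebook"))
```

Because every bar uses the same scale (one character per percentage point) and the same age order, the three text graphs can be compared side by side.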

1.35 Where are the nurses? The following spreadsheet gives the number of active nurses per 100,000 people in each state. NURSES

  1. Why is the number of nurses per 100,000 people a better measure of the availability of nurses than a simple count of the number of nurses in a state?
  2. Make a histogram that displays the distribution of nurses per 100,000 people. Write a brief description of the distribution. Are there any outliers? If so, can you explain them?
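
To see why a rate can rank states differently from a raw count, here is a small Python sketch of the conversion to nurses per 100,000 people. The two states and their figures are HYPOTHETICAL values chosen only to illustrate the arithmetic; they are not taken from the NURSES data file, and the helper name `per_100k` is an assumption.

```python
def per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# HYPOTHETICAL figures for illustration only (not from the NURSES data file).
hypothetical_states = {
    "State A": {"nurses": 200_000, "population": 25_000_000},
    "State B": {"nurses": 12_000, "population": 1_000_000},
}

if __name__ == "__main__":
    for name, row in hypothetical_states.items():
        rate = per_100k(row["nurses"], row["population"])
        print(f"{name}: {row['nurses']:,} nurses, {rate:.0f} per 100,000 people")
```

In this made-up example the state with far more nurses has the lower rate, because its population is much larger.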

Chapter 2 Problems

2.25 Incomes of college grads. According to the Census Bureau's Current Population Survey, the mean and median 2012 income of people at least 25 years old who had a bachelor's degree but no higher degree were $50,281 and $62,597. Which of these numbers is the mean and which is the median? Explain your reasoning.

2.32 Maternal age at childbirth. How old are women when they have their first child? Here is the distribution of the age of the mother for all firstborn children in the United States in 2012:

    Age             Count      Age             Count
    10 to 14 years    3,578    30 to 34 years  299,857
    15 to 19 years  251,022    35 to 39 years  106,892
    20 to 24 years  461,553    40 to 44 years   24,251
    25 to 29 years  421,704    45 to 49 years    1,952

The numbers of firstborn children to mothers under 10 or over 50 years of age represent a negligible percentage of all first births and are not included in the table.

  1. For comparison with other years and with other countries, we prefer a histogram of the percents in each age class rather than the counts. Explain why.
  2. How many firstborn babies does the table account for?
  3. Make a histogram of the distribution, using percents on the vertical scale. Using this histogram, describe the distribution of the age at which women have their first child.
  4. What are the locations of the median and quartiles in the ordered list of all maternal ages? In which age classes do the median and quartiles fall?
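
The arithmetic in this exercise can be sketched in Python as follows. This is one possible approach, assuming the usual rule that the median sits at position (n + 1)/2 in the ordered list and that each quartile is the median of the observations below or above that position. The counts are copied from the table; the function names are assumptions made for illustration.

```python
import math
from bisect import bisect_left
from itertools import accumulate

# Counts of first births by age class, copied from the table in Exercise 2.32.
AGE_CLASSES = ["10 to 14 years", "15 to 19 years", "20 to 24 years",
               "25 to 29 years", "30 to 34 years", "35 to 39 years",
               "40 to 44 years", "45 to 49 years"]
COUNTS = [3578, 251022, 461553, 421704, 299857, 106892, 24251, 1952]


def class_percents(counts):
    """Percent of all observations that fall in each class."""
    total = sum(counts)
    return [100 * c / total for c in counts]


def summary_positions(n):
    """Positions of Q1, the median, and Q3 in an ordered list of n values.

    Assumes the median is at (n + 1) / 2 and each quartile is the median of
    the half of the list below (or above) the median's position.
    """
    half = n // 2                  # observations on each side of the median
    q1_pos = (half + 1) / 2
    return q1_pos, (n + 1) / 2, n + 1 - q1_pos


def classes_containing(position, counts=COUNTS, labels=AGE_CLASSES):
    """Class (or two classes) holding the observation(s) at a position.

    A position ending in .5 lies between two observations, so both neighbors
    are looked up; they usually share a class.
    """
    cumulative = list(accumulate(counts))
    if not 1 <= position <= cumulative[-1]:
        raise ValueError("position is outside the ordered list")
    found = []
    for rank in (math.floor(position), math.ceil(position)):
        label = labels[bisect_left(cumulative, rank)]
        if label not in found:
            found.append(label)
    return found


if __name__ == "__main__":
    n = sum(COUNTS)
    print("Total first births in the table:", n)
    for label, pct in zip(AGE_CLASSES, class_percents(COUNTS)):
        print(f"{label}: {pct:.1f}%")
    for name, pos in zip(("Q1", "Median", "Q3"), summary_positions(n)):
        print(name, "is at position", pos, "in class", classes_containing(pos))
```

Working with cumulative counts avoids writing out the full ordered list of ages: a position falls in the first class whose running total reaches it.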

2.50 Graduation rates. In Exercise 1.10 (page 32) you were asked to use a stemplot to display the distribution of the percents of on-time high school graduates in the states. Stemplots help you find the five-number summary because they arrange the observations in increasing order. GRADRATE

    5 | 9
    6 | 23
    6 | 788
    7 | 112444
    7 | 56666777888
    8 | 0001122333333344
    8 | 666666778
  1. Give the five-number summary of this distribution.
  2. Use the five-number summary to draw a boxplot of the data. What is the shape of the distribution?
  3. Which observations does the \(1.5 \times IQR\) rule flag as suspect outliers? Is there a simple explanation for the outlier(s)?
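
A minimal Python sketch for checking hand calculations on this stemplot follows. It assumes the stems are tens and the leaves are ones (so `5 | 9` is 59 percent), takes each quartile as the median of the observations below or above the median's position, and flags values more than \(1.5 \times IQR\) beyond the quartiles. The function names are assumptions made for illustration.

```python
from statistics import median

# Stemplot copied from Exercise 2.50; assumed stems = tens, leaves = ones.
STEMPLOT = """
5 | 9
6 | 23
6 | 788
7 | 112444
7 | 56666777888
8 | 0001122333333344
8 | 666666778
"""


def read_stemplot(text):
    """Turn stemplot rows into a sorted list of whole-number values."""
    values = []
    for line in text.strip().splitlines():
        stem, leaves = line.split("|")
        for leaf in leaves.strip():
            values.append(int(stem.strip()) * 10 + int(leaf))
    return sorted(values)


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Each quartile is the median of the half of the ordered data on one side
    of the median's position; the middle value is left out when n is odd.
    Needs at least two observations.
    """
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[len(data) - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def suspected_outliers(values):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


if __name__ == "__main__":
    rates = read_stemplot(STEMPLOT)
    print("Observations read:", len(rates))
    print("Five-number summary:", five_number_summary(rates))
    print("Flagged by the 1.5 x IQR rule:", suspected_outliers(rates))
```

Statistical software often uses slightly different quartile rules, so its output may not match a hand calculation exactly.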