GSS Sibling Data

How many siblings do people in the USA have? The data set below comes from the 2016 General Social Survey (GSS). The GSS is a well-respected survey, so we will treat the results as a simple random sample (SRS) from the population of the United States. The data records the number of siblings each respondent has, along with the respondent's sex and self-reported happiness level. Note that respondents were instructed to include stepbrothers, stepsisters, and other children adopted by their parents as siblings.

gss = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring17/math222/data/SiblingData.csv")
head(gss)
##   siblings    sex    happiness
## 1        2   Male Pretty happy
## 2        3   Male Pretty happy
## 3        3   Male   Very happy
## 4        3 Female Pretty happy
## 5        2 Female   Very happy
## 6        2 Female   Very happy
dim(gss)
## [1] 2867    3
gss = na.omit(gss)
dim(gss)
## [1] 2855    3
summary(gss$siblings)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.00    3.00    3.71    5.00   43.00
barplot(table(gss$siblings), xlab='# of siblings', ylab = 'frequency')

Below is a bootstrap distribution for the mean # of siblings, built from 5000 resamples of the data.

boot.dist = numeric(5000)  # one slot for each bootstrap mean
for (i in 1:5000) {
  # resample with replacement, keeping the original sample size
  boot.sample = sample(gss$siblings, replace = TRUE)
  boot.dist[i] = mean(boot.sample)
}
hist(boot.dist,col='gray')

qqnorm(boot.dist)
qqline(boot.dist)
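
The resampling loop above can also be written in one line with replicate(). The sketch below is only an illustration: it runs on a simulated vector x (a hypothetical stand-in for gss$siblings, so it works without downloading the data), and the seed value is arbitrary.

```r
# Hypothetical stand-in for gss$siblings: 500 simulated sibling counts.
set.seed(222)                  # arbitrary seed so the resamples are reproducible
x <- rpois(500, lambda = 3.7)

# replicate() evaluates the expression 5000 times and collects the results,
# so no explicit loop or growing vector is needed.
boot.dist <- replicate(5000, mean(sample(x, replace = TRUE)))

length(boot.dist)   # number of bootstrap means
mean(boot.dist)     # should land close to mean(x)
```

Setting a seed is optional, but it makes the histogram and QQ-plot look the same each time the code is run.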

Questions

  1. What additional information would I need to make a bootstrap confidence interval for the population mean?

  2. Below are the means of 200 bootstrap samples. Based on this information, make a 95% confidence interval for the mean using the quantile method.

##   [1] 3.546060 3.592995 3.597898 3.600000 3.602102 3.602452 3.606305
##   [8] 3.608757 3.615412 3.621366 3.621716 3.623117 3.624518 3.625569
##  [15] 3.627671 3.628021 3.628021 3.629772 3.632574 3.632925 3.636778
##  [22] 3.641331 3.645534 3.646235 3.647285 3.651138 3.652539 3.653590
##  [29] 3.653940 3.654291 3.656042 3.657093 3.659545 3.660245 3.661296
##  [36] 3.663398 3.663748 3.664448 3.664448 3.664799 3.667951 3.667951
##  [43] 3.668651 3.670753 3.671454 3.672154 3.673555 3.673555 3.674606
##  [50] 3.674956 3.674956 3.676357 3.678459 3.678459 3.678809 3.679510
##  [57] 3.679510 3.679860 3.680560 3.680911 3.680911 3.681611 3.683363
##  [64] 3.683713 3.686165 3.687566 3.687566 3.687916 3.689667 3.690018
##  [71] 3.690368 3.690718 3.691068 3.692469 3.693870 3.695622 3.696673
##  [78] 3.697723 3.698424 3.698774 3.700525 3.702977 3.704028 3.704378
##  [85] 3.704378 3.704378 3.705079 3.706130 3.706480 3.706830 3.707180
##  [92] 3.707531 3.709282 3.710683 3.710683 3.711384 3.714536 3.714886
##  [99] 3.716988 3.717688 3.717688 3.718389 3.718389 3.718389 3.718389
## [106] 3.718389 3.719440 3.719790 3.719790 3.720490 3.720490 3.721541
## [113] 3.722242 3.722592 3.723292 3.723643 3.723643 3.723993 3.727496
## [120] 3.733100 3.733450 3.733800 3.735902 3.736252 3.736602 3.736953
## [127] 3.737303 3.739054 3.739755 3.740455 3.740455 3.741506 3.741856
## [134] 3.741856 3.742207 3.743958 3.744308 3.746410 3.746410 3.747811
## [141] 3.747811 3.751664 3.752014 3.752014 3.752364 3.753065 3.753415
## [148] 3.755867 3.757618 3.758669 3.759370 3.761121 3.761471 3.761821
## [155] 3.764273 3.764974 3.765674 3.767776 3.771629 3.772329 3.772680
## [162] 3.772680 3.773030 3.773380 3.776182 3.776883 3.777233 3.777933
## [169] 3.779335 3.780035 3.780385 3.783888 3.784939 3.785289 3.789842
## [176] 3.789842 3.789842 3.790543 3.794746 3.795797 3.796147 3.803152
## [183] 3.805254 3.806655 3.810158 3.810508 3.812960 3.814361 3.818214
## [190] 3.821366 3.823818 3.825569 3.826970 3.830473 3.843433 3.845884
## [197] 3.852189 3.852539 3.865849 3.873205
  3. Why would it be silly to make a bootstrap confidence interval for the median number of siblings in the USA? (Hint: what are the possible values of the median # of siblings?)
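
For reference, here is a minimal sketch of how the quantile method from question 2 might be carried out in R. The vector toy.boot is simulated purely for illustration (it is not the list of 200 bootstrap means above), and the name is hypothetical.

```r
# Hypothetical bootstrap means, simulated only for illustration
# (these are NOT the 200 values listed in question 2).
set.seed(1)
toy.boot <- rnorm(200, mean = 10, sd = 0.5)

# Quantile method: the endpoints of the 95% interval are the 2.5th and
# 97.5th percentiles of the bootstrap distribution.
ci <- quantile(toy.boot, probs = c(0.025, 0.975))
ci
```

The same call should work on any numeric vector of bootstrap statistics, such as boot.dist from the loop earlier on this page.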