Do Women Talk More Than Men?

Researchers equipped male and female college students with a small device that secretly recorded sounds for a random 30 seconds of each 12.5-minute period over 2 days. By counting the words each subject spoke while being recorded, the researchers were able to estimate how many words each subject spoke per day. The results are contained in the following file.

talking = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring18/math222/examples/talking.csv")
men = subset(talking,Sex=="M")$Words
women = subset(talking,Sex=="F")$Words
summary(talking)

##  Sex        Words      
##  F:27   Min.   : 1537  
##  M:20   1st Qu.: 8892  
##         Median :12584  
##         Mean   :14658  
##         3rd Qu.:19186  
##         Max.   :39681

sd(men)

## [1] 8342.472

summary(men)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3998    7868   10770   12870   13870   37790

sd(women)

## [1] 8421.497

summary(women)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1537    9915   13620   15980   21420   39680

In this sample there were 20 men and 27 women. Here is a side-by-side boxplot showing the differences between the two groups.

boxplot(men, women, names = c("Men", "Women"), horizontal = TRUE, col = "gray", xlab = "# Words per Day")

Significance Test Options

One natural question to ask is: do these data give statistically significant evidence of a difference between the average number of words spoken per day by men and by women? For each of the following statistical tests, describe whether it is a valid option for answering this question and comment on any advantages or disadvantages it has compared with the other options.

2-sample t-test
Log-transform the data, then apply a 2-sample t-test
Permutation test
Bootstrap the difference in means

Confidence Interval Options

A second natural question is: how big is the difference in words spoken per day between men and women? Once again, comment on these options for answering this question.

2-sample t-distribution confidence interval
Log-transform the data, then make a 2-sample t-distribution confidence interval
Bootstrap the difference in means
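
The three interval options above could be computed roughly as follows. This is a sketch under stated assumptions: `boot_diff_ci` is a hypothetical helper that uses the simple percentile method (one of several bootstrap interval methods), and `x` and `y` are simulated stand-ins rather than the study data. Substitute `men` and `women` to work with the real file.

```r
# Hypothetical helper: percentile bootstrap CI for mean(x) - mean(y).
# Assumes each group has at least two observations.
boot_diff_ci = function(x, y, reps = 10000, level = 0.95) {
  boot_diffs = replicate(reps, {
    # Resample each group separately, with replacement, at its original size.
    mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE))
  })
  alpha = 1 - level
  quantile(boot_diffs, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(2)
# Simulated stand-in data (NOT the study data): skewed daily word counts.
x = rlnorm(20, meanlog = 9.3, sdlog = 0.6)  # plays the role of `men`
y = rlnorm(27, meanlog = 9.5, sdlog = 0.6)  # plays the role of `women`

t_ci = as.numeric(t.test(x, y)$conf.int)              # difference in means
log_ci = as.numeric(t.test(log(x), log(y))$conf.int)  # difference in mean log counts
ratio_ci = exp(log_ci)                                # ratio of geometric means
boot_ci = boot_diff_ci(x, y)                          # difference in means

rbind(t = t_ci, bootstrap = boot_ci)
ratio_ci
```

The log-scale interval is in log units; exponentiating its endpoints gives an interval for the ratio of geometric means, not for a difference in words per day, so it is not directly comparable to the other two.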