The data file loaded below contains a large sample of body temperature recordings (in °F) from healthy adults.

myData = read.csv("/home/brian/Dropbox/HSC/Spring17/Math222/Data/BodyTempPop.txt")
summary(myData$bodytemp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    95.8    98.1    98.6    98.6    99.1   101.6
dim(myData)
## [1] 10000     1
hist(myData$bodytemp,col='gray')

qqnorm(myData$bodytemp); qqline(myData$bodytemp)

As you can see, a normal model fits this data very well: the points in the normal Q-Q plot fall close to the line.

Sampling Distribution

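If we take a random sample of 13 people from this population, then, because the population is normal, the sampling distribution of \(\bar{x}\) is also normal, with mean \(\mu\) and standard deviation \(\sigma/\sqrt{N}\). We want to test the hypotheses:

\[H_0: \mu = 98.6 \qquad H_a: \mu \neq 98.6\]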

Unfortunately, we can’t use the formula \(z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{N}}\) because we don’t know \(\sigma\). If we use \(s\) in place of \(\sigma\), the resulting statistic no longer follows a standard normal distribution when the sample size is small.
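To see where the trouble comes from, it helps to look at the two ingredients of the formula separately. The self-contained sketch below does not read the data file; it assumes a hypothetical normal population with mean 98.6 and \(\sigma = 0.7\) (the value 0.7 is an illustrative assumption, not estimated from the data). It checks that the simulated sample means have a standard deviation close to \(\sigma/\sqrt{N}\), and then shows how much \(s\) itself varies when \(N = 13\). Whenever \(s\) falls below \(\sigma\), the denominator \(s/\sqrt{N}\) is too small and the statistic is pushed further into the tails.

```r
set.seed(222)   # arbitrary seed so the run is reproducible
mu    = 98.6    # hypothetical population mean
sigma = 0.7     # hypothetical population SD (assumption, not from the data)
N     = 13      # sample size used in this section

# Draw 10000 samples of size N; keep each sample's mean and SD
sims = replicate(10000, {
  x = rnorm(N, mean = mu, sd = sigma)
  c(xbar = mean(x), s = sd(x))
})
xbars = sims["xbar", ]
sVals = sims["s", ]

# The sample means behave as the z formula assumes
c(simulated = sd(xbars), theory = sigma / sqrt(N))

# ...but s wanders well away from sigma
quantile(sVals / sigma, c(0.05, 0.5, 0.95))
mean(sVals < sigma)   # share of samples where s underestimates sigma
```

For a normal population, \((N-1)s^2/\sigma^2\) follows a chi-square distribution with \(N-1 = 12\) degrees of freedom, which puts the 5th and 95th percentiles of \(s/\sigma\) near 0.66 and 1.32. A single sample's \(s\) can therefore easily be off by roughly a third in either direction.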

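For example, with \(N=13\), the following simulation draws 10,000 samples from the data, computes the statistic for each sample using \(s\) in place of \(\sigma\) and \(\mu_0 = 98.6\), and plots the results.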

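nSims = 10000
sampleResults = numeric(nSims)   # preallocate storage for the t-values
for (i in 1:nSims) {
  newSample = sample(myData$bodytemp, 13)   # random sample of N = 13 people
  xbar = mean(newSample)
  s = sd(newSample)
  # statistic with s in place of the unknown sigma, and mu_0 = 98.6
  sampleResults[i] = (xbar - 98.6)/(s/sqrt(13))
}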
hist(sampleResults,col='gray')

qqnorm(sampleResults)
qqline(sampleResults)

As you can see, t-values constructed using the formula \(t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{N}}\) are not normally distributed: in the normal Q-Q plot, the points in both tails bend away from the line, so the extreme values are more extreme than the normal model predicts. We need a different probability distribution with heavier tails: the t-distribution.
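One way to check that the t-distribution is the right replacement is to compare tail areas. For a normal population, the statistic \(t\) follows a t-distribution with \(N-1\) degrees of freedom (12 here). The sketch below is self-contained and again assumes a hypothetical normal population (mean 98.6, \(\sigma = 0.7\), an illustrative value) rather than reading the data file. It counts how often \(|t| > 1.96\), the cutoff that gives a 5% two-sided error rate under the normal model, and compares that share with `2 * pnorm(-1.96)` and `2 * pt(-1.96, df = 12)`.

```r
set.seed(222)   # arbitrary seed so the run is reproducible
mu0   = 98.6    # null value used throughout this section
N     = 13
sigma = 0.7     # hypothetical population SD (assumption, not from the data)

# Simulate t-values when the null hypothesis is true
tVals = replicate(100000, {
  x = rnorm(N, mean = mu0, sd = sigma)
  (mean(x) - mu0) / (sd(x) / sqrt(N))
})

simTail    = mean(abs(tVals) > 1.96)    # simulated share beyond +/-1.96
normalTail = 2 * pnorm(-1.96)           # normal model prediction, about 0.05
tTail      = 2 * pt(-1.96, df = N - 1)  # t-distribution prediction, 12 df
c(simulated = simTail, normal = normalTail, t = tTail)

# Visual check: simulated t-values against t(12) quantiles
qqplot(qt(ppoints(length(tVals)), df = N - 1), tVals,
       xlab = "t(12) quantiles", ylab = "simulated t-values")
abline(0, 1)
```

The simulated tail share should land close to the t(12) prediction and noticeably above 5%, and the points in the `qqplot` should follow the reference line, including in the tails.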