The data file linked below has data on a large sample of body temperature recordings from healthy adults.
myData = read.csv("/home/brian/Dropbox/HSC/Spring17/Math222/Data/BodyTempPop.txt")
summary(myData$bodytemp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    95.8    98.1    98.6    98.6    99.1   101.6
dim(myData)
## [1] 10000 1
hist(myData$bodytemp,col='gray')
qqnorm(myData$bodytemp); qqline(myData$bodytemp)
As the histogram and normal Q-Q plot show, this data is very well approximated by a normal distribution.
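A numerical check can back up what the plots suggest. The sketch below compares the fraction of values within 1, 2, and 3 standard deviations of the mean to what a normal model predicts. Since the original data file is not available here, it uses simulated stand-in data; the mean of 98.6 and standard deviation of 0.7 are assumptions chosen only for illustration, and the names `withinSD` and `expectedSD` are hypothetical.

```r
# Stand-in data (ASSUMPTION): the original file is not available here, so we
# simulate 10000 values from a normal model with mean 98.6 and sd 0.7.
set.seed(1)
bodytemp = rnorm(10000, mean = 98.6, sd = 0.7)

# Standardize, then find the fraction of values within 1, 2 and 3 sd of the mean.
z = (bodytemp - mean(bodytemp)) / sd(bodytemp)
withinSD = sapply(1:3, function(k) mean(abs(z) <= k))

# A normal model predicts roughly 0.68, 0.95 and 0.997.
expectedSD = pnorm(1:3) - pnorm(-(1:3))
print(round(rbind(observed = withinSD, normal = expectedSD), 3))
```

The same three lines of arithmetic could be applied to `myData$bodytemp` in place of the simulated vector.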
Because the population is normal, the sample mean \(\bar{x}\) of a sample of 13 people from this data will also have a normal distribution. We want to test the hypotheses:
Unfortunately, we can’t use the formula \(z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{N}}\) because we don’t know \(\sigma\). If we substitute \(s\) for \(\sigma\), the resulting statistic no longer follows the normal model well when the sample size is small.
For example, with \(N=13\), here is a simulation of what might happen.
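sampleResults = numeric(10000)  # one t-value per simulated sample
for (i in 1:10000) {
  newSample = sample(myData$bodytemp,13)  # draw 13 temperatures without replacement
  xbar = mean(newSample)
  s = sd(newSample)
  # t-value: uses the sample standard deviation s in place of sigma
  sampleResults[i] = (xbar-98.6)/(s/sqrt(13))
}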
hist(sampleResults,col='gray')
qqnorm(sampleResults)
qqline(sampleResults)
As you can see, t-values constructed using the formula \(t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{N}}\) do not have a normal distribution. The actual values in both tails are more extreme than the normal model predicts they should be. We need a different probability distribution: the t-distribution.
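To see how much heavier the tails are, one option is to compare the share of simulated t-values beyond \(\pm 1.96\) with what the normal model and the t-distribution with \(N-1 = 12\) degrees of freedom each predict. The sketch below is self-contained, so it does not read the course data file: it assumes a stand-in normal population with mean 98.6 and standard deviation 0.7 (illustrative values, not taken from the data), and the names `population`, `tvals`, `observedTail`, `normalTail`, and `tTail` are hypothetical.

```r
# Stand-in population (ASSUMPTION): the course data file is not available here,
# so we simulate a normal population with mean 98.6 and sd 0.7.
set.seed(1)
population = rnorm(10000, mean = 98.6, sd = 0.7)

n = 13
# Each replicate draws 13 values and computes t = (xbar - mu0) / (s / sqrt(n)).
tvals = replicate(10000, {
  x = sample(population, n)
  (mean(x) - 98.6) / (sd(x) / sqrt(n))
})

# Proportion of simulated t-values beyond +/- 1.96
observedTail = mean(abs(tvals) > 1.96)
normalTail = 2 * pnorm(-1.96)        # about 0.05 under the normal model
tTail = 2 * pt(-1.96, df = n - 1)    # prediction of the t-distribution, 12 df

print(c(observed = observedTail, normal = normalTail, t = tTail))

# Q-Q plot against t quantiles (12 df) instead of normal quantiles
qqplot(qt(ppoints(length(tvals)), df = n - 1), tvals,
       xlab = "t quantiles (12 df)", ylab = "simulated t-values")
abline(0, 1)
```

If the t-distribution is the right model, the observed proportion should land near `tTail` rather than `normalTail`, and the points in the Q-Q plot should fall close to the line in both tails.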