The t-Distribution

The data file linked below contains a large sample of body temperature recordings from healthy adults.

Sampling Distribution

If body temperatures are normally distributed, then for a random sample of 13 people from this data the sampling distribution of \(\bar{x}\) will also be normal. We want to test the hypotheses:

\(H_0: \mu = 98.6^\circ\),
\(H_A: \mu \ne 98.6^\circ\).

Unfortunately, we can’t use the formula \(z = \dfrac{\bar{x}-\mu_0}{\sigma/\sqrt{N}}\) because we don’t know \(\sigma\). If we substitute the sample standard deviation \(s\) for \(\sigma\), the resulting statistic no longer follows the standard normal distribution, and the discrepancy is noticeable when the sample size is small.

For example, with \(N=13\), here is a simulation of what might happen.

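# Assumes myData has already been loaded and has a bodytemp column.
N = 13
sampleResults = numeric(10000)  # preallocate instead of growing the vector
for (i in 1:10000) {
  newSample = sample(myData$bodytemp, N)  # draw N recordings without replacement
  xbar = mean(newSample)
  s = sd(newSample)
  # t-value: distance of xbar from 98.6 in estimated standard errors
  sampleResults[i] = (xbar - 98.6) / (s / sqrt(N))
}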
hist(sampleResults,col='gray')

qqnorm(sampleResults)
qqline(sampleResults)

As you can see, t-values constructed using the formula \(t = \dfrac{\bar{x}-\mu_0}{s/\sqrt{N}}\) do not have a normal distribution. The actual values in both tails are more extreme than the normal model predicts they should be. We need a different probability distribution: the t-distribution.
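
To put rough numbers on how much heavier the tails are, here is a minimal sketch comparing the standard normal model with a t-distribution, using R's built-in `qnorm`, `qt`, `pnorm`, and `pt` functions. It assumes the appropriate degrees of freedom are \(N - 1 = 12\) for a sample of 13, which is the standard choice but has not been derived above. The variable names are illustrative only.

```r
# Sample size from the example above; df = N - 1 is an assumption here.
N = 13
df = N - 1

# Two-sided 5% critical values under each model
z_crit = qnorm(0.975)
t_crit = qt(0.975, df)

# Probability of landing more than 2 units from zero under each model
p_norm = 2 * pnorm(-2)
p_t = 2 * pt(-2, df)

print(c(z_crit = z_crit, t_crit = t_crit))
print(c(p_norm = p_norm, p_t = p_t))
```

The t critical value (about 2.18) is noticeably larger than the normal one (about 1.96), and the t model assigns more probability to values beyond \(\pm 2\). This matches the pattern in the Q-Q plot, where the simulated values in the tails are more extreme than the normal line.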