Transforming Quantitative Variables

One measure of water quality is turbidity, the degree of opaqueness produced in water by suspended particulate matter. Turbidity can be measured by seeing how light is scattered and/or absorbed by organic and inorganic material. Larger nephelometric turbidity units (NTU) indicate increased turbidity and decreased light penetration. If there is too much turbidity, then not enough light may be penetrating the water, which confines photosynthesis to the surface and leads to less dissolved oxygen. Riggs (2002) provides 244 monthly turbidity readings that were recorded between 1980 and 2000 from a reach of the Mermentau River in Southwest Louisiana. The unit of analysis was the monthly mean turbidity (NTU) computed from each month’s systematic sample of 21 turbidity measurements. The investigators wanted to determine whether the mean turbidity was greater than the local criterion value of 150 NTU.

turb=read.csv("http://www.rossmanchance.com/iscam2/data/turbidity.txt")
dim(turb)
## [1] 244   1
head(turb)
##   turbidity
## 1         8
## 2        12
## 3        14
## 4        14
## 5        14
## 6        15

The data consist of the 244 monthly mean turbidity readings. Here is what their distribution looks like:

hist(turb$turbidity,col='gray',main='Turbidity',xlab='NTU')

qqnorm(turb$turbidity)
qqline(turb$turbidity)

Even though the sample size is large (n = 244), the skew in this data is so strong that we should hesitate before using the standard t-distribution methods. We could try bootstrapping to see how badly this skew might be affecting the sampling distribution of the mean.

# Resample the 244 readings with replacement 5000 times,
# storing the mean of each bootstrap sample.
boot.dist = numeric(5000)
for (i in 1:5000) {
  boot.samp = sample(turb$turbidity,244,replace=TRUE)
  boot.dist[i] = mean(boot.samp)
}
hist(boot.dist,col='gray',main='Bootstrap Distribution for Mean',xlab='NTU')

qqnorm(boot.dist)
qqline(boot.dist)

Looking at the bootstrap distribution, you can see from the Q-Q plot that it is still skewed right, which means that the t-distribution model (both the standard and the bootstrap version) might not be valid.
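A Q-Q plot shows the skew visually; a sample skewness coefficient puts a number on it. The sketch below is only an illustration: the helper `skewness` is a hypothetical function written here (it is not part of base R), and the data are simulated log-normal values standing in for `turb$turbidity` so that the code runs without downloading the file.

```r
# Hypothetical stand-in for turb$turbidity: 244 right-skewed values.
set.seed(1)
x <- rlnorm(244, meanlog = 4, sdlog = 1)

# Moment-based sample skewness: 0 for symmetric data, positive for right skew.
skewness <- function(v) {
  z <- v - mean(v)
  mean(z^3) / mean(z^2)^1.5
}

# Bootstrap distribution of the sample mean, as in the loop above.
boot.dist <- replicate(5000, mean(sample(x, length(x), replace = TRUE)))

skewness(x)          # skewness of the raw data
skewness(boot.dist)  # skewness of the bootstrap means
```

Averaging reduces the skew, so the second value should be smaller than the first, but it need not be close enough to 0 for a t-distribution model to fit well. With the real data, replace `x` with `turb$turbidity`.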

Transforming Data

With right-skewed data, it is common to apply a log transformation to the values before carrying out a statistical analysis.

logData = log(turb$turbidity)
qqnorm(logData)
qqline(logData)
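If the log-transformed values look roughly normal, one possible next step is a one-sample t-test on the log scale, comparing against log(150). The sketch below assumes that approach, which is not necessarily the analysis used by the investigators, and it uses simulated log-normal values as a hypothetical stand-in for `turb$turbidity`. Note that a test on the log scale concerns the mean of the logs (equivalently, the geometric mean of the readings), not the arithmetic mean in NTU.

```r
# Hypothetical stand-in for turb$turbidity: 244 right-skewed values.
set.seed(2)
x <- rlnorm(244, meanlog = 4, sdlog = 1)
logx <- log(x)

# One-sided t-test on the log scale: is the mean of log(x) above log(150)?
res <- t.test(logx, mu = log(150), alternative = "greater")
res$statistic
res$p.value

# Back-transforming the mean of the logs gives the geometric mean,
# which is never larger than the arithmetic mean.
exp(mean(logx))
mean(x)
```

With the real data, replace `x` with `turb$turbidity`. Because the geometric mean is at most the arithmetic mean, a conclusion on the log scale does not translate directly into a statement about mean turbidity in NTU.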