Transforming Quantitative Variables

One measure of water quality is turbidity, the degree of opaqueness produced in water by suspended particulate matter. Turbidity can be measured by seeing how light is scattered and/or absorbed by organic and inorganic material. Larger values, in nephelometric turbidity units (NTU), indicate increased turbidity and decreased light penetration. If turbidity is too high, not enough light penetrates the water, which restricts photosynthesis to the surface and leads to less dissolved oxygen. Riggs (2002) provides 244 monthly turbidity readings that were recorded between 1980 and 2000 from a reach of the Mermentau River in Southwest Louisiana. The unit of analysis was the monthly mean turbidity (NTU), computed from each month’s systematic sample of 21 turbidity measurements. The investigators wanted to determine whether the mean turbidity was greater than the local criterion value of 150 NTU.
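The investigators' question can be written as a one-sided test. The symbol μ is notation assumed here (it does not appear in the original) for the long-run mean monthly turbidity in NTU:

```latex
H_0: \mu = 150 \quad \text{versus} \quad H_a: \mu > 150
```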

turb=read.csv("http://www.rossmanchance.com/iscam2/data/turbidity.txt")
dim(turb)
## [1] 244   1
head(turb)
##   turbidity
## 1         8
## 2        12
## 3        14
## 4        14
## 5        14
## 6        15
hist(turb$turbidity,col='gray',main='Turbidity',xlab='NTU')

qqnorm(turb$turbidity)
qqline(turb$turbidity)

Even though the sample size is large (n = 244), the right skew in these data is strong enough that we should hesitate before using the standard t-distribution methods.
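One way to see why strong skew can matter even at n = 244 is a small simulation. The sketch below is only an illustration: it draws samples from a hypothetical lognormal population (meanlog = 4 and sdlog = 1 are assumed values, not fitted to the turbidity data) and checks how often the t-statistic falls in each tail. If the t-distribution were a good approximation, both rates would be close to 0.025.

```r
# Hypothetical stand-in population: lognormal, NOT fitted to the turbidity data
set.seed(1)
n <- 244                    # same sample size as the turbidity data
reps <- 2000                # number of simulated samples
meanlog <- 4
sdlog <- 1
true_mean <- exp(meanlog + sdlog^2 / 2)   # population mean of a lognormal

# t-statistic for each simulated sample, centered at the true mean
t_stats <- replicate(reps, {
  x <- rlnorm(n, meanlog = meanlog, sdlog = sdlog)
  (mean(x) - true_mean) / (sd(x) / sqrt(n))
})

# Compare observed tail rates with the nominal 0.025 in each tail
t_crit <- qt(0.975, df = n - 1)
lower_tail <- mean(t_stats < -t_crit)
upper_tail <- mean(t_stats > t_crit)
c(lower = lower_tail, upper = upper_tail)
```

With a right-skewed population, the two tail rates tend to be unequal, which is the kind of distortion that makes one-sided t-procedures less trustworthy. How large the distortion is for the actual turbidity data depends on how skewed that population really is.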

Transforming Data

With right-skewed data, it is common to apply a log transformation to the values before carrying out a statistical analysis. Here we use the natural log, which is the default for log() in R.

logData = log(turb$turbidity)
hist(logData,col='gray',main="Log-Transformed Data",xlab='log(NTU)')

qqnorm(logData)
qqline(logData)
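To put a number on what the histograms and normal quantile plots show, one option is to compare a sample skewness coefficient before and after the transformation. The sketch below is a minimal illustration: `skewness` is a helper defined here (not a base R function), and the data are simulated from an assumed lognormal distribution as a stand-in, so that the block runs without downloading the file. In practice, `turb$turbidity` could be used in place of `x`.

```r
# Helper defined for this sketch: moment-based sample skewness,
# i.e. the mean of the cubed z-scores (using the n-denominator SD)
skewness <- function(x) {
  centered <- x - mean(x)
  z <- centered / sqrt(mean(centered^2))
  mean(z^3)
}

# Hypothetical stand-in for turb$turbidity (assumed lognormal, not the real data)
set.seed(2)
x <- rlnorm(244, meanlog = 4, sdlog = 1)

skew_raw <- skewness(x)        # clearly positive for right-skewed data
skew_log <- skewness(log(x))   # near 0 if the log scale is roughly symmetric
c(raw = skew_raw, log = skew_log)
```

A skewness near 0 on the log scale is consistent with a roughly symmetric distribution, though the plots remain the more informative check.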