Choosing the Right Transformation

Two students took a random sample of 30 textbooks for sale in a campus bookstore in 2006 and recorded each book's price.

# Read the sampled textbook data; the Price column is in dollars
books <- read.csv("TextPrices.csv")
hist(books$Price, col = "gray", main = "Textbook Prices", xlab = "Price (dollars)")

The histogram shows that the prices are clearly skewed to the right, with a long tail toward the more expensive books.
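To put a number on "skewed to the right", one option is the moment coefficient of skewness: the average cubed deviation from the mean divided by the cubed standard deviation. Positive values indicate a longer right tail, and 0 indicates a symmetric sample. The sketch below is a hand-rolled helper (`skewness` is a hypothetical name, not a base R function). Because the block has to run on its own, it is demonstrated on a simulated log-normal sample rather than the textbook data. Calling `skewness(books$Price)` would apply it to the real prices.

```r
# Moment coefficient of skewness: mean cubed deviation divided by the
# cubed (population) standard deviation. Positive => longer right tail.
# `skewness` is a hypothetical helper written for this example.
skewness <- function(x) {
  x <- x[!is.na(x)]   # drop missing values
  d <- x - mean(x)    # deviations from the mean
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Simulated right-skewed stand-in for prices (NOT the TextPrices.csv data)
set.seed(2006)
sim_prices <- rlnorm(30, meanlog = 3.7, sdlog = 1)
skewness(sim_prices)  # positive values indicate right skew
```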

Which Transformation is Better?

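Below we compare two common transformations for right-skewed data: logarithms and square roots. Neither transformation makes the data look very normal, but the square-root transform is more symmetric, so it is the better choice here. The summaries support this: on the log scale the mean (3.696) falls well below the median (4.007), suggesting the log over-corrects and leaves a left skew. On the square-root scale the mean (7.301) and median (7.421) are much closer together relative to the spread of the data.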
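The visual comparison can also be backed up numerically. The following is a minimal sketch. `compare_transforms` is a hypothetical helper that defines its own moment-skewness function and uses base R's `shapiro.test()` from the stats package. For the raw, log, and square-root scales it reports the skewness (closer to 0 is more symmetric) and the Shapiro-Wilk p-value (larger means less evidence against normality). It assumes all values are strictly positive so that `log()` is defined. It is run here on a simulated stand-in sample, not on the textbook data.

```r
# Hypothetical helper: compare symmetry and normality of a positive
# variable on the raw, log, and square-root scales.
compare_transforms <- function(x) {
  # Moment coefficient of skewness (0 for a perfectly symmetric sample)
  skew <- function(v) {
    d <- v - mean(v)
    mean(d^3) / mean(d^2)^(3 / 2)
  }
  x <- x[!is.na(x)]
  scales <- list(raw = x, log = log(x), sqrt = sqrt(x))
  data.frame(
    transform = names(scales),
    skewness  = sapply(scales, skew),
    # Shapiro-Wilk test of normality; requires between 3 and 5000 values
    shapiro_p = sapply(scales, function(v) shapiro.test(v)$p.value),
    row.names = NULL
  )
}

# With the textbook data this would be: compare_transforms(books$Price)
set.seed(2006)
compare_transforms(rlnorm(30, meanlog = 3.7, sdlog = 1))  # simulated stand-in
```

Skewness measures symmetry only. The Shapiro-Wilk p-value looks at overall normality, so the two columns can disagree, much as the histograms and QQ plots can.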

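# Log transform: histogram, normal QQ plot, and numerical summary
hist(log(books$Price), col = "gray", main = "Log-transformed data", xlab = "log(dollars)")

qqnorm(log(books$Price))
qqline(log(books$Price))  # line through the first and third quartiles

summary(log(books$Price))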
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.447   2.865   4.007   3.696   4.562   5.134
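
# Square-root transform: histogram, normal QQ plot, and numerical summary
hist(sqrt(books$Price), col = "gray", main = "Square-root transformed data", xlab = "sqrt(dollars)")

qqnorm(sqrt(books$Price))
qqline(sqrt(books$Price))  # line through the first and third quartiles

summary(sqrt(books$Price))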
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.062   4.192   7.421   7.301   9.785  13.029
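The printed summaries already contain enough information to compare symmetry. One quick measure is Bowley's quartile skewness, ((Q3 - M) - (M - Q1)) / (Q3 - Q1), where M is the median. It is 0 when the median sits midway between the quartiles and negative when the lower half of the middle 50% is more spread out. The sketch below plugs in the quartiles copied from the two summaries above (`bowley` is a hypothetical helper name). Because those values are rounded to three decimals, the results are approximate.

```r
# Bowley (quartile) skewness: 0 = median midway between the quartiles,
# negative = longer lower half, positive = longer upper half.
bowley <- function(q1, med, q3) ((q3 - med) - (med - q1)) / (q3 - q1)

# Quartiles copied from the summary() output (rounded to 3 decimals)
log_bowley  <- bowley(q1 = 2.865, med = 4.007, q3 = 4.562)
sqrt_bowley <- bowley(q1 = 4.192, med = 7.421, q3 = 9.785)

round(c(log = log_bowley, sqrt = sqrt_bowley), 3)
# log = -0.346, sqrt = -0.155
```

Both values are negative, so both transformations leave some left skew in the middle of the data. However, the square-root value is less than half as far from 0 as the log value, which is consistent with the square root being the more symmetric choice.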