Cloud Seeding: Log-Transformed Data

Cloud seeding is the process of spraying clouds with a chemical solution to trigger the formation of raindrops. The following data is based on an experiment in Florida from the 1970s. On 52 separate days, target clouds were identified. On a randomly selected half of those days, a plane flew through the clouds spraying a silver iodide solution. Radar was then used to measure the volume of rainfall produced (in acre-feet).

cloud = read.csv("http://people.hsc.edu/faculty-staff/blins/classes/spring17/math222/data/CloudSeeding.csv")
tre = subset(cloud,treatment=='seeded')
con = subset(cloud,treatment=='unseeded')
summary(con$rainfall)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   24.82   44.20  164.60  159.20 1203.00

summary(tre$rainfall)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.10   98.13  221.60  442.00  406.00 2746.00

From the summaries, you can see that the difference in mean rainfall is impressive: the mean increased from 164.6 acre-feet in the control clouds to 442.0 acre-feet in the treatment clouds. Unfortunately, this data is extremely right-skewed, with several large outliers in each group.

par(mfrow=c(2,2))
hist(con$rainfall,col='gray',main='Control Group Rainfall',xlab='acre-ft')
hist(tre$rainfall,col='gray',main='Treatment Group Rainfall',xlab='acre-ft')
qqnorm(con$rainfall); qqline(con$rainfall)
qqnorm(tre$rainfall); qqline(tre$rainfall)
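
A quick numeric signal of right skew, alongside the plots, is a mean that sits well above the median. The sketch below uses only the means and medians printed in the summaries above; the variable names are illustrative and not part of the original analysis.

```r
# Means and medians copied from the summary() output above (acre-feet).
con_mean <- 164.6; con_median <- 44.2
tre_mean <- 442.0; tre_median <- 221.6

# For roughly symmetric data the mean/median ratio is near 1;
# values well above 1 point to a long right tail.
con_ratio <- con_mean / con_median
tre_ratio <- tre_mean / tre_median

print(round(c(control = con_ratio, treatment = tre_ratio), 2))
```

Both ratios are well above 1 (the control mean is more than three times its median), which agrees with what the histograms and normal quantile plots show.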

Because of the strong right skew in both groups, we might consider transforming the data with a log-transform. Here is what the log-transformed data looks like.

par(mfrow=c(2,2))
hist(log(con$rainfall),col='gray',main='Log of Control Group Rainfall', xlab='log of acre-ft')
hist(log(tre$rainfall),col='gray',main='Log Treatment Group Rainfall', xlab='log of acre-ft')
qqnorm(log(con$rainfall)); qqline(log(con$rainfall))
qqnorm(log(tre$rainfall)); qqline(log(tre$rainfall))
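
To see why a log-transform helps with this kind of skew, here is a minimal sketch on simulated data. It does not use the cloud seeding measurements: the sample is drawn from a log-normal distribution with arbitrarily chosen parameters (`meanlog = 4`, `sdlog = 1.5`), purely for illustration.

```r
# Simulated data only: NOT the cloud seeding measurements.
set.seed(1)
x <- rlnorm(1000, meanlog = 4, sdlog = 1.5)  # right-skewed by construction

# On the raw scale the long right tail drags the mean far above the median.
raw_gap <- mean(x) / median(x)

# On the log scale the values are normally distributed,
# so the mean and the median should nearly agree.
log_gap <- mean(log(x)) - median(log(x))

print(c(raw_mean_over_median = raw_gap, log_mean_minus_median = log_gap))
```

The same pattern is what the plots above suggest for the rainfall data: taking logs pulls in the large values much more than the small ones, so the distribution becomes closer to symmetric.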

With these reasonably large samples (26 days in each group), and with data that is much closer to normal, it makes sense to apply a two-sample t-test and t-confidence interval to the log-transformed data.

t.test(log(tre$rainfall),log(con$rainfall))

## 
##  Welch Two Sample t-test
## 
## data:  log(tre$rainfall) and log(con$rainfall)
## t = 2.5444, df = 49.966, p-value = 0.01408
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2408498 2.0467125
## sample estimates:
## mean of x mean of y 
##  5.134187  3.990406
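
Because the test was run on log rainfall, the confidence interval above is for a difference in mean logs, which is hard to read directly. One common way to interpret it is to exponentiate, which turns the difference into a ratio on the original acre-feet scale (a ratio of geometric means, and approximately a ratio of medians if the log data is roughly symmetric). The sketch below uses only the numbers printed in the output above; the variable names are illustrative, and the standard error is inferred from the reported t statistic rather than computed from the raw data.

```r
# Values copied from the t.test() output above (log scale).
mean_log_tre <- 5.134187
mean_log_con <- 3.990406
t_stat <- 2.5444
df_welch <- 49.966
ci_log <- c(0.2408498, 2.0467125)

# Difference in mean log rainfall and the standard error implied by t.
diff_log <- mean_log_tre - mean_log_con
se_log <- diff_log / t_stat

# Rebuild the 95% interval from its parts as a consistency check.
ci_check <- diff_log + c(-1, 1) * qt(0.975, df_welch) * se_log

# Exponentiate to get a multiplicative effect on the original scale.
ratio_est <- exp(diff_log)
ratio_ci <- exp(ci_log)

print(round(ci_check, 3))
print(round(c(estimate = ratio_est, lower = ratio_ci[1], upper = ratio_ci[2]), 2))
```

Read this way, typical rainfall from seeded clouds is estimated to be roughly 3.1 times that from unseeded clouds, with a 95 percent confidence interval of about 1.3 to 7.7 times. The interval excludes 1, which matches the small p-value.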