Goodness of Fit Example

Mars, the candy company, manufactures M&Ms. According to Mars, the proportion of M&Ms of each color is given in the table below.

In [21]:
colors=c("Brown","Yellow","Red","Orange","Blue","Green")
proportions = c(0.13,0.14,0.13,0.20,0.24,0.16)
A=matrix(proportions,nrow=1)
colnames(A)=colors
rownames(A)=c("Proportions")
A
Out[21]:
            Brown Yellow  Red Orange Blue Green
Proportions  0.13   0.14 0.13   0.20 0.24  0.16

Suppose we observe the following counts in a large bag of M&Ms.

In [22]:
observations = c(61,59,49,77,141,88)
B = matrix(observations,nrow=1)
colnames(B)=colors
rownames(B)=c("Counts")
B
Out[22]:
       Brown Yellow Red Orange Blue Green
Counts    61     59  49     77  141    88
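Before running the test, it helps to see what the advertised proportions predict for a bag of this size. The sketch below checks that the proportions sum to 1 (which `chisq.test()` requires unless `rescale.p = TRUE`) and computes the expected count for each color as the bag size times that color's proportion. The names `n` and `expected` are illustrative choices of mine, not from the original cells, and the "every expected count at least 5" check is only the usual rule of thumb for trusting the chi-squared approximation.

```r
colors <- c("Brown", "Yellow", "Red", "Orange", "Blue", "Green")
proportions <- c(0.13, 0.14, 0.13, 0.20, 0.24, 0.16)
observations <- c(61, 59, 49, 77, 141, 88)

# The hypothesized proportions must sum to 1 (up to floating-point error)
stopifnot(isTRUE(all.equal(sum(proportions), 1)))

n <- sum(observations)          # total number of candies in the bag (475)
expected <- n * proportions     # counts predicted if Mars's proportions are exact
names(expected) <- colors
print(n)
print(expected)                 # values: 61.75, 66.5, 61.75, 95, 114, 76

# Rule of thumb: the chi-squared approximation is reasonable when all expected counts are >= 5
print(all(expected >= 5))
```

Comparing `expected` with the observed counts already hints at the result: the bag has far more blue (141 vs 114) and fewer orange (77 vs 95) candies than advertised.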

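First, let's carry out a goodness-of-fit test with the command below. `chisq.test` takes the vector of observed counts, and the argument `p` supplies the hypothesized proportions. Notice that this command doesn't need any of the fanciness with matrices, column names, etc.; I only added that above to make the tables easier to read.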

In [18]:
chisq.test(observations, p = proportions)
Out[18]:
	Chi-squared test for given probabilities

data:  observations
X-squared = 15.188, df = 5, p-value = 0.00959
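To see where these numbers come from, here is a sketch that recomputes them by hand. Pearson's statistic is X² = Σ (O - E)² / E, where O is an observed count and E the expected count, and the p-value is the upper-tail area of a chi-squared distribution with k - 1 = 5 degrees of freedom (k = 6 colors). The variable names `expected`, `X2`, `df` and `p_value` are illustrative.

```r
proportions <- c(0.13, 0.14, 0.13, 0.20, 0.24, 0.16)
observations <- c(61, 59, 49, 77, 141, 88)

expected <- sum(observations) * proportions            # E_i = n * p_i
X2 <- sum((observations - expected)^2 / expected)      # Pearson's chi-squared statistic
df <- length(observations) - 1                         # 6 colors minus 1
p_value <- pchisq(X2, df = df, lower.tail = FALSE)     # P(chi-squared_5 >= X2)

c(X2 = X2, df = df, p_value = p_value)
```

The hand computation should agree with the `chisq.test` output above (X-squared about 15.188, p-value about 0.0096). The largest contributions come from Blue (729/114, about 6.39) and Orange (324/95, about 3.41).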

The p-value is small (less than 1%). This means that if the advertised proportions were correct, a random bag of this size would produce a chi-squared statistic at least this large less than 1% of the time. So we have strong evidence that something other than chance explains why the distribution of colors in our sample does not match the advertised distribution. It could be that our sample was not well mixed, that someone tampered with the sample, or that the advertised proportions are not correct.
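The meaning of the p-value can be checked by simulation. This sketch draws many bags of 475 candies from the advertised proportions with `rmultinom()`, computes the chi-squared statistic for each simulated bag, and reports the fraction that are at least as far from the advertised mix as our bag. The seed and the number of simulated bags (`n_sims`) are arbitrary choices of mine, so the estimate will vary slightly from run to run.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility
proportions <- c(0.13, 0.14, 0.13, 0.20, 0.24, 0.16)
observations <- c(61, 59, 49, 77, 141, 88)

n <- sum(observations)
expected <- n * proportions
observed_X2 <- sum((observations - expected)^2 / expected)

# Each column of `sims` is one simulated bag of n candies under the advertised proportions
n_sims <- 20000
sims <- rmultinom(n_sims, size = n, prob = proportions)

# `expected` (length 6) recycles down each 6-row column, giving one statistic per bag
sim_X2 <- colSums((sims - expected)^2 / expected)

# Monte Carlo estimate of the p-value
p_sim <- mean(sim_X2 >= observed_X2)
p_sim
```

The estimate should land near the 0.0096 reported by `chisq.test`. R can also do this internally with `chisq.test(observations, p = proportions, simulate.p.value = TRUE, B = 20000)`.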

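Approximating Goodness of Fit Using Two-Way Tables

We can get nearly the same answer from a chi-squared test on a two-way table. Put the observed counts in one row, and in a second row put the advertised proportions multiplied by a large number N, so that the second row behaves like a huge sample drawn from the advertised distribution.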

In [49]:
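N=100000                                  # large scaling factor for the proportions
A=rbind(observations,N*proportions)       # row 1: observed counts; row 2: proportions scaled up to counts
colnames(A)=colors
rownames(A)=c("observations","enlarged proportions")
A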
Out[49]:
                     Brown Yellow   Red Orange  Blue Green
observations            61     59    49     77   141    88
enlarged proportions 13000  14000 13000  20000 24000 16000
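One way to see why this works is to look at the expected counts the two-way test uses for the observed row. This sketch (with illustrative names `tab`, `E_two_way` and `E_gof`) extracts them from `chisq.test(...)$expected` and compares them with the goodness-of-fit expected counts, bag size times proportion.

```r
proportions <- c(0.13, 0.14, 0.13, 0.20, 0.24, 0.16)
observations <- c(61, 59, 49, 77, 141, 88)
N <- 100000

tab <- rbind(observations, N * proportions)   # same two-way table as above, without labels

E_two_way <- chisq.test(tab)$expected[1, ]    # expected counts for the observed row
E_gof <- sum(observations) * proportions      # goodness-of-fit expected counts, n * p_i

round(rbind(two_way = E_two_way, gof = E_gof), 2)
```

Because the enlarged row dominates the column totals, the two sets of expected counts differ by well under one candy (at most about 0.13 here).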
In [48]:
chisq.test(as.table(A))
Out[48]:
	Pearson's Chi-squared test

data:  as.table(A)
X-squared = 15.113, df = 5, p-value = 0.009889
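To see how the choice of N matters, this sketch reruns the two-way test for several scaling factors and lists the statistics next to the goodness-of-fit statistic. The particular values in `Ns` are arbitrary choices for illustration.

```r
proportions <- c(0.13, 0.14, 0.13, 0.20, 0.24, 0.16)
observations <- c(61, 59, 49, 77, 141, 88)

gof <- unname(chisq.test(observations, p = proportions)$statistic)

# Two-way (test of independence) statistic for increasing scaling factors N
Ns <- c(1e3, 1e4, 1e5, 1e6, 1e7)
two_way <- sapply(Ns, function(N) {
  tab <- rbind(observations, N * proportions)
  unname(chisq.test(tab)$statistic)
})

data.frame(N = Ns, two_way_X2 = two_way, gof_X2 = gof)
```

The two-way statistic increases with N and approaches the goodness-of-fit value of about 15.188 from below.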

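As long as we scale the proportions by a large factor N, the chi-squared test for the two-way table gives nearly the same result as the goodness-of-fit test (X-squared = 15.113 versus 15.188, p-value 0.009889 versus 0.00959). The enlarged row dominates the column totals, so the expected counts for the observed row are essentially the bag size times the advertised proportions, and the degrees of freedom agree: (2 - 1)(6 - 1) = 5 = 6 - 1.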
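For readers who want the algebra, here is a short derivation (my own, not from the original notebook). Write o_i for the observed count of color i, n = Σ o_i for the bag size (475 here), p_i for the advertised proportions, and k = 6 for the number of colors. The second row holds N p_i, so the column totals are c_i = o_i + N p_i and the grand total is T = n + N.

```latex
\begin{aligned}
E_{1i} &= \frac{n\,c_i}{T}, \qquad E_{2i} = \frac{N\,c_i}{T}, \qquad
o_i - E_{1i} = \frac{N\,(o_i - n p_i)}{T} = E_{2i} - N p_i, \\
X^2_{\text{two-way}} &= \sum_{i=1}^{k} (o_i - E_{1i})^2 \left(\frac{1}{E_{1i}} + \frac{1}{E_{2i}}\right)
  = \sum_{i=1}^{k} (o_i - E_{1i})^2 \, \frac{T^2}{n N c_i} \\
&= \sum_{i=1}^{k} \frac{(o_i - n p_i)^2}{n p_i + n o_i / N}
  \;\xrightarrow{\,N \to \infty\,}\; \sum_{i=1}^{k} \frac{(o_i - n p_i)^2}{n p_i} = X^2_{\text{GOF}}.
\end{aligned}
```

Each denominator exceeds the goodness-of-fit denominator n p_i by n o_i / N, so the two-way statistic is never larger than the goodness-of-fit statistic, which matches 15.113 < 15.188 above. Plugging in N = 100000 reproduces 15.113.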
