# Mystery of a 2×2 Contingency Table

**Features of a 2×2 Contingency Table (Part-1)**

To study the association between two categorical variables, the Chi-Square test is the most popular statistical test. Anyone engaged in data analysis has to perform this test many times. In this short post, I will highlight one of the most important assumptions of the Chi-Square test.

“**The cells of a contingency table must be mutually exclusive**”

This is one of the most important assumptions when using the Chi-Square test to assess the association between two categorical variables: each observation must be counted in exactly one cell. In reality, many people who perform this test are not aware of the assumption. In some cases the data are generated in such a way that the condition is always satisfied, but in other cases it does not hold. If the data in hand do not satisfy the assumption, the conclusion will be invalid. Below is a simple example of a 2×2 contingency table where the mutual exclusivity assumption is violated.

Suppose I have collected egg specimens from various species of ducks to test whether the H5 antibody is present or not. After collecting the data, I prepared the following table:

Table-1

| | H5 Present (Yes) | H5 Absent (No) | Total |
|---|---|---|---|
| Khaki Campbell | a1 | b1 | a1 + b1 |
| Any Species | c1 | d1 | c1 + d1 |
| Total | a1 + c1 | b1 + d1 | a1 + b1 + c1 + d1 |

At first glance, we might think that everything is fine and that we can perform the Chi-Square test. But *if we look carefully at the row headings “Khaki Campbell” and “Any Species”, we see that “Any Species” could contain Khaki Campbell too.* For this reason, **Table-1 does not satisfy the mutual exclusivity condition**. In fact, Khaki Campbell specimens contribute to both rows, so the same specimen can be counted twice. Any conclusion drawn from Table-1 will be invalid.

Now, we can make the cells mutually exclusive by changing the row heading and the cell contents accordingly. The revised table is:

Table-2

| | H5 Present (Yes) | H5 Absent (No) | Total |
|---|---|---|---|
| Khaki Campbell | a2 | b2 | a2 + b2 |
| Any Species Other than Khaki Campbell | c2 | d2 | c2 + d2 |
| Total | a2 + c2 | b2 + d2 | a2 + b2 + c2 + d2 |

If we look at the row headings of Table-2, we see that the second row now excludes “Khaki Campbell”, so the table satisfies the mutual exclusivity assumption.
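The step from the overlapping layout to the exclusive one can be sketched in a few lines of Python. This is only an illustration under one assumption: the “Any Species” row of Table-1 counts every specimen, including all of those already counted in the Khaki Campbell row, so the second row of Table-2 is obtained by subtraction. The function name `make_rows_exclusive` and the Table-1 counts are hypothetical.

```python
def make_rows_exclusive(table1):
    """Convert an overlapping Table-1 layout into an exclusive Table-2 layout.

    table1 = [[a1, b1], [c1, d1]], where the second row ("Any Species") is
    ASSUMED to include every specimen already counted in the first row.
    """
    (a1, b1), (c1, d1) = table1
    c2, d2 = c1 - a1, d1 - b1  # remove the double-counted specimens
    if c2 < 0 or d2 < 0:
        raise ValueError("second row is smaller than the row it should contain")
    return [[a1, b1], [c2, d2]]


# Hypothetical Table-1 counts, chosen for illustration only.
table1 = [[30, 18], [48, 32]]
table2 = make_rows_exclusive(table1)

print(table2)                   # [[30, 18], [18, 14]]
print(sum(map(sum, table1)))    # 128: Khaki Campbell specimens counted twice
print(sum(map(sum, table2)))    # 80: each specimen counted once
```

The grand total of the overlapping table exceeds the number of specimens, which is a quick symptom that the cells are not mutually exclusive.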

Another important point about the Chi-Square test: it only provides evidence of whether an association exists between two categorical variables. It CANNOT tell us the magnitude or direction of that association. To do so, we need to quantify the association with a measure of association.
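As one possible illustration, the odds ratio is a commonly used measure of association for a 2×2 table. The post does not say which measure it has in mind, so treat this as an assumption. The sketch below uses the cell counts from this post's Stata example (30, 18, 18, 14), read as a2, b2, c2, d2, which is also an assumption.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table [[a, b], [c, d]].

    Compares the odds of the first column (e.g. "H5 present") in row 1
    with the same odds in row 2. Undefined when b or c is zero.
    """
    return (a * d) / (b * c)


or_value = odds_ratio(30, 18, 18, 14)
print(round(or_value, 3))  # 1.296
```

A value above 1 points in one direction (higher odds in the first row), a value below 1 in the other, and the distance from 1 reflects the magnitude. The Chi-Square statistic alone conveys neither.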

Here is the Stata code to perform the Chi-Square test on tabulated data like Table-2:

`tabi 30 18 \ 18 14,chi2`
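For readers without Stata, the same calculation can be sketched with Python's standard library only. This is a minimal sketch, not a replacement for a statistics package: it computes Pearson's Chi-Square statistic without a continuity correction, using the shortcut formula for a 2×2 table, and gets the p-value from the fact that a Chi-Square variable with 1 degree of freedom is the square of a standard normal variable. The function name `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Uses chi2 = N * (a*d - b*c)^2 / (row1 * row2 * col1 * col2).
    All four marginal totals must be non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Same cell counts as the Stata example above.
chi2, p_value = pearson_chi2_2x2(30, 18, 18, 14)
print(f"chi2(1) = {chi2:.4f}, p = {p_value:.3f}")  # chi2(1) = 0.3125, p = 0.576
```

With these counts the p-value is far above the usual 0.05 threshold, so the table gives no evidence of an association between the row and column variables.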

(To be continued…)

Very interesting and useful post! Thanks a lot. Waiting for more posts about the more widely used statistical methods and their applications using R.

Thank you for your comment. In future posts I will include examples using both Stata and R.

Thanks for sharing this. I am unable to use the Stata code, though. Could you kindly explain this issue?

The Stata command tabi works only with tabulated data. That is, you must already have the cell counts of a 2×2 contingency table, and then the tabi command will work.

Thanks for sharing such important information. As a non-statistician, I would like to know what problem occurs if a cell value is less than 5. Why do we need to use Fisher's exact test? How does Fisher's exact test fix this problem?

Thank you for your comment and nice question. I will respond to your questions in a separate post.