Mystery of a 2×2 Contingency Table




Features of a 2×2 Contingency Table (Part-1)

The Chi-square test is the most popular statistical test for studying the association between two categorical variables. Anyone engaged in data analysis ends up performing this test again and again. In this short post, I will highlight one of the most important assumptions of the Chi-square test.

The cells of a contingency table must be mutually exclusive

This is the most important assumption behind using the Chi-square test to assess the association between two categorical variables. It means that each observation is counted in exactly one cell of the table. In practice, many people who run the test are not aware of this assumption. In some cases the data are generated in such a way that the condition is always satisfied, but in other cases it does not hold. If the data in hand do not satisfy the assumption, the conclusion will be invalid. Below is a simple example of a 2×2 contingency table in which the mutually exclusive assumption is violated.

Suppose I have collected egg specimens from various species of ducks to test whether the H5 antibody is present or not. After collecting the data, I prepared the following table:


Table-1

                  H5 Present (Yes)   H5 Absent (No)   Total
Khaki Campbell    a1                 b1               a1 + b1
Any Species       c1                 d1               c1 + d1
Total             a1 + c1            b1 + d1          a1 + b1 + c1 + d1

At first glance we might think that everything is fine and we can perform the Chi-square test. But look carefully at the row headings “Khaki Campbell” and “Any Species”: “Any Species” could include Khaki Campbell too. For this reason table-1 does not satisfy the mutually exclusive condition, because Khaki Campbell specimens contribute to both rows. Any conclusion drawn from table-1 will be invalid.

We can make the cells mutually exclusive by changing the row heading and the cell contents accordingly. The revised table is:


Table-2

                                        H5 Present (Yes)   H5 Absent (No)   Total
Khaki Campbell                          a2                 b2               a2 + b2
Any Species Other than Khaki Campbell   c2                 d2               c2 + d2
Total                                   a2 + c2            b2 + d2          a2 + b2 + c2 + d2

Looking at the row headings of table-2, we see that the second row now excludes “Khaki Campbell”, so the table satisfies the mutually exclusive assumption.
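To make the difference between the two tables concrete, here is a small Python sketch. It assumes that the “Any Species” row of table-1 contains every Khaki Campbell specimen already counted in the first row, so the exclusive second row can be recovered by subtraction. The counts and the function name `make_exclusive` are hypothetical and only serve as an illustration.

```python
def make_exclusive(table1):
    """Turn an overlapping table-1 into a mutually exclusive table-2.

    Assumption: row 2 ("Any Species") includes all of row 1
    ("Khaki Campbell"), so row 2 minus row 1 gives the other species.
    """
    (a1, b1), (c1, d1) = table1
    c2, d2 = c1 - a1, d1 - b1
    if c2 < 0 or d2 < 0:
        raise ValueError("row 2 does not contain row 1; cannot subtract")
    return [[a1, b1], [c2, d2]]


# Hypothetical counts: [H5 present, H5 absent] for each row.
table1 = [[30, 18], [48, 32]]
table2 = make_exclusive(table1)

print(table2)                 # [[30, 18], [18, 14]]
print(sum(map(sum, table1)))  # 128: Khaki Campbell specimens counted twice
print(sum(map(sum, table2)))  # 80: each specimen counted once
```

The grand total of table-1 is larger than the number of specimens actually collected, which is a quick sign that some observations sit in more than one cell.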

Another important point: the Chi-square test only tells us whether there is evidence of an association between two categorical variables. It CANNOT tell us the magnitude or the direction of that association. For that we need to compute a measure of association.
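As an illustration of what quantifying the association can look like, the Python sketch below computes the odds ratio, one commonly used measure for a 2×2 table. The choice of the odds ratio, the function name `odds_ratio` and the counts are my assumptions for this example. The table is assumed to be mutually exclusive, laid out like table-2.

```python
def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is 0")
    return (a * d) / (b * c)


# Hypothetical counts: rows are Khaki Campbell / other species,
# columns are H5 present / H5 absent.
table2 = [[30, 18], [18, 14]]

print(round(odds_ratio(table2), 3))  # 1.296

# Proportion of H5-positive specimens in each row.
for label, (present, absent) in zip(["Khaki Campbell", "Other species"], table2):
    print(label, present / (present + absent))
# Khaki Campbell 0.625
# Other species 0.5625
```

An odds ratio above 1 points to higher odds of H5 presence in the first row than in the second, and its distance from 1 gives a sense of the magnitude. These are exactly the two things the Chi-square statistic alone does not report.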

Here is the Stata code to perform the Chi-square test on tabulated data like table-2:

tabi 30 18 \ 18 14,chi2
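For readers who want to see the arithmetic behind the test, here is a rough Python sketch that works through the same four counts by hand. It computes the Pearson Chi-square statistic without continuity correction, and it relies on the fact that for 1 degree of freedom the p-value equals erfc(sqrt(chi2 / 2)). The function name `pearson_chi2_2x2` is my own; this is not a reproduction of Stata's internals.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns the statistic and the p-value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = pearson_chi2_2x2([[30, 18], [18, 14]])
print(f"chi2 = {stat:.4f}, p = {p:.3f}")  # chi2 = 0.3125, p = 0.576
```

With these counts the expected cell values are 28.8, 19.2, 19.2 and 12.8, so the statistic is small and there is no evidence of association.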


(To be continued…)

6 Comments so far:

  1. Md. Jahangir Alam says:

    Very interesting and useful post! Thanks a lot. Waiting for more posts about the more widely used statistical methods and their applications using R.

  2. Sajjad says:

    Thanks for sharing this. I am just unable to use the Stata code. Could you kindly explain this issue?

    • jaynal says:

      The tabi command in Stata works only with tabulated data. That is, you need to have the data already summarized as a 2×2 contingency table, and then the tabi command will work.

  3. Thanks for sharing such important information. As a non-statistician, I would like to know what problem occurs if a cell value is less than 5. Why do we need to use Fisher's exact test? How does Fisher's exact test fix this problem?
