Chi2 and Fisher's exact test in R

Chi-squared test

The Chi-squared test (or just Chi2) assesses whether two categorical variables are associated, that is, whether the distribution of one variable differs across the levels of the other. For instance, we want to know whether the body weight of mice (normal vs. overweight) is related to their color (black vs. white). These two variables give a simple 2×2 table; however, the Chi-squared test can also be used with variables that have more than two levels.

We have 740 mice: 367 black and 373 white. Of these, 90 mice are overweight and 630 have a normal weight. 10 black mice and 10 white mice have not been weighed, so we have 20 missing values.

You can calculate the Chi2 statistic by hand, without any software, but it is also easy to do in R (for example, in RStudio).
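To show what the hand calculation involves, here is a minimal sketch in R. The four cell counts are an assumption made for illustration (the text above gives only the totals); they are chosen to be consistent with those totals and are not taken from the original data.

```r
# Illustrative 2x2 counts (an assumption, not the author's data). They match the
# totals above: 357 black and 363 white mice with a recorded weight, 90 overweight.
observed <- matrix(c(41, 316,
                     49, 314),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(color = c("black", "white"),
                                   obese_yes = c("yes", "no")))

# Expected count for each cell under independence: row total * column total / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p value from the upper tail of the chi-squared distribution
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

chi2
p_value
```

The same three steps (expected counts, summed squared deviations, upper-tail probability) apply to larger tables; only the degrees of freedom change.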

We use the base R function chisq.test().
The first variable is 'color', the second variable is 'obese_yes', and our data frame is named 'mice'.

The code looks as follows:
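A minimal sketch of the call, using the names given above ('mice', 'color', 'obese_yes'). The data frame built here is a hypothetical stand-in so that the example runs on its own: the "yes"/"no" coding and the individual cell counts are assumptions, chosen only to agree with the totals described earlier.

```r
# Hypothetical stand-in for the 'mice' data frame (an assumption, not the original data).
# NA marks the 10 black and 10 white mice that were not weighed.
mice <- data.frame(
  color = rep(c("black", "white"), times = c(367, 373)),
  obese_yes = c(rep(c("yes", "no", NA), times = c(41, 316, 10)),  # black mice
                rep(c("yes", "no", NA), times = c(49, 314, 10)))  # white mice
)

# Pearson's Chi-squared test without Yates's continuity correction.
# Rows with a missing value in either variable are dropped automatically.
result <- chisq.test(mice$color, mice$obese_yes, correct = FALSE)
result
```

With these illustrative counts the statistic should land near 0.67 with one degree of freedom; real data will of course give its own values.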

We use correct=FALSE to turn off Yates's continuity correction, which chisq.test() applies by default to 2×2 tables. In a nutshell, the correction compensates for approximating discrete cell counts with the continuous Chi-squared distribution, and it was used mostly for tables with small cell counts. However, Fisher's exact test (described below) is now generally considered more accurate in that situation, so Yates's correction is rarely needed.
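To see what the correct argument changes, the sketch below runs chisq.test() on the same 2×2 table with and without the correction. The cell counts are illustrative assumptions that agree with the totals described earlier, not the original data.

```r
# Illustrative 2x2 table (an assumption, not the author's data)
tab <- matrix(c(41, 316,
                49, 314),
              nrow = 2, byrow = TRUE,
              dimnames = list(color = c("black", "white"),
                              obese_yes = c("yes", "no")))

with_yates    <- chisq.test(tab)                   # correct = TRUE is the default
without_yates <- chisq.test(tab, correct = FALSE)  # plain Pearson statistic

# The correction shrinks each |observed - expected| before squaring,
# so the corrected statistic is smaller and its p value larger.
data.frame(
  correction = c("Yates", "none"),
  statistic  = c(unname(with_yates$statistic), unname(without_yates$statistic)),
  p_value    = c(with_yates$p.value, without_yates$p.value)
)
```

With a sample this large the two versions differ only slightly; the gap grows as cell counts get smaller.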

The Chi2 statistic is approximately .668 and the p value is .413. Since the p value is well above .05, the test gives no evidence that the color of a mouse is related to its weight.

Fisher's exact test

The purpose of Fisher's test is the same: to assess whether two categorical variables are related to each other. However, the Chi-squared test relies on an approximation that assumes a large sample, while Fisher's exact test computes an exact p value regardless of the sample size. The Chi-squared test became more popular for a simple reason: it was easy to calculate without a computer, while Fisher's procedure is more complex and used to take much longer to run. Nowadays this argument no longer holds, so Fisher's test is often recommended, especially for small samples.

We will use the following code:
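A minimal sketch of the call, assuming the data frame is named 'mice' with the variables 'color' and 'obese_yes'. The data frame built here is a hypothetical stand-in so that the example runs on its own; its "yes"/"no" coding and cell counts are assumptions that agree with the totals described earlier, not the original data.

```r
# Hypothetical stand-in for the 'mice' data frame (an assumption, not the original data).
# NA marks the mice that were not weighed.
mice <- data.frame(
  color = rep(c("black", "white"), times = c(367, 373)),
  obese_yes = c(rep(c("yes", "no", NA), times = c(41, 316, 10)),  # black mice
                rep(c("yes", "no", NA), times = c(49, 314, 10)))  # white mice
)

# Cross-tabulate the two variables (table() drops NA by default),
# then run Fisher's exact test on the resulting 2x2 table.
fisher_result <- fisher.test(table(mice$color, mice$obese_yes))
fisher_result
```

For a 2×2 table the printed result also includes an odds ratio estimate with a confidence interval, which can be read from fisher_result$estimate and fisher_result$conf.int.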

You may wonder why there is a table() function around the variables. fisher.test() works on a contingency table, and table() cross-tabulates the two variables into one. If your data are not already stored as a table (mine are not), wrapping the variables in table() is a simple way to get that format; fisher.test() also accepts the two factors directly and builds the table itself.

In the case of Fisher's exact test, the p value is .432. Therefore, the null hypothesis cannot be rejected (just as with Chi2). We have no evidence that the color of the mice is related to their obesity.