🔍 Chi-Square Test — Understanding Independence, Association & Goodness of Fit

Author: Rishabh Kumar (IIT + ISI | Global Math Mentor)
Published: October 2025
Category: Statistics | Hypothesis Testing


🔹 What Is the Chi-Square Test?

The Chi-Square Test (χ² Test) is a statistical method used to test whether observed data fits an expected distribution or whether two categorical variables are related.

It is one of the simplest and most widely used non-parametric tests: it does not assume that the data follow a normal distribution.

The test compares observed frequencies (actual data) with expected frequencies (theoretical model or assumption).


🔹 When Do We Use the Chi-Square Test?

There are two main types:

| Type | Purpose | Example |
|---|---|---|
| Chi-Square Goodness of Fit | Tests how well observed data fits a known distribution | Are dice fair? |
| Chi-Square Test of Independence | Tests if two categorical variables are related | Does gender affect subject choice? |

📊 1️⃣ Chi-Square Goodness of Fit Test

Used when we want to test if a set of observed categorical data fits a particular theoretical distribution.


🔹 Formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:

  • $O$ = Observed frequency
  • $E$ = Expected frequency
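
As a minimal sketch of the formula above, the statistic can be computed in a few lines of Python. The function name `chi_square_statistic` and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, not taken from the article.
observed = [18, 22, 20]
expected = [20, 20, 20]
stat = chi_square_statistic(observed, expected)
print(round(stat, 4))  # (4 + 4 + 0) / 20 = 0.4
```

Categories where the observed count is far from the expected count contribute the most to the total, which is why a large statistic signals a poor fit.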

🔹 Steps

1️⃣ State hypotheses

  • $H_0$: Data fits the expected distribution.
  • $H_1$: Data does not fit the expected distribution.

2️⃣ Compute expected frequencies: multiply the total count by each category's hypothesized probability or proportion.

3️⃣ Calculate the $\chi^2$ statistic using the formula.

4️⃣ Find degrees of freedom (df): $df = \text{number of categories} - 1$

5️⃣ Compare the calculated $\chi^2$ with the critical value from the Chi-Square table.

6️⃣ Make conclusion:

  • If $\chi^2_{\text{calc}} > \chi^2_{\text{crit}}$, reject $H_0$.
  • Otherwise, fail to reject $H_0$.
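
The steps above can be sketched as one small Python function. This is an illustration under stated assumptions, not a definitive implementation: the name `goodness_of_fit`, the lookup table `CRITICAL_5_PERCENT`, and the sample counts are all hypothetical, and the critical values are the standard 5% table entries for df = 1 to 5.

```python
# Upper 5% critical values of the chi-square distribution, indexed by df
# (standard table values, rounded to three decimals).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit(observed, probabilities, critical_values=CRITICAL_5_PERCENT):
    """Run steps 2 to 6 and return (statistic, df, reject_h0)."""
    total = sum(observed)
    # Step 2: expected frequency = total count x hypothesized probability.
    expected = [total * p for p in probabilities]
    # Step 3: chi-square statistic.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: degrees of freedom.
    df = len(observed) - 1
    # Steps 5 and 6: compare with the tabulated critical value.
    reject_h0 = statistic > critical_values[df]
    return statistic, df, reject_h0


# Hypothetical data: 100 observations in two categories, H0 claims a 3:1 split.
statistic, df, reject_h0 = goodness_of_fit([70, 30], [0.75, 0.25])
print(round(statistic, 3), df, reject_h0)  # 1.333 1 False
```

Here the expected counts are 75 and 25, so the statistic is 25/75 + 25/25, which is below 3.841, and the hypothesized 3:1 split is not rejected.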

🔹 Example — Testing a Fair Die

A die is rolled 60 times, and the results are:

| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed (O) | 8 | 9 | 10 | 11 | 12 | 10 |

If the die is fair, the expected frequency is $E = 10$ for each face.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(8-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(10-10)^2}{10} = \frac{4+1+0+1+4+0}{10} = 1.0$$

Degrees of freedom: $df = 6 - 1 = 5$.

At the 5% significance level, $\chi^2_{\text{crit}} = 11.07$.

✅ Since $1.0 < 11.07$, we fail to reject $H_0$.
→ The data give no evidence that the die is unfair.
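
The die example can be checked numerically. The short Python sketch below uses the observed counts and the 5% critical value quoted in the example; the variable names are illustrative.

```python
observed = [8, 9, 10, 11, 12, 10]       # counts for faces 1 to 6
n_rolls = sum(observed)                 # 60 rolls in total
expected = n_rolls / len(observed)      # 10 per face if the die is fair

# Contribution of each face to the statistic: (O - E)^2 / E.
contributions = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(contributions)

df = len(observed) - 1                  # 6 - 1 = 5
critical_value = 11.07                  # 5% level, df = 5 (from the example)
reject_h0 = chi_square > critical_value

print(contributions)
print(round(chi_square, 2), df, reject_h0)
```

Listing the per-face contributions shows that faces 1 and 5 account for most of the statistic, yet the total is still far below the critical value.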


🔶 2️⃣ Chi-Square Test of Independence

Used when we want to test whether two categorical variables are related.


🔹 Formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}, \quad E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
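
A minimal Python sketch of the expected-frequency formula follows. The helper name `expected_frequencies` and the 2x3 table of counts are hypothetical and serve only to show the shape of the result.

```python
def expected_frequencies(table):
    """Return expected counts under independence for a table of counts.

    Each cell is (row total) * (column total) / (grand total).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x3 table of counts, not taken from the article.
table = [[10, 20, 30],
         [20, 10, 10]]
expected = expected_frequencies(table)
for row in expected:
    print(row)
```

The expected table keeps the same row and column totals as the observed table, which is a quick sanity check on the calculation.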


🔹 Example — Gender vs. Subject Choice

| | Math | Economics | Total |
|---|---|---|---|
| Male | 40 | 20 | 60 |
| Female | 10 | 30 | 40 |
| Total | 50 | 50 | 100 |

Compute expected frequencies (E):

$$E_{\text{Male,Math}} = \frac{(\text{row total})(\text{column total})}{\text{grand total}} = \frac{60 \times 50}{100} = 30$$

Similarly:

| | Math | Economics | Total |
|---|---|---|---|
| Male | 30 | 30 | 60 |
| Female | 20 | 20 | 40 |

$$\chi^2 = \frac{(40-30)^2}{30} + \frac{(20-30)^2}{30} + \frac{(10-20)^2}{20} + \frac{(30-20)^2}{20}$$

$$\chi^2 = \frac{100}{30} + \frac{100}{30} + \frac{100}{20} + \frac{100}{20} \approx 3.333 + 3.333 + 5 + 5 \approx 16.67$$

Degrees of freedom: $df = (\text{rows} - 1)(\text{columns} - 1) = (2-1)(2-1) = 1$

At the 5% level, $\chi^2_{\text{crit}} = 3.84$.
✅ Since $16.67 > 3.84$, we reject $H_0$ (that gender and subject choice are independent).
→ There is a statistically significant association between gender and subject choice.
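
The gender and subject example can be reproduced with a short Python sketch. The function name `chi_square_independence` is hypothetical; it computes the plain Pearson statistic with no continuity correction, which is what the hand calculation does, and the critical value is the one quoted in the example.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: Male, Female. Columns: Math, Economics (counts from the example).
table = [[40, 20],
         [10, 30]]
statistic, df = chi_square_independence(table)
critical_value = 3.84  # 5% level, df = 1 (from the example)
reject_h0 = statistic > critical_value
print(round(statistic, 2), df, reject_h0)  # 16.67 1 True
```

Statistical packages may apply a continuity correction to 2x2 tables by default, so their reported statistic can differ from this hand-style value.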


📘 3️⃣ Chi-Square Test Assumptions

✅ Data are frequencies (counts), not percentages.
✅ Each observation is independent.
✅ Each expected frequency satisfies $E \ge 5$, so that the chi-square approximation is reliable.
✅ Data are from random sampling.
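
The expected-frequency assumption is easy to check before running a test. The sketch below is one possible approach; the helper name `small_expected_cells`, the threshold argument, and the sample probabilities are hypothetical.

```python
def small_expected_cells(expected, threshold=5):
    """Return the indices of categories whose expected count is below threshold."""
    return [i for i, e in enumerate(expected) if e < threshold]


# Hypothetical setup: 30 observations, H0 probabilities 0.5, 0.4 and 0.1.
total = 30
probabilities = [0.5, 0.4, 0.1]
expected = [total * p for p in probabilities]   # about 15, 12 and 3
flagged = small_expected_cells(expected)
print(flagged)  # [2]
```

A flagged category means the chi-square approximation may be unreliable for that sample; collecting more data or merging sparse categories are common responses.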


🔹 Common Mistakes

  1. ❌ Using percentages instead of frequencies.
  2. ❌ Forgetting to subtract 1 in degrees of freedom.
  3. ❌ Ignoring small expected frequencies (<5).
  4. ❌ Using Chi-Square for numerical (non-categorical) data.

🔹 Relation to Other Tests

| Test | Type | Use Case |
|---|---|---|
| Chi-Square | Non-parametric | Categorical data |
| t-test | Parametric | Mean comparison |
| ANOVA | Parametric | Group mean comparison |
| Correlation/Regression | Parametric | Continuous data relationship |

🌟 Why It Matters

The Chi-Square test helps us quantify categorical relationships — transforming intuition (“it seems related”) into statistical evidence.

It’s widely used in:

  • Psychology, sociology, and biology,
  • Business and market research,
  • Education and behavioral science,
  • Quality control and survey analysis.

📘 Learn Beyond Formulas

At Math By Rishabh, we don’t just memorize tests — we build reasoning behind inference.

In the Mathematics Elevate Mentorship Program, you’ll:
✅ Learn hypothesis testing intuitively,
✅ Understand Chi-Square logic conceptually,
✅ Solve exam-level IB, AP, and A Level problems systematically.

🚀 Turn statistics into strategy.
👉 Book your personalized mentorship session now at MathByRishabh.com
