Author: Rishabh Kumar (IIT + ISI | Global Math Mentor)
Published: October 2025
Category: Statistics | Hypothesis Testing
🔹 What Is the Chi-Square Test?
The Chi-Square Test (χ² Test) is a statistical method used to test whether observed data fits an expected distribution or whether two categorical variables are related.
It’s one of the simplest and most widely used non-parametric tests — meaning it doesn’t assume a normal distribution of data.
The test compares observed frequencies (actual data) with expected frequencies (theoretical model or assumption).
🔹 When Do We Use the Chi-Square Test?
There are two main types:
| Type | Purpose | Example |
|---|---|---|
| Chi-Square Goodness of Fit | Tests how well observed data fits a known distribution | Are dice fair? |
| Chi-Square Test of Independence | Tests if two categorical variables are related | Does gender affect subject choice? |
📊 1️⃣ Chi-Square Goodness of Fit Test
Used when we want to test if a set of observed categorical data fits a particular theoretical distribution.
🔹 Formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
- $O$ = Observed frequency
- $E$ = Expected frequency
🔹 Steps
1️⃣ State hypotheses
- $H_0$: Data fits the expected distribution.
- $H_1$: Data does not fit the expected distribution.
2️⃣ Compute expected frequencies using probabilities or proportions.
3️⃣ Calculate the $\chi^2$ statistic using the formula.
4️⃣ Find degrees of freedom (df): $df = \text{number of categories} - 1$
5️⃣ Compare the calculated $\chi^2$ with the critical value from the Chi-Square table.
6️⃣ Make conclusion:
- If $\chi^2_{\text{calc}} > \chi^2_{\text{crit}}$, reject $H_0$.
- Otherwise, fail to reject $H_0$.
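The six steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `chi_square_gof` and `decide` are made up for this example, the coin-toss counts are hypothetical, and the critical value is assumed to be read from a Chi-Square table rather than computed.

```python
def chi_square_gof(observed, probabilities):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(probabilities):
        raise ValueError("observed and probabilities must have the same length")
    n = sum(observed)
    # Step 2: expected frequency = total count * hypothesised probability
    expected = [n * p for p in probabilities]
    # Step 3: sum of (O - E)^2 / E over all categories
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 4: df = number of categories - 1
    df = len(observed) - 1
    return statistic, df


def decide(statistic, critical_value):
    """Steps 5-6: compare the statistic with a critical value taken from a table."""
    return "reject H0" if statistic > critical_value else "fail to reject H0"


# Hypothetical data: 100 coin tosses, 45 heads and 55 tails, tested against a fair coin.
stat, df = chi_square_gof([45, 55], [0.5, 0.5])
print(f"chi2 = {stat:.2f}, df = {df}")
print(decide(stat, 3.84))  # 3.84 is the 5% critical value for df = 1
```

Keeping the decision step separate from the calculation makes it easy to reuse the same statistic with a different significance level.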
🔹 Example — Testing a Fair Die
A die is rolled 60 times, and the results are:
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed (O) | 8 | 9 | 10 | 11 | 12 | 10 |
If the die is fair, the expected frequency is $E = 60 / 6 = 10$ for each face.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
$$= \frac{(8-10)^2}{10} + \frac{(9-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(10-10)^2}{10}$$
$$= \frac{4+1+0+1+4+0}{10} = 1.0$$
Degrees of freedom: $df = 6 - 1 = 5$.
At the 5% significance level, $\chi^2_{\text{crit}} = 11.07$.
✅ Since $1.0 < 11.07$, we fail to reject $H_0$.
→ The die appears to be fair.
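As a quick check of the die example, the calculation can be reproduced in Python. This is only a sketch: the variable names are illustrative, and the critical value 11.07 is simply copied from the text rather than computed.

```python
observed = [8, 9, 10, 11, 12, 10]          # counts for faces 1..6
expected = sum(observed) / len(observed)   # 60 rolls / 6 faces = 10.0

# Contribution of each face to the statistic: (O - E)^2 / E
contributions = [(o - expected) ** 2 / expected for o in observed]
chi2 = sum(contributions)
df = len(observed) - 1

critical_5pct = 11.07  # 5% critical value for df = 5, as quoted in the text

for face, c in enumerate(contributions, start=1):
    print(f"face {face}: {c:.1f}")
print(f"chi2 = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 > critical_5pct else "fail to reject H0")
```

Printing the per-face contributions shows that faces 1 and 5 account for most of the statistic, while faces 3 and 6 contribute nothing.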
🔶 2️⃣ Chi-Square Test of Independence
Used when we want to test whether two categorical variables are related.
🔹 Formula
$$\chi^2 = \sum \frac{(O - E)^2}{E}, \quad E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$
🔹 Example — Gender vs. Subject Choice
| | Math | Economics | Total |
|---|---|---|---|
| Male | 40 | 20 | 60 |
| Female | 10 | 30 | 40 |
| Total | 50 | 50 | 100 |
Compute expected frequencies (E):
$$E_{\text{Male,Math}} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}} = \frac{60 \times 50}{100} = 30$$
Similarly:
| | Math | Economics | Total |
|---|---|---|---|
| Male | 30 | 30 | 60 |
| Female | 20 | 20 | 40 |
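The expected table can be rebuilt from the row and column totals of the observed counts. The snippet below is a small sketch using nested dictionaries; the variable names are chosen for this illustration only.

```python
observed = {
    "Male":   {"Math": 40, "Economics": 20},
    "Female": {"Math": 10, "Economics": 30},
}
columns = ["Math", "Economics"]

# Marginal totals of the observed table
row_totals = {r: sum(cells.values()) for r, cells in observed.items()}
col_totals = {c: sum(observed[r][c] for r in observed) for c in columns}
grand_total = sum(row_totals.values())

# E = (row total * column total) / grand total, for every cell
expected = {
    r: {c: row_totals[r] * col_totals[c] / grand_total for c in columns}
    for r in observed
}

for r in expected:
    print(r, expected[r])
```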
$$\chi^2 = \frac{(40-30)^2}{30} + \frac{(20-30)^2}{30} + \frac{(10-20)^2}{20} + \frac{(30-20)^2}{20}$$
$$\chi^2 = \frac{100}{30} + \frac{100}{30} + \frac{100}{20} + \frac{100}{20} \approx 3.333 + 3.333 + 5 + 5 \approx 16.67$$
Degrees of freedom: $df = (\text{rows} - 1)(\text{columns} - 1) = (2-1)(2-1) = 1$
At the 5% level, $\chi^2_{\text{crit}} = 3.84$.
✅ Since $16.67 > 3.84$, we reject $H_0$ (the hypothesis that gender and subject choice are independent).
→ There is a significant relationship between gender and subject choice.
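The whole independence calculation fits in one short function. This is a sketch under stated assumptions: `chi_square_independence` is a name invented for this example, the table is passed as a list of rows of counts, and no continuity correction is applied, which matches the hand calculation.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            # Expected count if the two variables were independent
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: Male, Female; columns: Math, Economics
stat, df = chi_square_independence([[40, 20], [10, 30]])
print(f"chi2 = {stat:.2f}, df = {df}")  # chi2 = 16.67, df = 1
print("reject H0" if stat > 3.84 else "fail to reject H0")
```

Library routines may give a different number for a 2×2 table. For instance, SciPy's `chi2_contingency` applies a continuity correction to 2×2 tables by default, so its statistic matches the hand calculation only when that correction is switched off.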
📘 3️⃣ Chi-Square Test Assumptions
✅ Data are frequencies (counts), not percentages.
✅ Each observation is independent.
✅ Expected frequency $E \ge 5$ in every category or cell, so that the result is reliable.
✅ Data are from random sampling.
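The expected-frequency assumption is easy to check before running the test. The helper below is a hypothetical sketch (the name `small_expected_cells` and the small example table are invented for illustration). It lists every cell of a contingency table whose expected count falls below 5.

```python
def small_expected_cells(table, minimum=5):
    """Return (row index, column index, expected count) for cells with E < minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    flagged = []
    for i in range(len(table)):
        for j in range(len(table[0])):
            e = row_totals[i] * col_totals[j] / grand_total
            if e < minimum:
                flagged.append((i, j, e))
    return flagged


# Hypothetical small-sample table: two cells have expected counts below 5
print(small_expected_cells([[3, 2], [4, 11]]))    # [(0, 0, 1.75), (0, 1, 3.25)]

# The gender vs. subject table from the example passes the check
print(small_expected_cells([[40, 20], [10, 30]]))  # []
```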
🔹 Common Mistakes
- ❌ Using percentages instead of frequencies.
- ❌ Forgetting to subtract 1 in degrees of freedom.
- ❌ Ignoring small expected frequencies (<5).
- ❌ Using Chi-Square for numerical (non-categorical) data.
🔹 Relation to Other Tests
| Test | Type | Use Case |
|---|---|---|
| Chi-Square | Non-parametric | Categorical data |
| t-test | Parametric | Mean comparison |
| ANOVA | Parametric | Group mean comparison |
| Correlation/Regression | Parametric | Continuous data relationship |
🌟 Why It Matters
The Chi-Square test helps us quantify categorical relationships — transforming intuition (“it seems related”) into statistical evidence.
It’s widely used in:
- Psychology, sociology, and biology,
- Business and market research,
- Education and behavioral science,
- Quality control and survey analysis.
📘 Learn Beyond Formulas
At Math By Rishabh, we don’t just memorize tests — we build reasoning behind inference.
In the Mathematics Elevate Mentorship Program, you’ll:
✅ Learn hypothesis testing intuitively,
✅ Understand Chi-Square logic conceptually,
✅ Solve exam-level IB, AP, and A Level problems systematically.
🚀 Turn statistics into strategy.
👉 Book your personalized mentorship session now at MathByRishabh.com


