The t-Test — Comparing Means and Making Statistical Inferences

🧠 Introduction

In statistics, we often want to know whether two groups differ significantly in their means.
For example:

  • Do male and female students score differently in mathematics?
  • Does a new teaching method improve test performance compared to the traditional one?
  • Is the average lifetime of a new bulb different from 1000 hours as claimed?

To answer such questions, we use the t-test, one of the most commonly used inferential tests.


⚙️ What is a t-Test?

A t-test compares a sample mean with a reference value, or the means of two groups with each other, to determine whether the observed difference is statistically significant or could plausibly have arisen by random chance.

It is based on the Student’s t-distribution, introduced by William Sealy Gosset under the pseudonym “Student” in 1908.


📘 When to Use a t-Test

Use a t-test when:

  • The dependent variable is continuous (e.g., marks, weight, time).
  • The data are approximately normal.
  • The sample size is small (commonly n < 30); the test remains valid for larger samples, where it gives results close to the z-test.
  • The population standard deviation (σ) is unknown.

🧾 Types of t-Tests

| Type | Purpose | Data Condition |
|---|---|---|
| 1. One-sample t-test | Compare the sample mean with a known or claimed population mean. | One sample |
| 2. Independent two-sample t-test | Compare the means of two independent groups. | Two unrelated samples |
| 3. Paired t-test (Dependent) | Compare the means of two related groups (e.g., before-after). | Two related samples |

⚡ Formulae

1️⃣ One-Sample t-Test

$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$$

where:

  • $\bar{X}$: sample mean
  • $\mu_0$: hypothesized population mean
  • $s$: sample standard deviation
  • $n$: sample size
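
As a minimal sketch, the one-sample statistic can be computed directly from the formula above with NumPy. The function name `one_sample_t` is a hypothetical helper introduced here for illustration, not part of any library; the scores are the ones used in Example 1 below.

```python
import numpy as np

def one_sample_t(sample, mu0):
    """Return (t, df) for a one-sample t-test of H0: mean == mu0.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = x.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)
    t = (x_bar - mu0) / (s / np.sqrt(n))
    return t, n - 1

# Scores from Example 1, tested against a hypothesized mean of 70
scores = [75, 78, 72, 70, 69, 82, 80, 68, 74, 77]
t, df = one_sample_t(scores, mu0=70)
print(f"t = {t:.2f}, df = {df}")
```

Note the `ddof=1` argument: NumPy's default (`ddof=0`) divides by n rather than n - 1 and would give the population standard deviation instead of the sample one.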

2️⃣ Independent Two-Sample t-Test

If variances are assumed equal:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where the pooled standard deviation $s_p$ is:

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$$
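
The pooled formula can be sketched in a few lines of NumPy. `pooled_t` is a hypothetical helper name used only for this illustration; the data are the two groups from Example 2 below.

```python
import numpy as np

def pooled_t(sample1, sample2):
    """Return (t, s_p, df) for the equal-variance two-sample t-test.

    Hypothetical helper for illustration only.
    """
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = x1.size, x2.size
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)  # sample variances s1^2 and s2^2
    df = n1 + n2 - 2
    s_p = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df)
    t = (x1.mean() - x2.mean()) / (s_p * np.sqrt(1 / n1 + 1 / n2))
    return t, s_p, df

# Groups A and B from Example 2
t, s_p, df = pooled_t([75, 78, 82, 80], [68, 72, 70, 74])
print(f"t = {t:.2f}, s_p = {s_p:.2f}, df = {df}")
```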


3️⃣ Paired t-Test

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where:

  • $\bar{d}$: mean of differences
  • $s_d$: standard deviation of differences
  • $n$: number of pairs
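
A minimal sketch of the paired statistic, assuming the differences are taken as second measurement minus first. `paired_t` is a hypothetical helper name; the marks are those from Example 3 below.

```python
import numpy as np

def paired_t(first, second):
    """Return (t, df) for a paired t-test on d = second - first.

    Hypothetical helper for illustration only.
    """
    d = np.asarray(second, dtype=float) - np.asarray(first, dtype=float)
    n = d.size                 # number of pairs
    s_d = d.std(ddof=1)        # sample standard deviation of the differences
    t = d.mean() / (s_d / np.sqrt(n))
    return t, n - 1

# Before/after marks from Example 3
before = [70, 68, 72, 71, 69]
after = [75, 74, 76, 73, 70]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

Swapping the two arguments only flips the sign of t, so the direction of subtraction should be stated whenever a paired result is reported.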

🎯 Hypotheses

| Type | Null Hypothesis (H₀) | Alternative Hypothesis (H₁) |
|---|---|---|
| One-sample | μ = μ₀ | μ ≠ μ₀, μ > μ₀, or μ < μ₀ |
| Independent | μ₁ = μ₂ | μ₁ ≠ μ₂, μ₁ > μ₂, or μ₁ < μ₂ |
| Paired | μ_d = 0 | μ_d ≠ 0, μ_d > 0, or μ_d < 0 |
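
The choice of alternative hypothesis determines which tail of the t-distribution the p-value is taken from. The sketch below shows one way to express that with `scipy.stats.t`; the helper name `t_p_value` and its `alternative` labels are assumptions made for this illustration.

```python
from scipy import stats

def t_p_value(t_stat, df, alternative="two-sided"):
    """Return the p-value of a t statistic for the chosen alternative.

    Hypothetical helper for illustration only.
    """
    if alternative == "two-sided":   # H1 uses "≠"
        return 2 * stats.t.sf(abs(t_stat), df)
    if alternative == "greater":     # H1 uses ">"
        return stats.t.sf(t_stat, df)
    if alternative == "less":        # H1 uses "<"
        return stats.t.cdf(t_stat, df)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Same statistic, three different alternatives
for alt in ("two-sided", "greater", "less"):
    print(alt, round(t_p_value(2.0, 9, alt), 4))
```

For a positive t, the two-sided p-value is exactly twice the "greater" p-value, which is why a one-sided alternative should be chosen before looking at the data rather than afterwards.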

📊 Example 1 — One-Sample t-Test

A class of 10 students scored as follows in a test:
75, 78, 72, 70, 69, 82, 80, 68, 74, 77

Test whether the mean score is different from 70 at a 5% significance level.

Step 1:

$$\bar{X} = 74.5, \quad s = 4.77, \quad n = 10$$

Step 2:

$$t = \frac{74.5 - 70}{4.77 / \sqrt{10}} \approx 2.99$$

Step 3:

df = 9, $t_{\text{crit}} = 2.262$ (two-tailed, α = 0.05)

✅ Since 2.99 > 2.262 → Reject H₀
The mean score is significantly different from 70.
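
The critical value used in Step 3 normally comes from a t-table, but it can also be obtained from the quantile function of the t-distribution. The sketch below uses `scipy.stats.t.ppf`; the wrapper `t_critical` is a hypothetical helper introduced for illustration.

```python
from scipy import stats

def t_critical(df, alpha=0.05, two_tailed=True):
    """Return the critical t value for the given df and significance level.

    Hypothetical helper for illustration only.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return stats.t.ppf(1 - tail_area, df)

print(round(t_critical(9), 3))  # 2.262 (two-tailed, alpha = 0.05, df = 9)
```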


📊 Example 2 — Independent Two-Sample t-Test

| Group | Scores | Mean | SD | n |
|---|---|---|---|---|
| A | 75, 78, 82, 80 | 78.75 | 2.99 | 4 |
| B | 68, 72, 70, 74 | 71.00 | 2.58 | 4 |

Step 1:

$$s_p = \sqrt{\frac{3(2.99)^2 + 3(2.58)^2}{6}} \approx 2.79$$

$$t = \frac{78.75 - 71.0}{2.79 \sqrt{\frac{1}{4} + \frac{1}{4}}} \approx 3.93$$

Step 2:

df = 6, $t_{\text{crit}} = 2.447$ (two-tailed, α = 0.05)

✅ Since 3.93 > 2.447 → Reject H₀.
The two group means differ significantly.
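
When only a summary table (mean, SD, n) is available, SciPy's `ttest_ind_from_stats` can run the same pooled test without the raw scores. The sketch below recomputes the summary statistics from the scores first, so the inputs are not affected by rounding; the variable names are chosen here for illustration.

```python
import numpy as np
from scipy import stats

group_a = np.array([75, 78, 82, 80])
group_b = np.array([68, 72, 70, 74])

# Summary statistics recomputed from the raw scores (ddof=1 gives the sample SD)
mean_a, sd_a, n_a = group_a.mean(), group_a.std(ddof=1), group_a.size
mean_b, sd_b, n_b = group_b.mean(), group_b.std(ddof=1), group_b.size

# Pooled (equal-variance) test from summary statistics only
result = stats.ttest_ind_from_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b, equal_var=True)
print(f"SD A = {sd_a:.2f}, SD B = {sd_b:.2f}")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```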


📊 Example 3 — Paired t-Test

Students’ marks before and after special coaching:

| Student | Before | After | Difference (d = After − Before) |
|---|---|---|---|
| 1 | 70 | 75 | 5 |
| 2 | 68 | 74 | 6 |
| 3 | 72 | 76 | 4 |
| 4 | 71 | 73 | 2 |
| 5 | 69 | 70 | 1 |

$$\bar{d} = 3.6, \quad s_d = 2.07, \quad n = 5$$

$$t = \frac{3.6}{2.07 / \sqrt{5}} \approx 3.88$$

df = 4, $t_{\text{crit}} = 2.776$ (two-tailed, α = 0.05)

✅ Since 3.88 > 2.776 → Reject H₀.
Coaching significantly improved scores.
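
Because the paired formula works only on the differences, a paired t-test is equivalent to a one-sample t-test of those differences against zero. The short check below illustrates this with SciPy on the marks from this example.

```python
import numpy as np
from scipy import stats

before = np.array([70, 68, 72, 71, 69])
after = np.array([75, 74, 76, 73, 70])

# Paired test on (after, before) versus one-sample test on d = after - before
paired = stats.ttest_rel(after, before)
one_sample = stats.ttest_1samp(after - before, 0)

print(f"paired:     t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")
print(f"one-sample: t = {one_sample.statistic:.2f}, p = {one_sample.pvalue:.4f}")
```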


💻 t-Test in Python

from scipy import stats
import numpy as np

# One-sample t-test: is the mean score different from 70?
data = np.array([75, 78, 72, 70, 69, 82, 80, 68, 74, 77])
t_stat, p_val = stats.ttest_1samp(data, 70)
print("t-statistic:", t_stat, "p-value:", p_val)

# Independent two-sample t-test (pooled variance; equal_var=True is the default)
group1 = np.array([75, 78, 82, 80])
group2 = np.array([68, 72, 70, 74])
t_stat, p_val = stats.ttest_ind(group1, group2)
print("t-statistic:", t_stat, "p-value:", p_val)

# Paired t-test; passing (after, before) keeps t positive when scores increase,
# matching d = After - Before in Example 3
before = np.array([70, 68, 72, 71, 69])
after = np.array([75, 74, 76, 73, 70])
t_stat, p_val = stats.ttest_rel(after, before)
print("t-statistic:", t_stat, "p-value:", p_val)


📈 Assumptions of t-Test

  1. Data are continuous (interval/ratio).
  2. Independence of observations.
  3. The data are approximately normally distributed.
  4. For independent samples: equal variances (for the pooled version).

If variances are unequal → use Welch’s t-test.
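
In SciPy, Welch's t-test is available through the `equal_var=False` argument of `ttest_ind`. The sketch below runs both versions on the Example 2 groups so the two results can be compared side by side.

```python
import numpy as np
from scipy import stats

group1 = np.array([75, 78, 82, 80])
group2 = np.array([68, 72, 70, 74])

# Pooled (equal-variance) t-test, the default
pooled = stats.ttest_ind(group1, group2)
# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(group1, group2, equal_var=False)

print(f"pooled: t = {pooled.statistic:.2f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.2f}, p = {welch.pvalue:.4f}")
```

In this data set the two groups have the same size, so the pooled and Welch statistics coincide and only the degrees of freedom (and hence the p-value) differ. With unequal group sizes and unequal variances, the two tests can disagree more noticeably.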


⚠️ Common Mistakes

🚫 Using a t-test for categorical data.
🚫 Ignoring normality and equal variance assumptions.
🚫 Confusing paired and independent samples.
🚫 Interpreting a non-significant result as “no difference at all” (could be due to small sample size).


🧭 Final Thoughts

The t-test is one of the most fundamental and widely applied statistical tools for comparing means.
It bridges descriptive and inferential statistics and lays the groundwork for more complex methods such as ANOVA, regression, and the statistical comparison of machine learning models.

Understanding which t-test to apply, and interpreting its results correctly, is an essential skill for any data analyst, researcher, or student of applied statistics.
