Statistics: Topic 2 | Graphical Representation

📘 Topic 2: Graphical Representation — Bringing Data to Life!
(Bar Charts, Histograms, Box Plots, and Cumulative Frequency Graphs)

Level: IB AA SL/HL, A-Level, Further Math
Tags: Descriptive Statistics, Data Visualization, Graphical Analysis, Interpreting Graphs

👋 Welcome, Future Data Wizards!
Visualizing data is like telling a story with pictures. Instead of just numbers, graphs help us see patterns, understand trends, and spot anything unusual. This topic will equip you to create and interpret some of the most common and powerful statistical graphs. Let’s dive in!

🎯 Learning Objectives
By the end of this topic, you’ll be a pro at:

  1. Distinguishing between Bar Charts and Histograms and knowing exactly when to use each.
  2. Constructing clear and accurate Bar Charts and Histograms (including those with unequal class widths!).
  3. Mastering Box-and-Whisker Plots: drawing them and using them to understand data spread and identify outliers.
  4. Unlocking the secrets of Cumulative Frequency Graphs to estimate medians, quartiles, and other percentiles.
  5. Confidently interpreting these graphs to draw meaningful conclusions about datasets.

1️⃣ Bar Charts vs. Histograms: What’s the Difference?

These two often get mixed up, but they tell very different stories about different types of data.

| Feature | Bar Chart | Histogram |
| :--- | :--- | :--- |
| Data Type Used | Categorical (e.g., favorite color, type of pet) or Discrete (e.g., number of siblings, with distinct values) | Continuous (grouped into intervals, e.g., height, weight, time) or Discrete with many values (grouped) |
| Gaps Between Bars? | Yes! Bars are separate to show distinct categories. | No! Bars touch to show a continuous scale of data. (A gap only appears if an interval has zero frequency.) |
| X-Axis Labels | Names of categories (e.g., “Red,” “Blue,” “Dog,” “Cat”) | Numerical class intervals (e.g., “150-160 cm,” “10-20 mins”) |
| Y-Axis Represents | Frequency (or Relative Frequency) of each category | Frequency (or Frequency Density if class widths are unequal) |
| Example Use Case | Showing how many students prefer apples, bananas, or oranges. | Showing the distribution of student heights in a class. |

Think of it this way:

  • 🔹 Bar Charts are for counting things in separate boxes or categories (like sorting M&Ms by color).
  • 🔹 Histograms are for showing how data is spread out across a continuous range (like measuring how many people fall into different height ranges).

📊 Bar Charts in More Detail:

  • Purpose: To compare the frequencies or relative frequencies of different categories.
  • Construction:
    1. Draw a horizontal x-axis (for categories) and a vertical y-axis (for frequency).
    2. Label your axes clearly.
    3. For each category, draw a bar. The height of the bar corresponds to the frequency of that category.
    4. Ensure all bars have the same width and that there are equal gaps between them.
  • Types:
    • Simple Bar Chart: As described above.
    • Comparative/Clustered Bar Chart: Used to compare categories across different groups (e.g., favorite sports for boys vs. girls, with bars clustered by sport).
    • Stacked Bar Chart: Used to show how a whole category is divided into parts (e.g., total sales per quarter, with each bar stacked by product type).
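
As a rough sketch of where the bar heights come from, the snippet below tallies raw categorical responses into frequencies and relative frequencies with Python's standard `collections.Counter`. The fruit responses are made-up illustration data, not results from a real survey.

```python
from collections import Counter

# Hypothetical raw survey responses (illustration only).
responses = ["apple", "banana", "apple", "orange", "banana", "apple"]

# Frequency of each category: these are the bar heights.
frequencies = Counter(responses)

# Relative frequency = frequency / total number of responses.
total = sum(frequencies.values())
relative = {fruit: count / total for fruit, count in frequencies.items()}

for fruit, count in frequencies.items():
    # A crude text "bar": one '#' per response in that category.
    print(f"{fruit:<8}{'#' * count}  ({count}, {relative[fruit]:.2f})")
```

Each printed row plays the role of one bar; on paper the bars would be drawn vertically with equal widths and equal gaps.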

📈 Histograms in More Detail (Especially with Unequal Class Widths):

  • Purpose: To display the underlying frequency distribution (shape, center, spread) of a continuous data set.
  • Key Idea: With histograms, the AREA of the bar represents the frequency.
    • If all class widths are equal, then the height of the bar is directly proportional to the frequency.
    • If class widths are unequal, we MUST use Frequency Density on the y-axis to maintain this area-frequency relationship.

📌 Histogram Frequency Density (Super Important!)
When class intervals (widths) are different, simply plotting frequency on the y-axis can be misleading because wider bars will look “bigger” even if they don’t represent proportionally more data.
To fix this, we calculate:
\[ \text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}} \]

  • Class Width: Upper boundary of the class – Lower boundary of the class.
  • Y-Axis: Plot Frequency Density.
  • X-Axis: Plot the class intervals.
  • Result: The area of each bar (Frequency Density × Class Width) will now correctly equal the Frequency of that class.
  • Construction (with potentially unequal widths):
    1. Determine your class intervals.
    2. Calculate the class width for each interval.
    3. Calculate the frequency density for each interval.
    4. Draw a horizontal x-axis (for class intervals) and a vertical y-axis (for frequency density).
    5. Label your axes clearly.
    6. For each interval, draw a bar where the width matches the class interval and the height matches the frequency density. Bars should touch.
  • Interpreting Shape:
    • Symmetrical (Bell-shaped): Data is evenly spread around the center.
    • Skewed Right (Positively Skewed): Long tail to the right. Most values are concentrated at the lower end.
    • Skewed Left (Negatively Skewed): Long tail to the left. Most values are concentrated at the higher end.
    • Unimodal/Bimodal/Multimodal: Number of peaks.
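
The frequency density calculation described above can be sketched in a few lines of Python. The function name `frequency_densities` and the waiting-time classes are assumptions made for illustration; only the formula (frequency divided by class width) comes from this topic.

```python
def frequency_densities(boundaries, frequencies):
    """Return frequency / class width for each class.

    boundaries:  class boundaries in increasing order
                 (one more boundary than there are classes).
    frequencies: frequency of each class, in the same order.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    densities = []
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        width = upper - lower            # class width
        densities.append(freq / width)   # height of the bar
    return densities


# Hypothetical waiting times (minutes): classes 0-5, 5-10, 10-20, 20-40.
boundaries = [0, 5, 10, 20, 40]
frequencies = [4, 10, 12, 8]
densities = frequency_densities(boundaries, frequencies)
print(densities)  # [0.8, 2.0, 1.2, 0.4]
```

Note that the 10-20 class has the highest frequency (12) but not the tallest bar: the 5-10 class packs 10 values into half the width, so its density is higher.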

2️⃣ Box-and-Whisker Plots (Box Plots): The Five-Number Summary Snapshot!

Box plots are fantastic for getting a quick overview of your data’s spread and central tendency. They clearly show the five-number summary:

  1. Min: Smallest data value (on a box plot that marks outliers, the left whisker stops at the smallest value that isn’t an outlier).
  2. Q1 (Lower Quartile): The 25th percentile – 25% of data is below this value.
  3. Median (Q2): The 50th percentile – the middle value.
  4. Q3 (Upper Quartile): The 75th percentile – 75% of data is below this value (or 25% is above).
  5. Max: Largest data value (on a box plot that marks outliers, the right whisker stops at the largest value that isn’t an outlier).

Key Features Visualized:

  • Central Tendency: The Median line inside the box.
  • Spread:
    • Interquartile Range (IQR): The length of the box (IQR = Q3 − Q1). This shows the spread of the middle 50% of the data and is robust to outliers.
    • Overall Range: The distance between the ends of the whiskers (or outliers if present).
  • Symmetry/Skewness:
    • If the median is in the middle of the box and whiskers are roughly equal, the distribution is likely symmetrical.
    • If the median is closer to Q1 and the right whisker is longer, it suggests positive skew.
    • If the median is closer to Q3 and the left whisker is longer, it suggests negative skew.
  • Outliers: Data points that lie unusually far from the main body of the data.

📌 Identifying Outliers (A Common Rule):
A data point x is often considered an outlier if:
\[ x < Q_1 - 1.5 \times \text{IQR} \quad \text{(Lower Outlier)} \]
OR
\[ x > Q_3 + 1.5 \times \text{IQR} \quad \text{(Upper Outlier)} \]

  • The “whiskers” on a box plot typically extend to the smallest and largest data values that are within these outlier boundaries.
  • Any data points beyond these boundaries are plotted individually as dots or asterisks.
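
The 1.5 × IQR rule translates directly into code. This is a minimal sketch; the function names `outlier_fences` and `is_outlier` and the quartile values are hypothetical, chosen only to illustrate the rule.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier boundaries for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies strictly outside the outlier boundaries."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper


# Hypothetical quartiles: Q1 = 20 and Q3 = 30, so IQR = 10.
print(outlier_fences(20, 30))  # (5.0, 45.0)
print(is_outlier(4, 20, 30))   # True: 4 is below the lower boundary 5
print(is_outlier(5, 20, 30))   # False: exactly on the boundary is not an outlier
```

The strict inequalities matter: a value sitting exactly on a boundary is kept, which matches the "less than" and "greater than" signs in the rule.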

How to Draw a Box Plot (Step-by-Step):

  1. Calculate the five-number summary (Min, Q1, Median, Q3, Max). If you need to check for outliers first, find Q1, Q3, and IQR.
  2. Draw a number line (scale) that covers the range of your data.
  3. Draw a box from Q1 to Q3.
  4. Draw a vertical line inside the box at the Median.
  5. Calculate the outlier boundaries: Q1 − 1.5 × IQR and Q3 + 1.5 × IQR.
  6. Determine your “Min” and “Max” for the whiskers:
    • The left whisker extends from Q1 to the smallest data point that is greater than or equal to the lower outlier boundary.
    • The right whisker extends from Q3 to the largest data point that is less than or equal to the upper outlier boundary.
  7. Plot any data points that fall outside these whisker limits as individual points (outliers).
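
The drawing steps above can be sketched as a small Python function that returns everything needed to sketch the plot by hand. Two things here are assumptions rather than part of this topic: the dataset is made up, and the quartiles use the "median of each half" convention (the overall median is left out of both halves when the number of values is odd). Calculators and software may use other conventions and give slightly different quartiles.

```python
from statistics import median


def box_plot_stats(data, k=1.5):
    """Values needed to draw a box plot with outliers marked."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]
    # Skip the middle value when n is odd (assumed convention).
    upper_half = values[half + 1:] if n % 2 else values[half:]

    q1, q2, q3 = median(lower_half), median(values), median(upper_half)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    # Whiskers stop at the most extreme values inside the boundaries.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical dataset with one low and one high extreme value.
data = [1, 10, 12, 13, 14, 15, 16, 18, 30]
stats = box_plot_stats(data)
print(stats)
```

For this data Q1 = 11, the median is 14 and Q3 = 17, so the boundaries are 2 and 26. The whiskers therefore run to 10 and 18, and the values 1 and 30 are plotted as individual points.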

3️⃣ Cumulative Frequency Graphs (Ogives): The “How Many Less Than?” Story

A cumulative frequency graph shows the total number (or percentage) of data points that fall below a certain value.

  • “Cumulative” means “accumulating” or “adding up.”

How to Construct a Cumulative Frequency Graph:

  1. Start with a grouped frequency table.
  2. Add a “Cumulative Frequency” column. For each class, add its frequency to the cumulative frequency of the previous class.
  3. Add an “Upper Class Boundary” column.
  4. Plot points:
    • X-coordinate: Upper class boundary of each interval.
    • Y-coordinate: Cumulative frequency for that interval.
    • Starting Point: Always include a point at the lower boundary of the first class with a cumulative frequency of 0. This anchors your graph.
  5. Connect the plotted points with a smooth curve or straight line segments. The graph should generally be S-shaped (an “ogive”).
  6. The y-axis goes up to the total frequency (N).
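
The construction steps above can be sketched with `itertools.accumulate` from Python's standard library. The function name `cumulative_points` is hypothetical; the class boundaries and frequencies are the plant heights from Solved Example 2 in this topic.

```python
from itertools import accumulate


def cumulative_points(boundaries, frequencies):
    """Return the (x, cumulative frequency) points to plot.

    boundaries:  class boundaries in increasing order
                 (one more boundary than there are classes).
    frequencies: frequency of each class, in the same order.
    """
    running_totals = list(accumulate(frequencies))
    # Anchor point: lower boundary of the first class, cumulative frequency 0.
    points = [(boundaries[0], 0)]
    # Every other point sits at an upper class boundary.
    points += list(zip(boundaries[1:], running_totals))
    return points


# Plant heights (cm) from Solved Example 2.
boundaries = [140, 150, 160, 170, 190]
frequencies = [5, 12, 18, 15]
points = cumulative_points(boundaries, frequencies)
print(points)  # [(140, 0), (150, 5), (160, 17), (170, 35), (190, 50)]
```

The last y-value, 50, is the total frequency N, which is where the y-axis should top out.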

Estimating Values from a Cumulative Frequency Graph:
Let N be the total frequency.

  • Median (Q2):
    1. Find N/2 on the y-axis (cumulative frequency).
    2. Draw a horizontal line from this point to the curve.
    3. Draw a vertical line from where it hits the curve down to the x-axis.
    4. Read the value – this is your estimated median.
  • Lower Quartile (Q1):
    1. Find N/4 on the y-axis.
    2. Repeat the process (horizontal to curve, vertical to x-axis). This is Q1.
  • Upper Quartile (Q3):
    1. Find 3N/4 on the y-axis.
    2. Repeat the process. This is Q3.
  • Interquartile Range (IQR): Calculate IQR = Q3 − Q1 using your estimated values.
  • Percentiles: To find the kth percentile, find (k/100) × N on the y-axis and read across and down.
  • Number of items above/below a value: To find how many items are below a certain x-value, go up from that x-value to the curve, then across to the y-axis. To find how many are above, subtract this from N.
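
Reading "across and down" can be imitated numerically. The sketch below assumes the plotted points are joined with straight line segments, so it uses linear interpolation; a hand-drawn smooth curve would give slightly different estimates. The function name `read_value` and the plotted points are hypothetical.

```python
def read_value(points, target_cf):
    """x-value where the cumulative frequency polygon reaches target_cf.

    points: (x, cumulative frequency) pairs in increasing order,
            joined by straight line segments (assumed).
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target_cf <= c1:
            # Fraction of the way up this segment, applied along the x-axis.
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the range of the graph")


# Hypothetical plotted points; total frequency N = 40.
points = [(0, 0), (10, 5), (20, 15), (30, 35), (40, 40)]
n = points[-1][1]

q1 = read_value(points, n / 4)          # read across from 10
median = read_value(points, n / 2)      # read across from 20
q3 = read_value(points, 3 * n / 4)      # read across from 30
print(q1, median, q3, q3 - q1)  # 15.0 22.5 27.5 12.5
```

The same function handles any percentile: pass (k/100) × N as the target.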

✅ SOLVED EXAMPLE 1: Box Plot Construction & Outlier Check

You are given the following five-number summary for a dataset of exam scores:
Min = 42, Q1 = 50, Median = 58, Q3 = 66, Max = 80.

a) Draw the box plot.
b) Calculate the IQR.
c) A new student scores 30. Would this score be considered an outlier based on the 1.5 × IQR rule?

Solution:
a) Drawing the Box Plot:
(Imagine a number line from about 40 to 85)

  1. Box: Draw a box extending from Q1 (50) to Q3 (66).
  2. Median Line: Draw a vertical line inside the box at the Median (58).
  3. Whiskers:
    • With IQR = 66 − 50 = 16, the outlier boundaries are 50 − 24 = 26 and 66 + 24 = 90. Min = 42 and Max = 80 both lie inside them, so the whiskers extend to these points.
    • Left whisker: from Q1 (50) down to Min (42).
    • Right whisker: from Q3 (66) up to Max (80).
    (A complete box plot would be drawn here on a scaled axis).

b) Calculate the IQR:

IQR = Q3 − Q1 = 66 − 50 = 16

c) Check if 30 is an outlier:

  1. Calculate outlier boundaries:
    • Lower Boundary: Q1 − 1.5 × IQR = 50 − 1.5 × 16 = 50 − 24 = 26.
    • Upper Boundary: Q3 + 1.5 × IQR = 66 + 1.5 × 16 = 66 + 24 = 90.
  2. Compare the score of 30 to these boundaries:
    • Is 30 < 26? No, 30 > 26.
    • Is 30 > 90? No, 30 < 90.
      Since 30 is not less than the lower boundary (26) and not greater than the upper boundary (90), 30 is NOT an outlier according to this rule. (It would, however, become the new minimum if added to the original dataset, potentially shifting Q1).

✅ SOLVED EXAMPLE 2: Histogram with Unequal Class Widths

The table shows the heights (in cm) of a group of plants:
| Class (Height, cm) | Frequency (f) | Class Width (w) | Frequency Density (f/w) |
| :--- | :--- | :--- | :--- |
| 140 ≤ h < 150 | 5 | 10 | 5 / 10 = 0.5 |
| 150 ≤ h < 160 | 12 | 10 | 12 / 10 = 1.2 |
| 160 ≤ h < 170 | 18 | 10 | 18 / 10 = 1.8 |
| 170 ≤ h < 190 | 15 | 20 | 15 / 20 = 0.75 |

Constructing the Histogram:

  1. X-axis: Mark class intervals (140, 150, 160, 170, 190).
  2. Y-axis: Scale for Frequency Density (e.g., from 0 up to 2.0).
  3. Draw Bars (Height = Frequency Density, Width = Class Width):
    • 140-150: Width 10, Height 0.5
    • 150-160: Width 10, Height 1.2
    • 160-170: Width 10, Height 1.8
    • 170-190: Width 20, Height 0.75 (Notice this bar is wider but shorter than the previous one, accurately reflecting its frequency over a larger range).
      Bars must touch.

(A sketch of the histogram would be shown here, emphasizing the different widths and heights).
The area of the last bar is 20 × 0.75 = 15, which is its frequency!
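
The "area equals frequency" idea can be checked numerically for the whole histogram. The sketch below uses the boundaries and frequency densities from the table above; the function name `estimate_frequency` is hypothetical. As an extra, the same function estimates how many plants fall in part of a class, which relies on an assumption not stated in the example: heights are spread evenly within each class.

```python
def estimate_frequency(boundaries, densities, low, high):
    """Estimate the frequency between low and high as area under the bars.

    Assumes values are spread evenly within each class.
    """
    total = 0.0
    for lower, upper, density in zip(boundaries, boundaries[1:], densities):
        # Width of the part of this bar that lies between low and high.
        overlap = min(high, upper) - max(low, lower)
        if overlap > 0:
            total += density * overlap
    return total


boundaries = [140, 150, 160, 170, 190]
densities = [0.5, 1.2, 1.8, 0.75]

# Whole last bar: 20 x 0.75, which should match its frequency of 15.
print(estimate_frequency(boundaries, densities, 170, 190))

# Whole histogram: should match the total frequency of 50.
print(estimate_frequency(boundaries, densities, 140, 190))

# Part of two bars: 5 x 1.8 + 10 x 0.75, roughly 16.5 plants.
print(estimate_frequency(boundaries, densities, 165, 180))
```

The estimate for 165 cm to 180 cm is only as good as the even-spread assumption, so treat it as an approximation rather than an exact count.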


🧪 PRACTICE QUESTIONS: Test Your Skills!

🟡 Easy: Bar Chart Basics
A survey asked students their favorite primary color. The results were: Red – 5 students, Blue – 7 students, Green – 3 students.

  • Draw a simple bar chart to represent this data. Label your axes clearly.

🟠 Medium: Reading a Cumulative Frequency Graph
(Imagine an S-shaped cumulative frequency graph. X-axis: “Time taken (minutes)” from 0 to 60. Y-axis: “Cumulative Frequency” from 0 to 80 students.)
Assume the graph shows the following:

  • At 40 students (N/2), the time is 32 minutes.
  • At 20 students (N/4), the time is 25 minutes.
  • At 60 students (3N/4), the time is 45 minutes.
  1. Estimate the median time taken.
  2. Estimate the lower quartile (Q1) and upper quartile (Q3).
  3. Calculate the interquartile range (IQR).

🔴 Hard (IB/A-Level Style): Histogram and Box Plot Link
You are given a histogram showing the scores of 100 students in a test. The histogram has varying class widths.
(Imagine a histogram with frequency density on the y-axis. You would first read off the frequency density of each class, then multiply it by the class width to get that class’s frequency. From these frequencies, you could build a cumulative frequency table/graph to estimate Q1, Median, and Q3. Min and Max would be estimated by the lower boundary of the first class and the upper boundary of the last class with data.)

  1. Explain how you would estimate the modal class from this histogram.
  2. Describe the steps you would take to estimate the five-number summary (Min, Q1, Median, Q3, Max) needed to draw a box plot for this data, using the information from the histogram. (You don’t need to draw it, just outline the process).

🧠 COMMON MISTAKES TO AVOID!

  • Bar Charts with No Gaps: Remember, bar charts are for distinct categories, so bars should be separate! Histograms have touching bars.
  • Histograms: Frequency vs. Frequency Density: If class widths are unequal, you must use frequency density on the y-axis. Otherwise, the graph is misleading.
  • Cumulative Frequency: Misreading Quartiles: Ensure you’re reading from N/4, N/2, and 3N/4 on the cumulative frequency (y-axis) first, then going across and down.
  • Cumulative Frequency: Plotting Points: Always plot cumulative frequency against the upper class boundary. Start at (lower boundary of first class, 0).
  • Box Plots Show Mean: Nope! Box plots clearly show the Median, Q1, and Q3. The mean is not directly visible.
  • Interpreting Whiskers: Whiskers don’t always go to the absolute Min/Max if outliers are present and the 1.5 × IQR rule is used for outlier definition.

✨ Ready to Master Your Data Visualization Skills? ✨

Understanding these graphs is a cornerstone of statistical analysis. Practice drawing them, interpreting them, and you’ll be well on your way to acing your exams and understanding the world through data!

Want to take your Math skills to the next level with personalized guidance?
🚀 Apply for Mentorship with Math by Rishabh! 🚀
Get expert support from Rishabh, an alumnus of the prestigious IIT Guwahati and Indian Statistical Institute. Whether you’re aiming for top grades in IB/A-Levels or building a strong foundation for university, Rishabh’s insights and tailored approach can help you unlock your full potential.
Don’t just learn math, understand it deeply!
