askvity

How do you make a histogram of grouped data?

Published in Data Visualization

Creating a histogram from grouped data involves organizing the data into bins, counting the frequency of observations within each bin, and then visually representing these frequencies as bars. Here's a step-by-step guide:

1. Prepare Your Grouped Data (Frequency Table):

  • Begin with a grouped frequency table that shows data divided into classes (bins or intervals) along with the corresponding frequencies (the number of data points falling within each class).

    • Example:

      Class Interval   Frequency
      10-20            5
      20-30            12
      30-40            8
      40-50            3

2. Determine Bin Size and Boundaries (If Not Already Defined):

  • If your data isn't already grouped, you need to decide how many bins to use, which in turn fixes the bin width. A good starting point is Sturges' Rule: Number of Bins ≈ 1 + 3.322 * log10(n) (equivalently, 1 + log2(n)), where 'n' is the total number of data points. The bin width is then roughly the data range divided by the number of bins. This is only a guideline; adjust the bin count to suit the nature of your data.
  • Ensure that each bin is of equal width to maintain an accurate visual representation. Unequal bin widths require a different approach (density histograms, see step 5).
  • Define clear upper and lower boundaries for each bin, and adopt a convention for which class interval includes a value that falls exactly on a boundary. For example, under a common convention, the interval '10-20' in the table above includes values from 10 up to, but not including, 20. The next interval, '20-30', then includes 20 and all values up to, but not including, 30.
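The choices in this step can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `sturges_bins` and `group_data` and the sample values are assumptions made for this example, and the grouping uses the "lower bound included, upper bound excluded" convention described above.

```python
import math

def sturges_bins(n):
    """Suggested number of bins from Sturges' Rule, 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))

def group_data(values, start, width, n_bins):
    """Count values into equal-width bins [lower, upper).

    A value that falls exactly on a boundary goes into the higher bin.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical raw data, invented for this example only
values = [10, 12, 19, 20, 25, 29.5, 30, 38, 41, 49]
counts = group_data(values, start=10, width=10, n_bins=4)
print(sturges_bins(len(values)), counts)
```

Note that the value 20 is counted in the 20-30 bin, not in 10-20, which matches the boundary convention above.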

3. Set Up the Axes:

  • X-axis (Horizontal): Represents the class intervals or bin boundaries. Label the axis clearly with the variable being measured (e.g., "Age," "Temperature"). Use a consistent scale.
  • Y-axis (Vertical): Represents the frequency (or relative frequency, see step 5). Label the axis clearly as "Frequency" or "Relative Frequency." Use a consistent scale.

4. Draw the Bars:

  • For each class interval, draw a rectangle (bar) whose base spans the class interval on the x-axis.
  • The height of the bar corresponds to the frequency of that class interval as read on the y-axis.
  • Crucially, the bars in a histogram touch each other. This visually represents the continuous nature of the data.
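To make the bar geometry concrete, the sketch below turns the frequency table from step 1 into (left edge, width, height) rectangles and checks that neighbouring bars share an edge. The helper names `parse_interval`, `bar_rectangles`, and `bars_touch` are hypothetical, and the label parsing assumes non-negative bounds written as "lower-upper".

```python
def parse_interval(label):
    """Split a label such as '10-20' into numeric (lower, upper) bounds."""
    lower, upper = label.split("-")
    return float(lower), float(upper)

def bar_rectangles(table):
    """Return one (left, width, height) tuple per (label, frequency) row."""
    bars = []
    for label, freq in table:
        lower, upper = parse_interval(label)
        bars.append((lower, upper - lower, freq))
    return bars

def bars_touch(bars):
    """True if every bar ends exactly where the next bar begins."""
    return all(
        left + width == next_left
        for (left, width, _), (next_left, _, _) in zip(bars, bars[1:])
    )

table = [("10-20", 5), ("20-30", 12), ("30-40", 8), ("40-50", 3)]
bars = bar_rectangles(table)

# Rough text rendering: one '#' per observation in the class
for left, width, height in bars:
    print(f"{left:g}-{left + width:g} | " + "#" * height)
```

The text rendering is only a quick check of the shape; a plotting library would draw the same rectangles from the same three numbers per bar.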

5. Consider Relative Frequency (Optional but Recommended):

  • Instead of frequency, you can use relative frequency. Calculate the relative frequency for each class interval by dividing the frequency of that interval by the total number of data points. This transforms frequencies into proportions or percentages.
  • Using relative frequency allows you to compare histograms from datasets of different sizes.
  • If the bins have unequal widths, plotting raw frequency as the bar height produces a distorted histogram, because wider bins look more important than they are. Instead, create a density histogram, where the area of each bar is proportional to the frequency. This means the height of each bar equals the frequency divided by the bin width (the frequency density), and the y-axis should be labeled accordingly.
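Both calculations in this step are simple divisions, sketched below. The function names and the unequal-width table are assumptions made for illustration; the equal-width frequencies are those of the table in step 1.

```python
def relative_frequencies(freqs):
    """Divide each frequency by the total so the results sum to 1."""
    total = sum(freqs)
    return [f / total for f in freqs]

def frequency_densities(edges, freqs):
    """Bar heights for unequal bins: frequency divided by bin width."""
    return [f / (hi - lo) for f, lo, hi in zip(freqs, edges, edges[1:])]

# Frequencies from the table in step 1 (28 observations in total)
rel = relative_frequencies([5, 12, 8, 3])

# Hypothetical unequal-width bins: 0-10, 10-20, 20-40
edges = [0, 10, 20, 40]
freqs = [5, 10, 10]
dens = frequency_densities(edges, freqs)

# Area of each bar (height * width) recovers the original frequency
areas = [d * (hi - lo) for d, lo, hi in zip(dens, edges, edges[1:])]
print(rel, dens, areas)
```

In the unequal-width example, the 20-40 bin holds as many observations as the 10-20 bin but is twice as wide, so its bar is half as tall.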

6. Label and Title:

  • Give your histogram a clear and descriptive title that accurately reflects the data being represented.
  • Label both axes clearly, including units of measurement where applicable.

Example Scenario:

Suppose you have the following grouped data representing the ages of people attending a concert:

Age Group   Frequency
15-25       30
25-35       55
35-45       40
45-55       25
55-65       10

To create the histogram:

  1. Your X-axis would represent the age groups (15-25, 25-35, etc.).
  2. Your Y-axis would represent the frequency (number of people).
  3. You would draw bars for each age group, with the height of each bar corresponding to the frequency for that group (e.g., the bar for 15-25 would have a height of 30).
  4. Ensure all bars touch, indicating the continuous nature of age.
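As one possible way to draw this, the sketch below uses matplotlib, which is an assumption here since the steps above do not depend on any particular tool. The function names and the output filename are hypothetical. Passing `align="edge"` with a `width` equal to the bin width makes each bar span its full class interval, so adjacent bars touch.

```python
# Concert data as (lower bound, upper bound, frequency)
concert = [(15, 25, 30), (25, 35, 55), (35, 45, 40), (45, 55, 25), (55, 65, 10)]

def histogram_inputs(table):
    """Split the table into left edges, bar widths, and bar heights."""
    lefts = [lo for lo, hi, _ in table]
    widths = [hi - lo for lo, hi, _ in table]
    heights = [freq for _, _, freq in table]
    return lefts, widths, heights

def plot_histogram(table, filename="concert_ages.png"):
    """Draw the histogram and save it to a file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    import matplotlib.pyplot as plt

    lefts, widths, heights = histogram_inputs(table)
    fig, ax = plt.subplots()
    # align="edge" starts each bar at its lower class boundary
    ax.bar(lefts, heights, width=widths, align="edge", edgecolor="black")
    ax.set_xticks(lefts + [lefts[-1] + widths[-1]])  # tick at every boundary
    ax.set_xlabel("Age (years)")
    ax.set_ylabel("Frequency (number of people)")
    ax.set_title("Ages of Concert Attendees")
    fig.savefig(filename)
    plt.close(fig)
```

Calling `plot_histogram(concert)` should write the chart to `concert_ages.png` in the current directory.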

By following these steps, you can effectively create a histogram from grouped data, providing a visual representation of the data's distribution.
