Group
Grouping is the process of aggregating data by a variable. You likely already group data inherently on a regular basis. For example, have you ever:
- Tallied up your expenses in a month for a budget?
- Calculated your average calorie intake by breakfast, lunch, and dinner?
- Looked up your highest and lowest grade by semester?
If so, congratulations, as you're already experienced in the process of grouping data! Also, the underlined words above are purposeful.
To group is to aggregate data by ____________ [ fill in the blank with a variable of your choosing ].
Let's look at an example – say you have following dataset:
Month | Category | Sales |
January | Beauty | $100 |
January | Home | $100 |
January | Lifestyle | $100 |
February | Beauty | $100 |
February | Home | $100 |
February | Lifestyle | $1,000 |
Aggregating on the dataset above would result in:
- Total Sales = $1,500
- Average Sale Amount = $250
- Minimum Sale Amount = $100
- Maximum Sale Amount = $1,000
Grouping data is where your brain naturally goes when exploring and aggregating data. For example, what are Total Sales by month (i.e. let's group by month and aggregate for Total Sales):
Month | Total Sales |
January | $300 |
February | $1,200 |
Or, if we'd like to group by category:
Category | Total Sales |
Beauty | $200 |
Home | $200 |
Lifestyle | $1,100 |
Pivot
Pivoting is the process of turning a cell value into a new field/column, and then aggregating by it. For instance if we pivoted our dataset by month (i.e. turn month into a field/column), we could then see them from left-to-right and aggregate by category for Total Sales:
January | February | |
Beauty | $100 | $100 |
Home | $100 | $100 |
Lifestyle | $100 | $1,000 |