Aggregating data by a dimension in SQL is technically straightforward, but for many, is a difficult concept to fully grasp. So, let's dive right into an example. Say we ask of the Getting Started: Users dataset:
How many users have signed up via the Social Media acquisition source?
If we start with a simply query to extract this data, we can manually count the number of unique users:
SELECT *
FROM getting_started.users
WHERE acquisition_source = 'Social Media'
;
We discover the answer is: 23
We could even use an aggregate function to get the answer directly:
SELECT COUNT(DISTINCT user_id) AS num_unique_user
FROM getting_started.users
WHERE acquisition_source = 'Social Media'
;
But, what if we wanted to also show the acquisition source alongside the answer? In other words, how could we group our data by acquisition source and aggregate within that grouping?
We'll use the GROUP BY function in SQL to do exactly this! As an example:
SELECT
acquisition_source
, COUNT(DISTINCT user_id) AS num_unique_user
FROM getting_started.users
WHERE acquisition_source = 'Social Media'
GROUP BY acquisition_source
;
You could also use, what we call, a field index alias instead of writing out the entire field name in the GROUP BY statement:
SELECT
acquisition_source
, COUNT(DISTINCT user_id) AS num_unique_user
FROM getting_started.users
WHERE acquisition_source = 'Social Media'
GROUP BY 1 -- GROUP BY the 1st field in our SELECT statement
;
And now, for the big reveal! If we eliminate our WHERE clause we can see the true power of GROUP BY:
SELECT
acquisition_source
, COUNT(DISTINCT user_id) AS num_unique_user
FROM getting_started.users
GROUP BY 1
;
We're able to see the number of unique user signups for each acquisition source all in one query.
Grouping by Multiple Fields
We can also GROUP BY multiple fields. For example, if we wanted to count the number of unique users by Country and Acquisition Source:
SELECT
acquisition_source
, country
, COUNT(DISTINCT user_id) AS num_unique_user
FROM getting_started.users
GROUP BY acquisition_source, country
;
-- The query below will product the same result as above
-- Use a field index alias to simplify the query above
SELECT
acquisition_source
, country
, COUNT(DISTINCT user_id) AS num_unique_user
FROM getting_started.users
GROUP BY 1,2
;