Pivoting data in SQL is the combination of two processes:
- Segment data into newly defined fields (i.e. create your pivots)
- Aggregate data within the newly defined fields (i.e. group with new pivots)
Let's use the same example from the Building a Pivot Table in Google Sheets lesson:
Item | Description | Example (see screenshot below) |
Row | The field you want to see as rows | Place each "Country" value in a row |
Column | The field you want to see as columns | Place each "Acquisition Source" value as a column |
Value | The field you want to aggregate | Aggregate for the Count of Unique Users |
Filter | The data you want to exclude | Exclude any User that's deleted their account |
Let's break this down into steps. First, let's grab country as our rows and create new columns for each Acquisition Source.
For each row in our Getting Started: Users dataset, when the acquisition source is equal to the output of our CASE WHEN statement, jot down the user_id. If it's not equal, leave it NULL.
SELECT
country
, CASE WHEN acquisition_source = 'Other' THEN user_id ELSE NULL END AS other
, CASE WHEN acquisition_source = 'Referral' THEN user_id ELSE NULL END AS referral
, CASE WHEN acquisition_source = 'Search Engine' THEN user_id ELSE NULL END AS search
, CASE WHEN acquisition_source = 'Social Media' THEN user_id ELSE NULL END AS social
FROM getting_started.users
;
Which will give us the following output:
From here, let's add in our filter: exclude any users that have deleted their account:
SELECT
country
, CASE WHEN acquisition_source = 'Other' THEN user_id ELSE NULL END AS other
, CASE WHEN acquisition_source = 'Referral' THEN user_id ELSE NULL END AS referral
, CASE WHEN acquisition_source = 'Search Engine' THEN user_id ELSE NULL END AS search
, CASE WHEN acquisition_source = 'Social Media' THEN user_id ELSE NULL END AS social
FROM getting_started.users
WHERE user_deleted = FALSE
;
This takes us from 100 rows (i.e. the entire Getting Started: Users dataset), down to 92 (i.e. 8 users have deleted their account and we're now excluding those rows).
Our final step is to add our "Value" from the table example above - how and what we want to aggregate by. We want to aggregate by country (how), aggregate by our newly created segmentation fields for the count of unique users (what).
So, let's add-in our SQL aggregate logic and our GROUP BY statement:
-- Uniquely count the number of User IDs where that user's acquisition_source = the output of the CASE WHEN statement
-- Add in our GROUP BY logic so that our database software knows what to aggregate by (i.e. country)
SELECT
country
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Other' THEN user_id ELSE NULL END) AS other_user_count
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Referral' THEN user_id ELSE NULL END) AS referral_user_count
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Search Engine' THEN user_id ELSE NULL END) AS search_engine_user_count
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Social Media' THEN user_id ELSE NULL END) AS social_media_user_count
FROM getting_started.users
WHERE user_deleted = FALSE
GROUP BY 1
;
Which will bring back the following results:
We could also add-in a field to aggregate for the total number of unique users by country (excluding any segmentation). This is a great way to reconcile (i.e. check the math works out) the output of our CASE WHEN aggregate logic:
-- Uniquely count the number of User IDs where that user's acquisition_source = the output of the CASE WHEN statement
-- Add in our GROUP BY logic so that our database software knows what to aggregate by (i.e. country)
SELECT
country
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Other' THEN user_id ELSE NULL END) AS other_user_count
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Referral' THEN user_id ELSE NULL END) AS referral_user_count
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Search Engine' THEN user_id ELSE NULL END) AS search_engine_user_count
, COUNT(DISTINCT CASE WHEN acquisition_source = 'Social Media' THEN user_id ELSE NULL END) AS social_media_user_count
, COUNT(DISTINCT user_id) AS total_user_count -- Newly added total field to verify that our logic above is correct
FROM getting_started.users
WHERE user_deleted = FALSE
GROUP BY 1
;
Which produces the following output: