In the previous lesson, we went over how to pull data from a database using the SELECT statement. Building on that foundation, in this lesson we'll demonstrate how to sort the data pull that resulted from the SELECT statement. We'll also learn how to return the top and bottom values from out dataset.
These concepts will introduce two new SQL operators:
- ORDER BY (ASC & DESC)
To see the results of the queries we'll be writing, open your SQL Client (e.g. TablePlus) and connect to your Local Database that hosts the SFDS Getting Started dataset. Let's get started!
Sorting in Ascending Order
The Google Sheets sorting lesson used the User ID tab as an example data set, and the same dataset can be found in the Users table. As a recap, we can see our entire set from the Users table by running:
SELECT * FROM getting_started.users ;
We can also be more prescriptive and specify which columns we would like to see:
SELECT user_id , signup_timestamp , country , acquisition_source , user_deleted FROM getting_started.users ;
SQL allows us to sort our data by specifying which column we would like to sort using the ORDER BY statement. For example, to sort data from the acquisition_source field in ascending order, we would write:
SELECT user_id , signup_timestamp , country , acquisition_source , user_deleted FROM getting_started.users ORDER BY acquisition_source ASC ;
Note that the relationship between the columns has been maintained, and that ordering of the values in the other column are random. For example, the acquisition_source value of 'Other' is displayed as the top results, but the order of the user_id field is random for the rows where acquisition_source = 'Other'.
Sorting in Descending Order
To change the sorting from ascending to descending, we simply need to replace the ASC in the code to DESC :
SELECT user_id , signup_timestamp , country , acquisition_source , user_deleted FROM getting_started.users ORDER BY acquisition_source DESC ;
Sorting by Multiple Columns
What if we need to sort our data by multiple columns? SQL supports adding multiple arguments to the ORDER BY statement, so we simply need to list the column we would like to sort, and specify the type of desired sorting. For example:
SELECT user_id , signup_timestamp , country , acquisition_source , user_deleted FROM getting_started.users ORDER BY acquisition_source ASC, country DESC ;
There is no limit to the number of columns we can include in the ORDER BY statement.
Limiting the Number of Rows Returned
In all previous examples so far, we've dealt with a very small data set that contains only 100 rows of data. Displaying and sorting the data in a SQL client is straightforward for datasets of this size, but what happens if we have a dataset that contains 1 millions rows? Or 100 million rows? Or more? To be able to sort and display datasets of this magnitude, we first need a way to limit the number of rows to display. The LIMIT statement in SQL does just that:
SELECT user_id , signup_timestamp , country , acquisition_source , user_deleted FROM getting_started.users LIMIT 10 ;
The resulting query will only show 10 rows of data, in no particular order. Limiting the results of a query is actually best practice when exploring data, since it will return results quicker, and also avoid the possibility of returning too many rows for the SQL client to process, resulting in your computer to crash :(.
Finding the Top and Bottom Values From Our Dataset
A common use case for the LIMIT statement is to combine it with the ORDER BY statement to find the top or bottom values from a dataset. For example, to find the bottom 5 smallest values for user_id, we can write:
SELECT user_id FROM getting_started.users ORDER BY user_id ASC LIMIT 5 ;
And to find the last 10 signup_timestamp values, we can write:
SELECT signup_timestamp FROM getting_started.users ORDER BY signup_timestamp DESC LIMIT 10 ;