Download as pdf or txt
Download as pdf or txt
You are on page 1of 6

1.

USING#
SELECT friend_id, e.name AS entree, d.name AS dessert
FROM entrees e
INNER JOIN desserts d USING (friend_id);

USING is helpful for simplifying your join when you are joining two tables on
columns with the same name. In the above example, you have two tables
which are lists of entrees and desserts and the ID of the friend who knows
how to prepare them. If you have offers from multiple friends to come over
for dinner, you want to know what possible combinations of entree and
dessert each friend can make for you before you decide which offer to
accept, so you run this query. USING makes it so you do not have to write
out ON e.friend_id = d.friend_id , but what I find particularly helpful is that
you no longer have to qualify which friend_id you are referring to. This
prevents the ever-frustrating error ERROR: column reference "friend_id" is
ambiguous when you forget to put e. or d. in front of friend_id .

2. COALESCE#
SELECT c.id as client_id, COALESCE(c.email, c.phone) AS contact_method
FROM clients c;

COALESCE takes a list of columns (or other info) and returns the first non-
NULL one. In the example, you have a table of your business's clients and you
want to get a list of them with a singular way to contact them. You prefer to
contact them over email, so if they provide you with an email address, return
the email address. Otherwise, you will settle for returning the phone number
if the email address is not present. Thanks to COALESCE you now just have
one column and you do not need to select both columns individually and
compare them.

3. CASE#
SELECT CASE
WHEN c.country = 'US' THEN c.state
ELSE c.country END AS region
FROM clients c;

CASE operates similarly to if, else if, and else statements. It returns what
comes after THEN for the first WHEN statement that is true. If none of
the WHEN statements are true, it returns what is under the ELSE statement. In
the example, you want to send a gift to each of your clients because of how
much you appreciate them, but first, you want to approximate shipping
costs so you need to find out where they all live. If they don't live in the
United States, you're fine with just calculating the shipping costs based on
the country, but otherwise, you want to know which specific state they live
in. CASE allows you to check the country condition and return the shipping
region based off that check.

4. Semi Joins (EXISTS/IN)#


SELECT u.id as user_id
FROM users u
WHERE EXISTS (
SELECT 1
FROM models m
WHERE m.user_id = u.id
AND m.updated_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
);

SELECT u.id as user_id


FROM users u
WHERE u.id IN (
SELECT m.user_id
FROM models m
WHERE m.updated_at >= CURRENT_TIMESTAMP - INTERVAL '30 days'
);

Semi Joins such as EXISTS and IN allow you to check for the existence of
matching rows in other tables without having to join to the table itself. In
these two examples, which accomplish the same thing, you want to get a list
of your software's users who have worked on one of their models within the
last 30 days so you know who to reach out to for feedback on some of the
newer features. You could just do an INNER JOIN paired
with DISTINCT or GROUP BY to get the same result, but the semi joins provide a
more performant solution since they only have to find one matching row
before they can return while the INNER JOIN finds all the matching rows
which are then filtered down by the DISTINCT or GROUP BY into one row.
Worth noting that if you are joining on a unique column, regular joins are
equally performant to these semi joins.

5. String Pattern Matching (LIKE/ILIKE/~/~*)#


SELECT b.title, b.author
FROM books b
WHERE b.title LIKE '%Pirate%';

SELECT b.title, b.author


FROM books b
WHERE b.title ~ 'Pirate';
SELECT b.title, b.author
FROM books b
WHERE b.title ILIKE '%pirate%';

SELECT b.title, b.author


FROM books b
WHERE b.title ~* 'pirate';

If you need to pattern match on a string, you are provided with quite a few
options. The most performant option available to you is LIKE , which uses the
built-in SQL matching including % for 0 or more characters. You also have
the Postgres-exclusive ~ which has the power of regex behind it, if you need
a more complicated match. Then, you have the case insensitive versions of
both, ILIKE and ~* respectively. For the example, we see four versions of
trying to find a book that mentions "Pirate" in the title, since you want to
read an old school high seas adventure. Personally, I have always loved
the ~* for quick queries where I just need to find something in a table
quickly and the performance is not much of an issue, but I would
recommend using LIKE or ILIKE for production code if possible.

6. UNION (or UNION ALL)#


SELECT s.pricing_id, s.price
FROM snacks s
UNION
SELECT t.pricing_id, t.price
FROM tickets t
UNION
SELECT m.pricing_id, m.price
FROM memberships m
ORDER BY price;

UNION allows you to combine the results of multiple queries into one result
set. In the example, we have a theater which has three types of products that
they store in separate tables due to the different information required for
each product type. They have snacks from the snack bar, tickets for the
shows, and memberships that allow you to support the theater while also
getting discounted prices on other purchases. The theater wants a list of all
the prices of their products along with the ID used by their POS system for
record-keeping purposes. UNION allows them to take the results from all
three tables and bring the distinct ones together. This works as long as
each SELECT returns the same number of columns and they columns have
similar types. If you are willing to not worry about making each row distinct,
you can use UNION ALL instead.

7. FILTER#
SELECT m.id as member_id, COUNT(*) as member_count, COUNT(*) FILTER(WHERE
m.expiration_date > current_date) as active_member_count
FROM members_m;

FILTER gives you the ability to run an aggregate function over a subset of
the overall result set. Let's go back to the theater from the previous example
for this one. Now, they want to know how many total members they have
ever had and how many active members they currently have. To get overall
total, you can just run COUNT , but you can run COUNT again with the
additional FILTER to only get the members you have not hit their expiration
date yet.

8. Windows (OVER)#
WITH info AS (
SELECT b.label, b.price, b.category,
SUM(b.price) OVER (PARTITION BY b.category) as total_category_price
FROM bills b
)
SELECT i.label, i.price, i.category,
(i.price::float / i.total_category_price) * 100 AS percentage
FROM info i

WITH info AS (
SELECT b.label, b.price, b.category,
SUM(b.price) OVER w
FROM bills b
WINDOW w AS (PARTITION BY b.category)
)
SELECT i.label, i.price, i.category,
(i.price::float / i.total_category_price) * 100 AS percentage
FROM info i

Windows provide a variety of new functionalities to you (I'll touch on some


of these in the next couple of points), but one of the things they allow you
to do is have a column call an aggregate function without having to run
a GROUP BY . In these two examples, you see two different ways of defining a
window. One is inline and the other is separated out at the bottom of the
internal query. In both examples, we are looking at a table of our bills and
we want a list of the bills alongside what percentage of our budget for each
billing category (Food, Entertainment, etc.) we are spending on that bill so
we know what will be most beneficial to cancel. The window allows us to
find the total of the category by using PARTITION BY in the same query so we
can just easily calculate the percentage without any joins to multiple
subqueries. Windows also support using ORDER BY which we will see more
examples of coming up.
9. LAG/LEAD#
SELECT us.user_id,
us.log_in_timestamp - LAG(us.log_out_timestamp, 1) OVER (PARTITION BY
us.user_id ORDER BY us.log_in_timestamp)
FROM user_sessions us

LAG gives you the ability to access data from a previous row (as determined
by the window) of the result set. LEAD does the opposite, giving you data
from an upcoming row. The second parameter is how many rows in the past
or future you are traveling to get the information. In the example, you want
to find out how much time a user is putting your software down before they
log back in again and since you save log in and log out times for some
reason, you can do that. You just need to get the log out time from the
previous row determined by the ORDER BY in the window and compare it to
the current log in time. You also want to make sure you are not getting the
times from different users mixed in together so you PARTITION BY the user in
the window as well. You could also do the opposite by using LEAD to get the
log in timestamp from the next row and comparing it to the log out
timestamp from the current row. Overall, I find these helpful when
comparing data sequentially.

10. ROW_NUMBER#
SELECT m.*, ROW_NUMBER() OVER (PARTITION BY m.user_id ORDER BY m.updated_at
DESC) AS rn
FROM models m
HAVING rn >= 2

ROW_NUMBER does basically what you would expect it to do. It assigns each row
in the result set a number based off the window. If you do not define a
window, it will just assign the number based off the ordering of the overall
query. In this example, we only want our users to be able to have two
models so we don't crash our database with too many, so we want to get a
list of all the users' models that are not one of their two most recently
updated in order to delete them. ROW_NUMBER allows us to PARTITION BY the
user and order the numbering based on when the model was updated. Then,
we can use the HAVING clause to filter out all the rows that are one of the two
most recently edited based on the returned row number.

The End#
I hope you find some of this helpful or learned something by reading this. At
the very least, I enjoyed coming up with examples and writing this, so I'll
take that as a win.

You might also like