It took me quite some time before I could reason efficiently about SQL JOINs. And I must admit, the set theory diagrams never helped me really understand JOINs in SQL. So today, I want to help you understand JOINs in a different way, hoping to make the concept click at once for you too!
As we saw in the previous article What is an SQL relation?, in SQL a relation is a collection of objects, all sharing the same definition. SQL introduces relations and operators to compose them.
In mathematics, you can define a new function H
as being the result of
applying F
to some object, and then G
to the result of the previous
operation. We then note that H = F ∘ G
and it means that H(x) = F(G(x))
.
Functional programing languages all have a way to express this kind of
composition. This allows to build a new function from simpler ones.
In SQL we compose relations: we build a new relation from two existing ones. Remember, a relation is a collection of objects. How would you compose two collections of objects together to form a new collection of objects?
There aren’t that many ways to do it. Both input collections have objects of certain properties, or attributes. You typically want to build a new collection of objects that would have properties from the two collections, right?
That’s it. That’s a SQL JOIN.
Let’s see an example, taken from my book The Art of PostgreSQL:
-- name: list-albums-by-artist
-- List the album titles and duration of a given artist
select album.title as album,
sum(milliseconds) * interval '1 ms' as duration
from album
join artist using(artistid)
left join track using(albumid)
where artist.name = 'Red Hot Chili Peppers'
group by album
order by album;
In this query we build a new relation that is a composition of the objects
found in the collection ALBUM
with the objects found in the collection
ARTIST
. The result is a new collection of objects where we have both the
properties of the album and the artist, and we enrich albums with the artist
of the same artistid
.
This JOIN
that you see in the query, that’s all it means. For each album,
we add the information from the artist that shares the same artistid
. In
that case, we expect a single artist per album, so that’s even easier.
Next, we have a LEFT JOIN
that composes this new relation with the
collection of track objects. We build a new collection of objects that have
all the properties of the ALBUM
, the ARTIST
and the TRACK
.
I will dive in the other parts of the query in other articles! Here’s the new relation we have defined with the SQL query above:
┌───────────────────────┬──────────────────────────────┐
│ album │ duration │
├───────────────────────┼──────────────────────────────┤
│ Blood Sugar Sex Magik │ @ 1 hour 13 mins 57.073 secs │
│ By The Way │ @ 1 hour 8 mins 49.951 secs │
│ Californication │ @ 56 mins 25.461 secs │
└───────────────────────┴──────────────────────────────┘
(3 rows)
Check out my book The Art of PostgreSQL where you can learn how to put SQL to good use when coding your application!
Subscribe to receive a FREE chapter of the second edition of my book, “The Art of PostgreSQL” including the full Table of Contents!