Django, what the JOIN? 🤔

How the ORM turns relational references to JOIN

by Simon Charette

Simon Charette

  • Montréal, Canada
  • Django contributor for ~10 years
    • Mostly to the ORM & migrations
  • Principal Eng. at Zapier

📧 charettes@gmail.com
🐙 github.com/charettes

What’s a SQL JOIN?

  • A JOIN combines rows from two tables based on a related column.
  • Think of it as matching records between tables.
  • There are several types of joins — but the most common ones are:
    • INNER JOIN
    • LEFT OUTER JOIN

What’s a SQL JOIN?

from django.db import models


class Publisher(models.Model):
    name = models.CharField()

class Book(models.Model):
    title = models.CharField()
    publisher = models.ForeignKey(Publisher, null=True)

What’s a SQL JOIN?

>>> publisher = Publisher.objects.create()
>>> Book.objects.create(
...     title="Published book",    
...     publisher=publisher,
... )
>>> Book.objects.create(
...     title="Independent book",
... )

INNER JOIN ↔️

  • Returns only matching records between both tables.
  • If there’s no match, the row doesn’t appear.

INNER JOIN ↔️

SELECT *
FROM book
INNER JOIN publisher ON (publisher.id = book.publisher_id)
id name publisher_id
1 Published book 1

LEFT OUTER JOIN ⬅️

  • Returns all rows from first table + matching rows from the second one.
  • If no match in the first, result will contain NULL for the second's columns.

LEFT OUTER JOIN ⬅️

SELECT *
FROM book
LEFT OUTER JOIN publisher ON (publisher.id = book.publisher_id)
id name publisher_id
1 Published book 1
2 Independent book NULL

INNER vs OUTER summary

  • INNER is the strict intersection between both tables
  • OUTER uses NULL to represent the lack of intersection between both tables

When does the ORM span JOINs?

  • Dunder / double underscore syntax __
  • filter, exclude, F
  • select_related
  • Reverse relationship (related_name) references

INNER vs OUTER rationale

Why is it important to pick one over the other?

  • Accuracy of results
  • Faster query evaluation as strict intersection can only produce <= of results

INNER vs OUTER rules

  • SQL NULL means UNKNOWN
  • You can't be sure two unknowns are equal
SELECT NULL = NULL
-> false

INNER vs OUTER rules

When does Django picks one over the other?

  1. Non-NULL-able relationships are always INNER
  2. Constrained NULL-able relationships are always INNER*
  3. *Except when using expressions with specialized NULL handling
  4. JOINs against OUTER JOIN'ed tables are always OUTER

INNER vs OUTER rule #1

Non-NULL-able relationships are always INNER

  • select_related, filter
  • Reverse relationships are NULL-able
  • Parent models in MTI

INNER vs OUTER rule #1

Non-NULL-able relationships are always INNER

class Room(models.Model):
    number = models.IntegerField()

class Reservation(models.Model):
    room = models.ForeignKey(
        Room, related_name="reservations"
    )

INNER vs OUTER rule #1

Non-NULL-able relationships are always INNER

Reservation.objects.select_related("room")
SELECT *
FROM reservation
INNER JOIN room ON (reservation.room_id = room.id)

INNER vs OUTER rule #1

Non-NULL-able relationships are always INNER

Room.objects.annotate(
    reservation=Latest("reservations"),
)
SELECT *
FROM room
-- Reverse relationships are nullable thus
-- an OUTER JOIN must be used.
LEFT OUTER JOIN reservation ON (reservation.room_id = room.id)
GROUP BY room.id
ORDER BY reservation.id DESC

INNER vs OUTER rule #2

Constrained NULL-able relationships are always INNER*

  • Since NULL != <ANYTHING> then col = <ANYTHING> excludes NULL
  • This makes the promote to the relationship to non-NULL-able
  • Thus rule #1 applies

INNER vs OUTER rule #2

Constrained NULL-able relationships are always INNER*

class Publisher(models.Model):
    name = models.CharField()

class Author(models.Model):
    publisher = models.ForeignKey(
        Publisher, null=True, related_name="authors"
    )

INNER vs OUTER rule #2

Constrained NULL-able relationships are always INNER*

Author.objects.filter(publisher__name="Foo")
SELECT *
FROM author
-- OUTER JOIN is not necessary as
-- NULL != "Foo" and thus the relationship
-- is "promoted" to not-NULLABLE.
-- LEFT OUTER JOIN publisher ON (author.publisher_id = publisher.id)
INNER JOIN publisher ON (author.publisher_id = publisher.id)
WHERE publisher.name = "Foo"

INNER vs OUTER rule #3

*Except when using expressions with specialized NULL handling

Can you think of a few expressions that specialize NULL handling?

INNER vs OUTER rule #3

*Except when using expressions with specialized NULL handling

  • __isnull -> IS NULL
  • Coalesce("nullable_field", ...) -> COALESCE
  • IFNULL

INNER vs OUTER rule #3

*Except when using expressions with specialized NULL handling

class Publisher(models.Model):
    name = models.CharField()

class Author(models.Model):
    publisher = models.ForeignKey(
        Publisher, null=True, related_name="authors"
    )

INNER vs OUTER rule #3

*Except when using expressions with specialized NULL handling

Author.objects.filter(
    Q(publisher__name="Foo") | Q(publisher__name__isnull=True)
)
SELECT author.*
FROM author
LEFT OUTER JOIN publisher ON (author.publisher_id = publisher.id)
WHERE (
    publisher.name = "Foo" OR publisher.name IS NULL
)

INNER vs OUTER rule #3

*Except when using expressions with specialized NULL handling

Author.objects.filter(
    Q(publisher__name="Foo") | IsNull(F("publisher__name"), True)
)
SELECT author.*
FROM author
-- 😬🐛 this should be LEFT OUTER!
INNER JOIN publisher ON (author.publisher_id = publisher.id)
WHERE (
    publisher.name = "Foo" OR publisher.name IS NULL
)

INNER vs OUTER rule #4

JOINs against OUTER JOIN'ed tables are always OUTER

  • Mandatory to preserve NULL rows as otherwise additive INNER would reduce the left-hand-side result set as NULL != <ANYTHING>

INNER vs OUTER rule #4

JOINs against OUTER JOIN'ed tables are always OUTER

class Location(models.Model):
    pass

class Publisher(models.Model):
    name = models.CharField()
    location = models.ForeignKey(Location)

class Author(models.Model):
    publisher = models.ForeignKey(
        Publisher, null=True, related_name="authors"
    )

INNER vs OUTER rule #4

JOINs against OUTER JOIN'ed tables are always OUTER

Publisher.objects.select_related("location")
SELECT *
FROM publisher
-- publisher.location is not nullable thus
-- rule #1 applies and an INNER JOIN is used
INNER JOIN location ON (publisher.location_id = location.id)

INNER vs OUTER rule #4

JOINs against OUTER JOIN'ed tables are always OUTER

Author.objects.select_related("publisher__location")
SELECT *
FROM author
-- author.publisher_id is NULL-able thus
-- an OUTER JOIN must be used.
LEFT OUTER JOIN publisher ON (author.publisher_id = publisher.id)
-- publisher.location is not NULL-able *but*
-- it's JOIN'ing from publisher which is
-- NULL-able thus rule #4 applies.
LEFT OUTER JOIN location ON (publisher.location_id = location.id)

INNER vs OUTER rules

If you are curious about how these rules are implemented you can refer to the following django.db.models.sql.Query methods

  • setup_join; turn a path between models into a JOIN
  • join; reuse setupJOIN if possible and augment query with aliases

JOIN reuse ↔️ ♻️

JOINs can be reused between __ references

  • Same Queryset method call (e.g. single filter)
  • Targeting single-valued relationships

JOIN reuse ↔️ ♻️

Same Queryset method call (e.g. single filter)

class Location(models.Model):
    pass

class Publisher(models.Model):
    name = models.CharField()
    location = models.ForeignKey(Location)

class Author(models.Model):
    name = models.CharField()
    dob = models.DateField()
    publisher = models.ForeignKey(
        Publisher, null=True, related_name="authors"
    )

JOIN reuse ↔️ ♻️

Same Queryset method call (e.g. single filter)

Publisher.objects.filter(
    authors__name__startswith="Simon", authors__dob__year=1987,
)
SELECT *
FROM publisher
INNER JOIN author ON (publisher.id = author.publisher_id)
WHERE (
    name LIKE 'Simon%'
    AND dob BETWEEN '1987-01-01' AND '1987-12-31'
)

JOIN reuse ↔️ ♻️

Distinct Queryset method call (e.g. single filter)

Publisher.objects.filter(
    authors__name__startswith="Simon",
).filter(authors__dob__year=1987)
SELECT *
FROM publisher
INNER JOIN author ON (publisher.id = author.publisher_id)
INNER JOIN author T0 ON (publisher.id = author.publisher_id)
WHERE (
    author.name LIKE 'Simon%'
    AND T1.dob BETWEEN '1987-01-01' AND '1987-12-31'
)

JOIN reuse ↔️ ♻️

Targeting single-valued relationships

Author.objects.filter(publisher__name="foo")
.filter(publisher__location__id=123)
.annotate(location_name=models.F("publisher__location__name"))
SELECT author.*, location.name AS location_name
FROM author
INNER JOIN publisher ON (publisher.id = author.publisher_id)
WHERE (
    publisher.name = 'Foo'
    AND publisher.location_id = 123
)

Summary of topics discussed today

  • What is a JOIN?
  • INNER vs OUTER
  • NULL != <ANYTHING>
  • ORM INNER vs OUTER rules
  • ORM JOIN reuse

Thanks! 🙇‍♂️

  • Happy to answer further questions through Q/A!

Thanks Priya!

Multi-valued relationship in same `filter` call

Multi-valued relationship in distinct `filter` call

Multi-valued relationship in distinct `filter` call