Demystifying the Django ORM ✨

How querysets get turned to SQL and back to model instances

by Simon Charette

Simon Charette

  • Montréal, Canada
  • Django contributor for ~10 years
    • Mostly to the ORM & migrations
  • Principal Eng. at Zapier

📧 charettes@gmail.com
🐙 github.com/charettes

What is a ORM?

  • Object–Relational Mapping
  • Definition of model relations mapping to database tables
  • Retrieval and update of data
  • Maps table rows to Python objects

How are tables mapped to models?

class Author(models.Model):
    name = models.CharField()
CREATE TABLE author (
  id integer primary key,
  name varchar
)
id name
... ...

How is data stored in tables?

Author.objects.create(name="Carlton Gibson")
INSERT INTO author (name) VALUES ('Carlton Gibson')
id name
1 Carlton Gibson

How is data retrieved from tables?

Author.objects.filter(name="Carlton Gibson")
SELECT * FROM author WHERE name = 'Carlton Gibson'
[Author(name="Carlton Gibson")]

Turning Python objects into SQL

Resolving phase; overview

Resolving phase; resolve_expression

def resolve_expression(self, query: sql.Query):
  ...
>>> Author.objects.filter(name="Carlton Gibson")
>>> F("name").resolve_expression(author_query)
Col("author", "name")
>>> F("missing").resolve_expression(author_query)
FieldError("Cannot resolve 'missing' into field")

Resolving phase; accumulation

author_qs = Author.objects.filter(name__startswith="Carlton").order_by("id")
author_qs.query.where = [
  StartsWith(
    Col("author", "name"),
    Value("Carlton")
  )
]
author_qs.query.order_by = [
  Col("author", "id")
]

Resolving phase; summary

  • A resolvable is a Python object with a resolve_expression method accepting a sql.Query object and returning a compilable
  • This phase is accumulative; QuerySet remains lazy and more resolving can occur
  • Most of resolvable are subclasses of django.db.models.Expression
  • Most of resolvable are composed of other resolvable in a nested doll fashion
  • This phase is backend agnostic; no per backend logic (SQLite, Postgres, MySQL, Oracle)

Compilation phase; overview

  • Non-lazy QuerySet operation triggers compilation
  • Proper settings.DATABASES entry is determined from routers
  • A specific compiler is picked (update, insert, aggregate) based on the requested operation
  • The compiler recursively compile the query and its constituents

Compilation phase; compilables

A compilable is a Python object that provides a as_sql method (or a vendor specific equivalent) accepting a connection and compiler object and returning its SQL equivalent as a (sql: str, params: tuple).

>>> Col("author", "name").as_sql(connection, compiler)
('"author"."name"', ())
>>> StartsWith(Col("author", "name"), Value("Carlton")).as_sql(connection, compiler)
('"author"."name" LIKE %s', ("%name",))

Compilation phase; resolvable and compilables

  • A resolvable's resolve_expression must return a compilable
  • A resolvable object can also be a compilable (as long as both resolve_expressions and as_sql are implemented)
  • In practice most resolvable objects are compilable as only field references need to be resolved
  • The django.db.models.Expression base class is both a resolvable and a compilable

Compilation phase; resolvable and compilables

Nowadays **kwargs references to fields and annotations in QuerySet methods (filter, exclude, order_by) are shorthand for expression construction

Author.objects.filter(name__startswith="Carlton")

Is equivalent to

from django.db.models.lookups import StartsWith
Author.objects.filter(
  StartsWith(F("name"), Value("Carlton"))
)

Compilation phase; resolvable and compilables examples

Literal values resolve to themselves

>>> Value(42).resolve_expression(query)
Value(42)
>>> Value(42).as_sql(compiler, connection)
("%s", (42,))

Func instances resolve their constituents and wrap the SQL

>>> Func(F("name"), function="LOWER").resolve_expression(query)
Func(Col("author", "name"), function="LOWER")
>>> Func(Col("author", "name"), function="LOWER").as_sql(compiler, connection)
('LOWER("author"."name")', ())

Compilation phase; query compilation

Operation compilers stich query constituents together depending on the desired operation

In the case of a data retrieval the default SQLCompiler is used to produce a SELECT query.

  • QuerySet.create uses SQLInsertCompiler and produces a SQL INSERT query.
  • QuerySet.update uses SQLUpdateCompiler and produces a SQL UPDATE query.
  • QuerySet.delete uses SQLDeleteCompiler and produces a SQL DELETE query.

Compilation phase; query compilation examples

Author.objects.only("id")

Data retrieval uses SQLCompiler

>>> fields_sql, fields_params = [], ()
>>> for field in query.fields:
>>>     field_sql, field_params = self.compile(field)
>>>     fields_sql.append(field_sql)
>>>     fields_params += field_params
>>> select_clause = ", ".join(fields_sql)
>>> query_sql = f"SELECT {select_clause} FROM {query.from_table}"

Execution phase; SQL to tuples

  • (sql: str, params: tuple) is transmitted to the configured SQL backend through the installed driver (e.g. psycopg, PEP-249)
  • Either a tabular result is returned in a list[tuple] form or an exception is raised

Execution phase; examples

>>> with connection.cursor() as cursor:
>>>     result = cursor.execute(
>>>          'SELECT "id" FROM author', (),
>>>     )
>>> result
[(1,)]
>>> with connection.cursor() as cursor:
>>>     result = cursor.execute(
>>>          'INSERT INTO "author" ("id") VALUES(%s)', ("invalid",),
>>>     )
IntegrityError("...")

Construction phase; tuples to desired type

  • Tuples returned and their members are coerced to their expected types

Construction phase; tuples members coercion

  • Returned values are turned into their expected value through Field.from_db_value coercion
class JSONField(Field):
    def from_db_value(self, value):
        return json.loads(value)

class BooleanField(Field):
    def from_db_value(self, value):
        return bool(value)

Construction phase; iterable type

  • values turns returned tuples into Iterable[dict]
  • values_list keeps the tuples unchanged Iterable[tuple]
  • If neither are used each tuple is turned into instances of the queried Model

Construction phase; model construction simplified

>>> rows = [(1,)]
>>> authors = []
>>> for row in rows:
>>>     author = Author.from_db(db, ("id",), row)
>>>     authors.append(author)

Phases; summary

Resolving
Lazy, accumulating, validation of input, backend agnostic

Compilation
Recursive compilation of resolved query, operation specific compilers, produces SQL and params, backend specific

Execution
Transmission of compiled query, delegation through drivers

Construction
Coercion of tuples members, iterable type

Thanks! 🙇‍♂️

  • Happy to answer further questions through Q/A!

Thanks Tim!