Django's select_for_update(skip_locked=True) not compatible with spanning

I was attempting to lock the oldest item in a queryset and bashed my head over why it was not working.

Every time I used this first code snippet the entire query set would be locked.

with transaction.atomic():
   locked_entry = Entry.objects.select_for_update(skip_locked=True).filter(
                event__date=distribution_date(), 
                status='pending',
                entry_type__name='Premium'
            ).order_by('-created_at').first()
    print(locked_entry)
    sleep(4)

However, upon passing the exact FK instances the lock began to correctly only lock the oldest instance. I'm sure it has something to do with how the SQL is called but was hoping for an explanation if there are any Django experts out there :)

with transaction.atomic():
   locked_entry = Entry.objects.select_for_update(skip_locked=True).filter(
                event=Event.objects.get(date=distribution_date()), 
                status='pending',
                entry_type=EntryType.objects.get(name='Premium')
            ).order_by('-created_at').first()
    print(locked_entry)
    sleep(4)

This is to some extent less about Django ORM and a little bit more about native Postgres behavior.

select_for_update() works on querysets only, not on single-instance results. For this reason, the only way the second version of your code could lock nothing more than a single row, is if the queryset itself (i.e. the queryset as it looks before calling .first()) contained only that one row. This means that skip_locked doesn't much do anything in your particular use-case as shown so far, as locking the entire queryset and then asking for the entire queryset will lead to blocking behavior.

You can confirm this by checking the following modification to your query:

Entry.objects.filter(
        event__date      = default_date, 
        status           = 'pending',
        entry_type__name = 'Premium'
    ).order_by('-created_at')[:1].select_for_update()

The key to the modification is that we move away from .first(), as that is Django sugar for getting an instance rather than a queryset. Slicing, on the other hand, preserves the queryset. I don't think it matters, but for clarity's sake I've also opted for applying .selected_for_update() after we've generated the sliced queryset.

The result is that the modified query locks a queryset with only a single row in it, as opposed to your original query where the lock lands on the full result of the .filter().

I verified this for you, as well.

Test setup

# models.py

from django.db import models
from django.utils import timezone

default_date = timezone.datetime(2024, 12, 1).date()


class EntryType(models.Model):
    name = models.CharField(max_length=100, default="Premium")

class Event(models.Model):
    date = models.DateField(default=default_date)

class Entry(models.Model):
    status     = models.CharField(max_length=100, default="pending")
    created_at = models.DateTimeField(auto_now_add=True)
    entry_type = models.ForeignKey(
        EntryType,
        on_delete=models.CASCADE,
        null=True
    )
    event      = models.ForeignKey(
        Event, 
        on_delete=models.CASCADE,
        null=True
    )

In the test scripts below, you'll see that I've broken up your query - this is to better illustrate when it comes time to seeing the results where the blocking behavior arose. They should be functionally equivalent.

# script.py

from time import sleep
from django.conf  import settings
from django.utils import timezone
from django.db    import models, transaction
from .models      import default_date, Entry, Event, EntryType


def display_db_info():
    db_name = 'default'
    db = settings.DATABASES[db_name]
    print(f"Database: {db_name}: {db['ENGINE']}")
    print('-----------')

def display_sleep_intervals(me, qset):
    print(f"Self sleep: {me}")
    print(f"Query sleep: {qset}")

def add_db_items():
    entry_type = EntryType.objects.create()
    event      = Event.objects.create()

    for _ in range(1, 3):
        Entry.objects.create(entry_type=entry_type, event=event)
        print("Created 1 set of objects")
        sleep(1)


def queryset_runner(qset: models.QuerySet, sleep_interval):
    with transaction.atomic():
        print(f"  New transaction, using sleep interval value: {sleep_interval}")
        locked_entry = qset[:1]

        print(f"{timezone.now().time()}: Full queryset: {qset}")
        #print(f"{timezone.now().time()}: Queryset SQL: {qset.query}")
        print(f"{timezone.now().time()}: Narrowed result: {locked_entry}")
        sleep(sleep_interval)


def test_with_chaining(sleep_interval = 4):
    qset: models.QuerySet = Entry.objects.select_for_update().filter(
                    event__date      = default_date, 
                    status           = 'pending',
                    entry_type__name = 'Premium'
                ).order_by('-created_at')
    queryset_runner(qset, sleep_interval)


def test_runner_chaining(self_sleep_interval = 5, query_sleep_interval = 4):
    display_sleep_intervals(self_sleep_interval, query_sleep_interval)
    display_db_info()

    for i in range(1, 3):
        print(f"\n{timezone.now().time()}: Starting run ({i})")
        test_with_chaining(query_sleep_interval)
        print(f"{timezone.now().time()}: Completed run ({i}), sleeping for ({self_sleep_interval})")
        sleep(self_sleep_interval)


def test_with_explicit_FKs(sleep_interval = 4):
    qset: models.QuerySet = Entry.objects.select_for_update().filter(
                    event=Event.objects.get(date=default_date), 
                    status='pending',
                    entry_type=EntryType.objects.get(name='Premium')
                ).order_by('-created_at').all()
    queryset_runner(qset, sleep_interval)


def test_runner_explicit(self_sleep_interval = 5, query_sleep_interval = 4):
    display_sleep_intervals(self_sleep_interval, query_sleep_interval)
    display_db_info()

    for i in range(1, 3):
        print(f"\n{timezone.now().time()}: Starting run: {i}")
        test_with_explicit_FKs(query_sleep_interval)
        print(f"{timezone.now().time()}: Completed run: {i}, sleeping for ({self_sleep_interval})")
        sleep(self_sleep_interval)


def retrieve_nonblocked_queryset():
    with transaction.atomic():
        qset = Entry.objects.all()
        print(f"{timezone.now().time()}: Requesting row, awaiting DB")
        print(f"  {qset[:1].select_for_update()}")
        print(f"{timezone.now().time()}: DB returned")

"Base" results (no skip_locked)

# tty1:

In [1]: from apps.test.script import test_runner_chaining, retrieve_nonblocked_queryset

In [2]: test_runner_chaining(1, 5)
Self sleep: 1
Query sleep: 5
Database: default: django_tenants.postgresql_backend
-----------

04:33:51.076197: Starting run (1)
  New transaction, using sleep interval value: 5
04:33:51.109168: Full queryset: <QuerySet [<Entry: Entry object (17)>, <Entry: Entry object (16)>]>
04:33:51.116166: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>
04:33:56.118673: Completed run (1), sleeping for (1)

04:33:57.119963: Starting run (2)
  New transaction, using sleep interval value: 5
04:33:57.120957: Full queryset: <QuerySet [<Entry: Entry object (17)>, <Entry: Entry object (16)>]>
04:34:01.122894: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>
04:34:06.125476: Completed run (2), sleeping for (1)
# tty2:

In [1]: from apps.test.script import test_runner_chaining, retrieve_nonblocked_queryset

In [2]: test_runner_chaining(1, 5)
Self sleep: 1
Query sleep: 5
Database: default: django_tenants.postgresql_backend
-----------

04:33:52.410499: Starting run (1)
  New transaction, using sleep interval value: 5
04:33:52.443820: Full queryset: <QuerySet [<Entry: Entry object (17)>, <Entry: Entry object (16)>]>
04:33:56.118673: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>
04:34:01.122894: Completed run (1), sleeping for (1)

04:34:02.123438: Starting run (2)
  New transaction, using sleep interval value: 5
04:34:02.124424: Full queryset: <QuerySet [<Entry: Entry object (17)>, <Entry: Entry object (16)>]>
04:34:06.126336: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>
04:34:11.128978: Completed run (2), sleeping for (1)

Looking at the timestamps, you can see that tty1 starts its run and then tty2 starts approx. 1 second later. The function will take a minimum of 5 seconds to execute, leaving plenty of time for the timestamps to show-case the blocking behavior.

Compare total runtime of run #1 and you'll immediately see that this is the case. During run #1, tty1 spends ~5 seconds whereas tty2 spends ~9 seconds.

Specifically, these two lines are of interest (check the timestamps):

# tty1
04:33:56.118673: Completed run (1), sleeping for (1)
# tty2
04:33:56.118673: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>

In tty2, when Django executes the slice (which happens during the print), it's forced to wait - because tty1's transaction isn't completed, so it still holds the lock. That's why the delay - Postgres holds the connection but delays giving an answer, waiting for the lock to free up first. Which is what the timestamps above show: tty1's transaction finishes, and immediately after, tty2 receives an answer from the now-unlocked table.

Looking at the full terminal output, you'll also see that tty1 spends some time waiting for a result in its 2nd run - and that the timestamp coincides with the timestamp of when tty2's first transaction finishes. The reason being exactly the same as above, just that the roles are now reversed.

Results with skip_locked

# tty1

04:50:46.995004: Starting run (1)
  New transaction, using sleep interval value: 5
04:50:47.028377: Full queryset: <QuerySet [<Entry: Entry object (17)>, <Entry: Entry object (16)>]>
04:50:47.035377: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>
04:50:52.038783: Completed run (1), sleeping for (1)
# tty2

04:50:48.223853: Starting run (1)
  New transaction, using sleep interval value: 5
04:50:48.257852: Full queryset: <QuerySet []>
04:50:48.263286: Narrowed result: <QuerySet []>
04:50:53.266371: Completed run (1), sleeping for (1)

This time there's no waiting around, and that's to be expected since skip_locked is in effect.

You'll notice that in this situation we can't determine what even the parent queryset is, and that is precisely because tty1 has locked the entire queryset, and since we're skipping things that are locked the result list returns from Postgres to Django as an empty set.

How are we sure that putting .select_for_update() on the "wrong" queryset is the cause of this?

I did find some results using pg tools, but I frankly find them a little hard to work with. I instead devised a more practical test in the retrieve_nonblocked_queryset() function, where we won't be asking for the same queryset, we'll be asking for a specific row that we know based on the results from our earlier runs won't be row-locked. That in turn means that if we experience a result delay in the nonblocking function, we can infer that this must mean the test-runner query ends up passing in either a table lock or at least row locks that span the entire queryset, not just the object we end up selecting.

The setup:

Execute a test runner on tty1, and the nonblocked function on tty2.

Results:

# tty1

05:00:47.718589: Starting run (1)
  New transaction, using sleep interval value: 5
05:00:47.719588: Full queryset: <QuerySet [<Entry: Entry object (17)>, <Entry: Entry object (16)>]>
05:00:47.720587: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>
05:00:52.722550: Completed run (1), sleeping for (1)
# tty2

In [3]: retrieve_nonblocked_queryset()
05:00:48.727061: Requesting row, awaiting DB
  <QuerySet [<Entry: Entry object (16)>]>
05:00:52.723543: DB returned

Again we see timestamps coincide, indicating that tty2's transaction had to wait for tty1's transaction.

Had we used skip_locked in the nonblocking function, we'd have gotten an empty queryset same as before.

Modified query

Code changes (moving .select_for_update() from its original place to after a slice):

def queryset_runner(qset: models.QuerySet, sleep_interval):
    with transaction.atomic():
        locked_entry = qset[:1].select_for_update()    # <--- Moved here

        print(f"{timezone.now().time()}: Full queryset: {qset}")
        print(f"{timezone.now().time()}: Narrowed result: {locked_entry}")
        sleep(sleep_interval)


def test_with_chaining(sleep_interval = 4):
    qset: models.QuerySet = Entry.objects.filter(
                    event__date      = default_date, 
                    status           = 'pending',
                    entry_type__name = 'Premium'
                ).order_by('-created_at')
    queryset_runner(qset, sleep_interval)

Results:

# tty1

In [2]: test_runner_chaining(1, 5)
Self sleep: 1
Query sleep: 5
Database: default: django_tenants.postgresql_backend
-----------

05:10:57.943099: Starting run (1)
  New transaction, using sleep interval value: 5
05:10:57.977099: Full queryset: <QuerySet [<Entry: Entry object (17)>, <Entry: Entry object (16)>]>
05:10:57.984099: Narrowed result: <QuerySet [<Entry: Entry object (17)>]>
05:11:02.986970: Completed run (1), sleeping for (1)
# tty2

In [2]: retrieve_nonblocked_queryset()
05:10:58.621391: Requesting row, awaiting DB
  <QuerySet [<Entry: Entry object (16)>]>
05:10:58.626064: DB returned

tty2's transaction returns immediately, without waiting for tty1's transaction. That means whatever lock tty1 acquired didn't affect tty2's query. This strongly indicates that with the modified query, tty1 only locked Entry(17) (instead of the entire filtered queryset), leaving Entry(16) unlocked for tty2.

Summary

Not included are the runs to check if there are material differences with running a query with explicit FKs instead of traversing relationships. The code is there if you want to do it (test_runner_explicit()), but I ran them several times and found the behavior identical to the other test runner.

I don't know what is causing the behavior you describe where explicit FKs mattered, it's either some important component or code that you didn't include in your question, or an old version of Django, or an old or "non-standard" configuration of Postgres.

As far as SQL goes, you can uncomment the relevant line in the queryset runner if you want to compare SQL statements - but on Django 5.0 they were slightly different, though only in terms of how it was asking for certain fields in the JOIN (which is to say that it won't lead to a difference in blocking behavior). So I am fairly certain the translated SQL is irrelevant.

In the end, you can solve the problem in a much cleaner way by constructing a significantly more narrowed-down queryset by using a slice, and then applying the lock, re: the modified query. This works regardless of whether you build your query with explicit FKs or by traversing relationships.

Back to Top