Should I use `select_for_update` in my Django transcription application?
My app allows users to upload and transcribe audio files. My code stores the audio file and transcript in a postgres database.
Here is my original code:
    try:
        transcribed_doc = TranscribedDocument.objects.get(id=session_id)
    except TranscribedDocument.DoesNotExist:
        ...
    try:
        if transcribed_doc.state == 'not_started':
            transcribed_doc.state = 'in_progress'
            transcribed_doc.save()
            ...
Here is the adjusted code:
    try:
        transcribed_doc = TranscribedDocument.objects.select_for_update().get(id=session_id)
    except TranscribedDocument.DoesNotExist:
        ...
    try:
        if transcribed_doc.state == 'not_started':
            transcribed_doc.state = 'in_progress'
            transcribed_doc.save()
            ...
I know `select_for_update()` is used to lock the row being updated, to prevent situations where multiple calls try to update the same record simultaneously. But I can only imagine such a situation if the same user mistakenly or maliciously starts the same transcription process multiple times by, for example, hitting the 'start transcription' button several times.
If you believe there might be concurrent requests updating the data, it's better to always lock rows. Although you may not currently have many examples where rows are updated simultaneously, more situations could arise as your application grows. For example, you might add a background process that checks something and updates a field on the model.
However, if you don't foresee this happening, you can go without locks for now and add them later when the need arises. In the end, it depends on how many users and concurrent requests your application handles.
Also read about deadlocks and why they happen.