Как более эффективно проверять наличие аномалий данных в транзакции (Django)
Я хочу проверить, происходят ли в транзакции аномалии данных, такие как грязное чтение, неповторяющееся чтение, фантомное чтение, потерянное обновление и т.д.
На самом деле, я использовал person
таблицу , которая имеет id
и name
, как показано ниже.
person
таблица:
id | name |
---|---|
1 | John |
2 | David |
Затем я протестировал non-repeatable read с test
view ниже и one command prompt. *В течение sleep(10)
, один командный интерпретатор обновляет "David" до "Tom" и фиксирует:
# "store/views.py"
from .models import Person
from django.http import HttpResponse
from django.db import transaction
from time import sleep
@transaction.atomic
def test(request):
print(Person.objects.get(pk=2)) # "David"
sleep(10) # Update "David" to "Tom" and commit by one command prompt.
print(Person.objects.get(pk=2)) # "Tom"
return HttpResponse("Test")
Но каждый раз, когда я тестирую аномалии данных, мне нужно вручную запускать test
view, обновлять и фиксировать с помощью одной командной строки, что занимает много времени.
Итак, как я могу более эффективно проверить, происходят ли аномалии данных в транзакции?
Используя потоки, вы можете более эффективно проверить, возникают ли аномалии данных в транзакции.
Я создал 5 наборов кода с потоками для тестирования 5 распространенных аномалий данных dirty read, non-repeatable read, phantom read, lost update и write skew с уровнем изоляции Django по умолчанию READ COMMITTED и PostgreSQL как показано ниже. *Потерянное обновление или перекос записи происходит из-за состояния гонки, и я использовал PostgreSQL.
Я объясняю по поводу:
- грязное чтиво => Здесь
- неповторяемое чтение и фантомное чтение => Здесь
- потерянное обновление и перекос записи => Здесь
- какая аномалия возникает в каком уровне изоляции => Здесь
<Грязное чтение>, <Неповторяемое чтение> и <Фантомное чтение>
Сначала я создал store_person
таблицу с id
и name
с models.py
как показано ниже:
store_person
таблица:
id | name |
---|---|
1 | John |
2 | David |
# "store/models.py"
from django.db import models
class Person(models.Model):
name = models.CharField(max_length=30)
Затем я создал и запустил тестовый код dirty read, как показано ниже:
# "store/views.py"
from django.db import transaction
from time import sleep
from .models import Person
from threading import Thread
from django.http import HttpResponse
@transaction.atomic
def transaction1(flow):
while True:
while True:
if flow[0] == "Step 1":
sleep(0.1)
print("<T1", flow[0] + ">", "BEGIN")
flow[0] = "Step 2"
break
while True:
if flow[0] == "Step 2":
sleep(0.1)
print("<T1", flow[0] + ">", "SELECT")
person = Person.objects.get(id=2)
print(person.id, person.name)
flow[0] = "Step 3"
break
while True:
if flow[0] == "Step 5":
sleep(0.1)
print("<T1", flow[0] + ">", "SELECT")
person = Person.objects.get(id=2)
print(person.id, person.name)
flow[0] = "Step 6"
break
while True:
if flow[0] == "Step 6":
sleep(0.1)
print("<T1", flow[0] + ">", "COMMIT")
flow[0] = "Step 7"
break
break
@transaction.atomic
def transaction2(flow):
while True:
while True:
if flow[0] == "Step 3":
sleep(0.1)
print("<T2", flow[0] + ">", "BEGIN")
flow[0] = "Step 4"
break
while True:
if flow[0] == "Step 4":
sleep(0.1)
print("<T2", flow[0] + ">", "UPDATE")
Person.objects.filter(id=2).update(name="Tom")
flow[0] = "Step 5"
break
while True:
if flow[0] == "Step 7":
sleep(0.1)
print("<T2", flow[0] + ">", "COMMIT")
break
break
def call_transcations(request):
flow = ["Step 1"]
thread1 = Thread(target=transaction1, args=(flow,), daemon=True)
thread2 = Thread(target=transaction2, args=(flow,), daemon=True)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
return HttpResponse("Call_transcations")
Тогда, dirty read не произошло, согласно результату, выведенному на консоль, потому что на любых уровнях изоляции в PostgreSQL, dirty read не происходит:
<T1 Step 1> BEGIN
<T1 Step 2> SELECT
2 David # Here
<T2 Step 3> BEGIN
<T2 Step 4> UPDATE
<T1 Step 5> SELECT
2 David # Here
<T1 Step 6> COMMIT
<T2 Step 7> COMMIT
А также, я смог получить журнал запросов PostgreSQL ниже. Вы можете проверить как вести журнал запросов PostgreSQL:
[23576]: BEGIN
[23576]: SELECT "store_person"."id", "store_person"."name"
FROM "store_person"
WHERE "store_person"."id" = 2
LIMIT 21
[8600]: BEGIN
[8600]: UPDATE "store_person" SET "name" = 'Tom'
WHERE "store_person"."id" = 2
[23576]: SELECT "store_person"."id", "store_person"."name"
FROM "store_person"
WHERE "store_person"."id" = 2
LIMIT 21
[23576]: COMMIT
[8600]: COMMIT
В этой таблице ниже показан поток и журналы SQL запросов PostgreSQL выше:
Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
---|---|---|---|
Step 1 | BEGIN; |
T1 starts. | |
Step 2 | SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21; 2 David |
T1 reads David . |
|
Step 3 | BEGIN; |
T2 starts. | |
Step 4 | UPDATE "store_person" SET "name" = 'Tom' WHERE "store_person"."id" = 2; |
T2 updates David to Tom . |
|
Step 5 | SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21; 2 David |
T1 reads David instead of Tom before T2 commits.*Dirty read doesn't occur!! |
|
Step 6 | COMMIT; |
T1 commits. | |
Step 7 | COMMIT; |
T2 commits. |
Далее, я создал и запустил тестовый код non-repeatable read как показано ниже:
# "store/views.py"
# ...
@transaction.atomic
def transaction1(flow):
while True:
while True:
if flow[0] == "Step 1":
sleep(0.1)
print("<T1", flow[0] + ">", "BEGIN")
flow[0] = "Step 2"
break
while True:
if flow[0] == "Step 2":
sleep(0.1)
print("<T1", flow[0] + ">", "SELECT")
person = Person.objects.get(id=2)
print(person.id, person.name)
flow[0] = "Step 3"
break
while True:
if flow[0] == "Step 6":
sleep(0.1)
print("<T1", flow[0] + ">", "SELECT")
person = Person.objects.get(id=2)
print(person.id, person.name)
flow[0] = "Step 7"
break
while True:
if flow[0] == "Step 7":
sleep(0.1)
print("<T1", flow[0] + ">", "COMMIT")
break
break
@transaction.atomic
def transaction2(flow):
while True:
while True:
if flow[0] == "Step 3":
sleep(0.1)
print("<T2", flow[0] + ">", "BEGIN")
flow[0] = "Step 4"
break
while True:
if flow[0] == "Step 4":
sleep(0.1)
print("<T2", flow[0] + ">", "UPDATE")
Person.objects.filter(id=2).update(name="Tom")
flow[0] = "Step 5"
break
while True:
if flow[0] == "Step 5":
sleep(0.1)
print("<T2", flow[0] + ">", "COMMIT")
flow[0] = "Step 6"
break
break
# ...
Тогда, неповторяемое чтение произошло согласно результату, выведенному на консоль, потому что на уровне изоляции READ COMMITTED в PostgreSQL, неповторяемое чтение происходит:
<T1 Step 1> BEGIN
<T1 Step 2> SELECT
2 David # Here
<T2 Step 3> BEGIN
<T2 Step 4> UPDATE
<T2 Step 5> COMMIT
<T1 Step 6> SELECT
2 Tom # Here
<T1 Step 7> COMMIT
А также, я могу получить журналы запросов PostgreSQL ниже:
[23128]: BEGIN
[23128]: SELECT "store_person"."id", "store_person"."name"
FROM "store_person"
WHERE "store_person"."id" = 2
LIMIT 21
[6368]: BEGIN
[6368]: UPDATE "store_person" SET "name" = 'Tom'
WHERE "store_person"."id" = 2
[6368]: COMMIT
[23128]: SELECT "store_person"."id", "store_person"."name"
FROM "store_person"
WHERE "store_person"."id" = 2
LIMIT 21
[23128]: COMMIT
В этой таблице ниже показан поток и журналы SQL запросов PostgreSQL выше:
Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
---|---|---|---|
Step 1 | BEGIN; |
T1 starts. | |
Step 2 | SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21; 2 David |
T1 reads David . |
|
Step 3 | BEGIN; |
T2 starts. | |
Step 4 | UPDATE "store_person" SET "name" = 'Tom' WHERE "store_person"."id" = 2; |
T2 updates David to Tom . |
|
Step 5 | COMMIT; |
T2 commits. | |
Step 6 | SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21; 2 Tom |
T1 reads Tom instead of David after T2 commits.*Non-repeatable read occurs!! |
|
Step 7 | COMMIT; |
T1 commits. |
Далее, я создал и запустил тестовый код phantom read, как показано ниже:
# "store/views.py"
# ...
@transaction.atomic
def transaction1(flow):
while True:
while True:
if flow[0] == "Step 1":
sleep(0.1)
print("<T1", flow[0] + ">", "BEGIN")
flow[0] = "Step 2"
break
while True:
if flow[0] == "Step 2":
sleep(0.1)
print("<T1", flow[0] + ">", "SELECT")
persons = Person.objects.all()
for person in persons:
print(person.id, person.name)
flow[0] = "Step 3"
break
while True:
if flow[0] == "Step 6":
sleep(0.1)
print("<T1", flow[0] + ">", "SELECT")
persons = Person.objects.all()
for person in persons:
print(person.id, person.name)
flow[0] = "Step 7"
break
while True:
if flow[0] == "Step 7":
sleep(0.1)
print("<T1", flow[0] + ">", "COMMIT")
break
break
@transaction.atomic
def transaction2(flow):
while True:
while True:
if flow[0] == "Step 3":
sleep(0.1)
print("<T2", flow[0] + ">", "BEGIN")
flow[0] = "Step 4"
break
while True:
if flow[0] == "Step 4":
sleep(0.1)
print("<T2", flow[0] + ">", "INSERT")
Person.objects.create(id=3, name="Tom")
flow[0] = "Step 5"
break
while True:
if flow[0] == "Step 5":
sleep(0.1)
print("<T2", flow[0] + ">", "COMMIT")
flow[0] = "Step 6"
break
break
# ...
Затем, phantom read произошел согласно результату, выведенному на консоль, потому что в READ COMMITTED уровень изоляции в PostgreSQL, phantom read происходит:
<T1 Step 1> BEGIN
<T1 Step 2> SELECT
1 John # Here
2 David # Here
<T2 Step 3> BEGIN
<T2 Step 4> INSERT
<T2 Step 5> COMMIT
<T1 Step 6> SELECT
1 John # Here
2 David # Here
3 Tom # Here
<T1 Step 7> COMMIT
А также, я могу получить журналы запросов PostgreSQL ниже:
[15912]: BEGIN
[15912]: SELECT "store_person"."id", "store_person"."name"
FROM "store_person"
[2098]: BEGIN
[2098]: INSERT INTO "store_person" ("id", "name")
VALUES (3, 'Tom')
RETURNING "store_person"."id"
[2098]: COMMIT
[15912]: SELECT "store_person"."id", "store_person"."name"
FROM "store_person"
[15912]: COMMIT
В этой таблице ниже показан поток и журналы SQL запросов PostgreSQL выше:
Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
---|---|---|---|
Step 1 | BEGIN; |
T1 starts. | |
Step 2 | SELECT "store_person"."id", "store_person"."name" FROM "store_person"; 1 John 2 David |
T1 reads 2 rows. | |
Step 3 | BEGIN; |
T2 starts. | |
Step 4 | INSERT INTO "store_person" ("id", "name") VALUES (3, 'Tom') RETURNING "store_person"."id"; |
T2 inserts the row with 3 and Tom to person table. |
|
Step 5 | COMMIT; |
T2 commits. | |
Step 6 | SELECT "store_person"."id", "store_person"."name" FROM "store_person"; 1 John 2 David 3 Tom |
T1 reads 3 rows instead of 2 rows after T2 commits.*Phantom read occurs!! |
|
Step 7 | COMMIT; |
T1 commits. |
<Потерянное обновление>
Сначала я создал store_product
таблицу с id
, name
и stock
с models.py
, как показано ниже:
store_product
таблица:
id | name | stock |
---|---|---|
1 | Apple | 10 |
2 | Orange | 20 |
# "store/models.py"
from django.db import models
class Product(models.Model):
name = models.CharField(max_length=30)
stock = models.IntegerField()
# "store/views.py"
# ...
@transaction.atomic
def transaction1(flow):
while True:
while True:
if flow[0] == "Step 1":
sleep(0.1)
print("T1", flow[0], "BEGIN")
flow[0] = "Step 2"
break
while True:
if flow[0] == "Step 2":
sleep(0.1)
print("T1", flow[0], "SELECT")
product = Product.objects.get(id=2)
print(product.id, product.name, product.stock)
flow[0] = "Step 3"
break
while True:
if flow[0] == "Step 5":
sleep(0.1)
print("T1", flow[0], "UPDATE")
Product.objects.filter(id=2).update(stock=13)
flow[0] = "Step 6"
break
while True:
if flow[0] == "Step 6":
sleep(0.1)
print("T1", flow[0], "COMMIT")
flow[0] = "Step 7"
break
break
@transaction.atomic
def transaction2(flow):
while True:
while True:
if flow[0] == "Step 3":
sleep(0.1)
print("T2", flow[0], "BEGIN")
flow[0] = "Step 4"
break
while True:
if flow[0] == "Step 4":
sleep(0.1)
print("T2", flow[0], "SELECT")
product = Product.objects.get(id=2)
print(product.id, product.name, product.stock)
flow[0] = "Step 5"
break
while True:
if flow[0] == "Step 7":
sleep(0.1)
print("T2", flow[0], "UPDATE")
Product.objects.filter(id=2).update(stock=16)
flow[0] = "Step 8"
break
while True:
if flow[0] == "Step 8":
sleep(0.1)
print("T2", flow[0], "COMMIT")
break
break
# ...
Затем, произошло ошибочное обновление, согласно результату, показанному на консоли, потому что на уровне изоляции READ COMMITTED в PostgreSQL, произошло ошибочное обновление:
T1 Step 1 BEGIN
T1 Step 2 SELECT # Reads the same row
2 Orange 20
T2 Step 3 BEGIN
T2 Step 4 SELECT # Reads the same row
2 Orange 20
T1 Step 5 UPDATE # Writes "stock"
T1 Step 6 COMMIT # And commits the write
T2 Step 7 UPDATE # Overwrites "stock"
T2 Step 8 COMMIT # And commits the overwrite
А также, я могу получить журналы запросов PostgreSQL ниже:
[20504]: BEGIN
[20504]: SELECT "store_product"."id", "store_product"."name", "store_product"."stock"
FROM "store_product"
WHERE "store_product"."id" = 2
LIMIT 21
[3840]: BEGIN
[3840]: SELECT "store_product"."id", "store_product"."name", "store_product"."stock"
FROM "store_product"
WHERE "store_product"."id" = 2
LIMIT 21
[20504]: UPDATE "store_product" SET "stock" = 13
WHERE "store_product"."id" = 2
[20504]: COMMIT
[3840]: UPDATE "store_product" SET "stock" = 16
WHERE "store_product"."id" = 2
[3840]: COMMIT
В этой таблице ниже показан поток и журналы SQL запросов PostgreSQL выше:
Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
---|---|---|---|
Step 1 | BEGIN; |
T1 starts. | |
Step 2 | SELECT "store_product"."id", "store_product"."name", "store_product"."stock" FROM "store_product" WHERE "store_product"."id" = 2 LIMIT 21; 2 Orange 20 |
T1 reads 20 which is updated later to 13 because a customer buys 7 oranges. |
|
Step 3 | BEGIN; |
T2 starts. | |
Step 4 | SELECT "store_product"."id", "store_product"."name", "store_product"."stock" FROM "store_product" WHERE "store_product"."id" = 2 LIMIT 21; 2 Orange 20 |
T2 reads 20 which is updated later to 16 because a customer buys 4 oranges. |
|
Step 5 | UPDATE "store_product" SET "stock" = 13 WHERE "store_product"."id" = 2; |
T1 updates 20 to 13 . |
|
Step 6 | COMMIT; |
T1 commits. | |
Step 7 | UPDATE "store_product" SET "stock" = 16 WHERE "store_product"."id" = 2; |
T2 updates 13 to 16 after T1 commits. |
|
Step 8 | COMMIT; |
T2 commits.*Lost update occurs. |
<Перекос записи>
Сначала я создал store_doctor
таблицу с id
, name
и on_call
с models.py
, как показано ниже:
store_doctor
таблица:
id | name | on_call |
---|---|---|
1 | John | True |
2 | Lisa | True |
# "store/models.py"
from django.db import models
class Doctor(models.Model):
name = models.CharField(max_length=30)
on_call = models.BooleanField()
# "store/views.py"
# ...
@transaction.atomic
def transaction1(flow):
while True:
while True:
if flow[0] == "Step 1":
print("T1", flow[0], "BEGIN")
flow[0] = "Step 2"
break
while True:
if flow[0] == "Step 2":
print("T1", flow[0], "SELECT")
doctor_count = Doctor.objects.filter(on_call=True).count()
print(doctor_count)
flow[0] = "Step 3"
break
while True:
if flow[0] == "Step 5":
print("T1", flow[0], "UPDATE")
Doctor.objects.filter(id=1).update(on_call=False)
flow[0] = "Step 6"
break
while True:
if flow[0] == "Step 6":
print("T1", flow[0], "COMMIT")
flow[0] = "Step 7"
break
break
@transaction.atomic
def transaction2(flow):
while True:
while True:
if flow[0] == "Step 3":
print("T2", flow[0], "BEGIN")
flow[0] = "Step 4"
break
while True:
if flow[0] == "Step 4":
print("T2", flow[0], "SELECT")
doctor_count = Doctor.objects.filter(on_call=True).count()
print(doctor_count)
flow[0] = "Step 5"
break
while True:
if flow[0] == "Step 7":
print("T2", flow[0], "UPDATE")
Doctor.objects.filter(id=2).update(on_call=False)
flow[0] = "Step 8"
break
while True:
if flow[0] == "Step 8":
print("T2", flow[0], "COMMIT")
break
break
# ...
Затем, write skew произошел согласно результату belew на консоли, потому что в READ COMMITTED уровень изоляции в PostgreSQL, write skew происходит:
T1 Step 1 BEGIN
T1 Step 2 SELECT # Reads the same data
2
T2 Step 3 BEGIN
T2 Step 4 SELECT # Reads the same data
2
T1 Step 5 UPDATE # Writes 'False' to John's "on_call"
T1 Step 6 COMMIT # And commits the write
T2 Step 7 UPDATE # Writes 'False' to Lisa's "on_call"
T2 Step 8 COMMIT # And commits the write
А также, я могу получить журналы запросов PostgreSQL ниже:
[11252]: BEGIN
[11252]: SELECT COUNT(*) AS "__count"
FROM "store_doctor"
WHERE "store_doctor"."on_call"
[2368]: BEGIN
[2368]: SELECT COUNT(*) AS "__count"
FROM "store_doctor"
WHERE "store_doctor"."on_call"
[11252]: UPDATE "store_doctor"
SET "on_call" = false
WHERE "store_doctor"."id" = 1
[11252]: COMMIT
[2368]: UPDATE "store_doctor"
SET "on_call" = false
WHERE "store_doctor"."id" = 2
[2368]: COMMIT
В этой таблице ниже показан поток и журналы SQL запросов PostgreSQL выше:
Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
---|---|---|---|
Step 1 | BEGIN; |
T1 starts. | |
Step 2 | SELECT COUNT(*) AS "__count" FROM "store_doctor" WHERE "store_doctor"."on_call"; 2 |
T1 reads 2 so John can take a rest. |
|
Step 3 | BEGIN; |
T2 starts. | |
Step 4 | SELECT COUNT(*) AS "__count" FROM "store_doctor" WHERE "store_doctor"."on_call"; 2 |
T2 reads 2 so Lisa can take a rest. |
|
Step 5 | UPDATE "store_doctor" SET "on_call" = false WHERE "store_doctor"."id" = 1; |
T1 updates True to False which means John takes a rest. |
|
Step 6 | COMMIT; |
T1 commits. | |
Step 7 | UPDATE "store_doctor" SET "on_call" = false WHERE "store_doctor"."id" = 2; |
T2 updates True to False which means Lisa takes a rest. |
|
Step 8 | COMMIT; |
T2 commits.John and Lisa both take a rest.*Write skew occurs. |