Django: django.core.exceptions.SynchronousOnlyOperation while running scrapy script in django
I am trying to integrate Scrapy with Django. For that, this topic helped me.
In my script I just return a simple object to check that everything works before it is added to my model. I don't scrape any website.
- Issue
myspider.py:

```python
import scrapy

from scrapers.items import ScrapersItem


class ErascraperSpider(scrapy.Spider):
    name = "erascraper"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    def parse(self, response):
        return ScrapersItem(name="Argus")
```
mypipeline.py:

```python
class ScrapersPipeline:
    def process_item(self, item, spider):
        item.save()
        print("pipeline ok")
        return item
```
I also use scrapy_djangoitem's DjangoItem in my items.py.
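For reference, my items.py looks roughly like this (the app and model names below are placeholders, not the real ones):

```python
# items.py -- sketch; "myapp" and "Scraper" stand in for the real names
from scrapy_djangoitem import DjangoItem

from myapp.models import Scraper


class ScrapersItem(DjangoItem):
    # Maps the Scrapy item onto the Django model so item.save() persists it
    django_model = Scraper
```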
However, I get this error:
```
2024-05-21 22:01:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.com> (referer: None)
2024-05-21 22:01:37 [scrapy.core.scraper] ERROR: Error processing {'name': 'Argus'}
Traceback (most recent call last):
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1078, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/scrapy/utils/defer.py", line 340, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/scrapers/scrapers/pipelines.py", line 14, in process_item
    item.save()
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/scrapy_djangoitem/__init__.py", line 35, in save
    self.instance.save()
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 822, in save
    self.save_base(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 909, in save_base
    updated = self._save_table(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 1071, in _save_table
    results = self._do_insert(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 1112, in _do_insert
    return manager._insert(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/query.py", line 1847, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1821, in execute_sql
    with self.connection.cursor() as cursor:
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/utils/asyncio.py", line 24, in inner
    raise SynchronousOnlyOperation(message)
django.core.exceptions.SynchronousOnlyOperation: You cannot call this from an async context - use a thread or sync_to_async.
```
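If I understand correctly, Django raises SynchronousOnlyOperation whenever ORM code runs in a thread that has a running asyncio event loop, which is presumably the case here because Scrapy is running on the asyncio Twisted reactor. A stdlib-only sketch of the general idea, with a stand-in function instead of the real item.save():

```python
import asyncio


def blocking_save():
    # Stand-in for item.save(): a blocking call that must not run
    # directly on the event loop thread.
    return "saved"


async def process_item():
    # Offload the blocking call to a worker thread; the worker thread has
    # no running event loop, so Django's async-safety check would pass there.
    return await asyncio.to_thread(blocking_save)


result = asyncio.run(process_item())
print(result)  # saved
```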
- Solution
I have read a lot about possible solutions, especially using sync_to_async and await, but I don't see where I could use them in my script.
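The closest I have come is something like the sketch below (untested): recent Scrapy versions accept a coroutine process_item, so the blocking save could be wrapped with asgiref's sync_to_async there, assuming asgiref is installed and the asyncio reactor is enabled:

```python
# pipelines.py -- a sketch, not verified against this project
from asgiref.sync import sync_to_async


class ScrapersPipeline:
    async def process_item(self, item, spider):
        # Run the blocking ORM save in a worker thread; that thread has no
        # running event loop, so Django's async-safety check should pass.
        await sync_to_async(item.save, thread_sensitive=False)()
        return item
```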
For now, to bypass this error, I use the following in my Scrapy project's settings.py:

```python
os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"
```
However, according to the warnings in the Django documentation, this workaround is not suitable for a production environment. See this topic.
So, have you already run into this issue, and how did you solve it?