Django: django.core.exceptions.SynchronousOnlyOperation while running scrapy script in django

I am trying to use Scrapy inside a Django project. To do that, I followed this topic.

In my spider I just return a simple item, to check that everything works and that it gets saved to my model. I am not scraping any actual website.

  1. Issue

myspider.py:

import scrapy

from scrapers.items import ScrapersItem



class ErascraperSpider(scrapy.Spider):
    name = "erascraper"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    def parse(self, response):
        return ScrapersItem(name="Argus")

mypipeline.py:

class ScrapersPipeline(object):
    def process_item(self, item, spider):
        item.save()
        print("pipeline ok")
        return item

Also, my items.py uses DjangoItem from the scrapy_djangoitem package.
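For completeness, items.py looks roughly like this (the app and model names below are placeholders for my actual ones):

```python
# items.py -- a sketch; "myapp" and "Scraper" are illustrative names
from scrapy_djangoitem import DjangoItem

from myapp.models import Scraper  # my Django model with a "name" field


class ScrapersItem(DjangoItem):
    django_model = Scraper
```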

However, I get this error:

2024-05-21 22:01:37 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.com> (referer: None)
2024-05-21 22:01:37 [scrapy.core.scraper] ERROR: Error processing {'name': 'Argus'}
Traceback (most recent call last):
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/twisted/internet/defer.py", line 1078, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/scrapy/utils/defer.py", line 340, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/scrapers/scrapers/pipelines.py", line 14, in process_item
    item.save()
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/scrapy_djangoitem/__init__.py", line 35, in save
    self.instance.save()
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 822, in save
    self.save_base(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 909, in save_base
    updated = self._save_table(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 1071, in _save_table
    results = self._do_insert(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/base.py", line 1112, in _do_insert
    return manager._insert(
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/manager.py", line 87, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/query.py", line 1847, in _insert
    return query.get_compiler(using=using).execute_sql(returning_fields)
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/db/models/sql/compiler.py", line 1821, in execute_sql
    with self.connection.cursor() as cursor:
  File "/Users/kevingoncalves/Desktop/Folders/Coding/glsapi/myenv/lib/python3.12/site-packages/django/utils/asyncio.py", line 24, in inner
    raise SynchronousOnlyOperation(message)
django.core.exceptions.SynchronousOnlyOperation: You cannot call this from an async context - use a thread or sync_to_async.

  2. Solution

I have read a lot about possible solutions, especially using sync_to_async and await, but I don't see where I could use them in my script.

For now, to bypass this error, I use the following in my Scrapy project's settings.py:

os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"

However, according to the warnings in the Django documentation, this workaround is not suitable for a production environment. See this topic.

So, have you run into this issue before, and how did you solve it?
