Django Загрузите pdf, затем запустите скрипт для сканирования pdf и вывода результатов

Я пытаюсь создать веб-приложение Django, которое позволяет пользователю загружать pdf, затем скрипт сканирует его, выводит и сохраняет определенный текст, который скрипт сканировал.

Я смог найти некоторый код для выполнения части загрузки файла. У меня есть скрипт для сканирования pdf. Не уверен, как связать их вместе, чтобы выполнить эту задачу.

views.py

from django.shortcuts import redirect, render
from .models import Document
from .forms import DocumentForm

def my_view(request):
    print(f"Great! You're using Python 3.6+. If you fail here, use the right version.")
    message = 'Upload PDF'
    # Handle file upload
    if request.method == 'POST':
        form = DocumentForm(request.POST, request.FILES)
        if form.is_valid():
            newdoc = Document(docfile=request.FILES['docfile'])
            newdoc.save()

        # Redirect to the document list after POST
        return redirect('my-view')
    else:
        message = 'The form is not valid. Fix the following error:'
else:
    form = DocumentForm()  # An empty, unbound form

# Load documents for the list page
documents = Document.objects.all()

# Render list page with the documents and the form
context = {'documents': documents, 'form': form, 'message': message}
return render(request, 'list.html', context)

forms.py

from django import forms

class DocumentForm(forms.Form):
    docfile = forms.FileField(label='Select a file')

models.py

from django.db import models

class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%m/%d')

list.html

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
<title>webpage</title>
    </head>
<body>
    <!-- Upload form. Note enctype attribute! -->
    <form action="{% url "my-view" %}" method="post" enctype="multipart/form-data">
        {% csrf_token %}
        {{ message }}
        <p>{{ form.non_field_errors }}</p>

        <!-- Select a file: text -->
        <p>{{ form.docfile.label_tag }} {{ form.docfile.help_text }}</p>

        <!-- choose file button -->
        <p>
            {{ form.docfile.errors }}
            {{ form.docfile }}
        </p>

        <!-- Upload button -->

        <p><input type="submit" value="Upload"/></p>
    </form>
</body>

Scrape.py
. Хотите вывести и сохранить Plan_Name.

import os
import pdfplumber
import re
directory = r'C:User/Ant_Esc/Desktop'

for filename in os.listdir(directory):
    if filename.endswith('.pdf'):
        fullpath = os.path.join(directory, filename)
        #print(fullpath)
        all_text = ""
        with pdfplumber.open(fullpath) as pdf:
            for page in pdf.pages:
                text = page.extract_text()
                #print(text)
                all_text += ' ' + text
                all_text = all_text.replace('\n','')
            pattern ='Plan Title/Name  .*? Program/Discipline'
            Plan_Name = re.findall(pattern, all_text,re.DOTALL)
            for i in Plan_Name:
                Plan_Name = i.removesuffix('Program/Discipline')
                Plan_Name = Plan_Name.removeprefix('Plan Title/Name  ')

Я просмотрел ваш код, можете ли вы подтвердить два следующих запроса? Мне кажется, что они отсутствуют.

Получаете ли вы какую-либо ошибку с приведенным выше кодом?
URL запись добавлена в URL.py
Откуда вы вызываете scrap.py?

Мое предложение: вы можете вызвать srap.py после успешного сохранения файла в view.py newdoc.save() или вы можете вызвать scrap.py из модели, используя метод super.

Дайте мне знать, если вам нужна дополнительная помощь по этому вопросу.

Вернуться на верх

Последние вопросы и ответы

Django on Azure App Service: got an unexpected keyword argument allow_abbrev

Fix django/nginx flacky 502 error: upstream prematurely closed

Django Admin not loading static files

How to create a virtualenv in the terminal of macOS?

Service selling platform

Why does StaticLiveServerTestCase breaks fixtures when dynamically generating tests beside TestCase does not?

HMR Module replacement is disabled

Django Allauth login/signup fails with SMTPAuthenticationError (535) in production

Why does using a set() snapshot for deduplication still allow duplicate records in my Django/Outlook integration?

Looking for Real-World Problems to Build a Web Application Around [closed]

Django Загрузите pdf, затем запустите скрипт для сканирования pdf и вывода результатов

Последние вопросы и ответы

Рекомендуемые записи по теме