Django: Parse a .docx file and get content from the document
I'm writing a Django app where a user can upload a .docx file and have flashcards created in the database from the parsed document.
Flashcards are created from the different paragraph styles, as follows:
Title style: category of flashcards, Heading 1 style: title of a flashcard, normal style: content of a flashcard
The problem occurs when I upload a Word document: the category is created, but the title and content are empty strings.
I wrote a separate Python file with the same function, and there everything works: the category, title and content are exactly as they should be.
The input is the same in both cases (a Flask.docx file):
Flask Framework (uses the Title style, so it should be the category)
Flask (uses the Heading 1 style, so it should be the title of a flashcard)
Is a Python Web framework. (uses the normal style, so it should be the content of a flashcard)
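The intended mapping can be sketched without Django or python-docx, assuming each paragraph is reduced to a (style name, text) pair. The names `Card` and `build_flashcards` are hypothetical and only illustrate the expected result for the sample input; this is not the code from the project.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Card(NamedTuple):
    """Hypothetical plain container for one flashcard."""
    category: str
    title: str
    content: str


def build_flashcards(paragraphs: Iterable[Tuple[str, str]]) -> List[Card]:
    """Turn (style name, text) pairs into flashcards.

    Assumption: a 'Title' paragraph sets the current category, a
    'Heading 1' paragraph sets the current title, and every 'normal'
    paragraph closes one flashcard with that category and title.
    """
    category = ""
    title = ""
    cards = []
    for style_name, text in paragraphs:
        if style_name == "Title":
            category = text
        elif style_name == "Heading 1":
            title = text
        elif style_name == "normal":
            cards.append(Card(category, title, text))
    return cards


# The sample document from above, written as (style name, text) pairs.
sample = [
    ("Title", "Flask Framework"),
    ("Heading 1", "Flask"),
    ("normal", "Is a Python Web framework."),
]
cards = build_flashcards(sample)
print(cards)
```

For the sample input this should yield exactly one flashcard with all three fields filled in, which is the result expected from the Django view.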
Django:
- Models:
    from django.db import models

    # Create your models here.
    class FlashcardCategory(models.Model):
        id = models.IntegerField(primary_key=True)
        name = models.CharField(max_length=30, unique=True, help_text="Category of flashcards.")

        def __str__(self):
            return self.name

    class Flashcard(models.Model):
        id = models.IntegerField(primary_key=True)
        category = models.ForeignKey(FlashcardCategory, on_delete=models.CASCADE)
        title = models.CharField(max_length=50, help_text="Title of flashcard.")
        content = models.TextField(unique=True, help_text="Content (reverse) of flashcard.")

        def __str__(self):
            return f"{self.title} - {self.content}"
- View:
    def upload_documents_and_parse(request):
        """
        Function-based view where the user can upload their document (only .docx).
        The app then tries to parse the document and create flashcards.
        If something goes wrong, the app informs the user.
        """
        if request.method == "POST":
            form = WordDocumentForm(request.POST, request.FILES)
            if form.is_valid():
                uploaded_document = request.FILES['document']
                document_to_parse = Document(uploaded_document)
                category = str()
                title = str()
                content = str()
                for paragraph in document_to_parse.paragraphs:
                    if paragraph.style.name == "normal":
                        content = paragraph.text
                    elif paragraph.style.name == "Heading 1":
                        title = paragraph.text
                    elif paragraph.style.name == 'Title':
                        category = paragraph.text
                    flashcard_category_instance = FlashcardCategory(name=category)
                    flashcard_category_instance.save()
                    flashcard_instance = Flashcard(category=FlashcardCategory.objects.get(name=category),
                                                   title=title,
                                                   content=content)
                    flashcard_instance.save()
                    return HttpResponse(f"Category: {category}, title: {title}, content: {content}")
                return redirect('home_page')
            else:
                HttpResponse("Something went wrong. Try again.")
        else:
            form = WordDocumentForm()
        return render(request, "upload_and_parse.html", {"form": form})
Output of that Django view (after uploading the file):
- Category: Flask Framework
- Title:
- Content:
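To compare what the view and the standalone function actually see, the style name and text of every paragraph could be dumped in both places. Below is a minimal sketch, assuming objects that expose `.style.name` and `.text` the way python-docx paragraphs do. `describe_paragraphs` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for real paragraphs so the snippet runs on its own.

```python
from types import SimpleNamespace


def describe_paragraphs(paragraphs):
    """Return one 'index | style name | text' line per paragraph."""
    return [
        f"{index} | {paragraph.style.name!r} | {paragraph.text!r}"
        for index, paragraph in enumerate(paragraphs)
    ]


def fake_paragraph(style_name, text):
    """Stand-in for a python-docx paragraph (assumption: only .style.name and .text are needed)."""
    return SimpleNamespace(style=SimpleNamespace(name=style_name), text=text)


sample = [
    fake_paragraph("Title", "Flask Framework"),
    fake_paragraph("Heading 1", "Flask"),
    fake_paragraph("normal", "Is a Python Web framework."),
]
lines = describe_paragraphs(sample)
for line in lines:
    print(line)
```

In the view this would be called as `describe_paragraphs(document_to_parse.paragraphs)` before the loop, and in the standalone function as `describe_paragraphs(docs.paragraphs)`, so both listings can be compared line by line.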
Simple Python function (outside the Django project):
    from docx import Document

    def parser(path):
        docs = Document(path)
        title = str()
        heading = str()
        text = str()
        for paragraph in docs.paragraphs:
            if paragraph.style.name == 'Title':
                title = paragraph.text
            elif paragraph.style.name == "Heading 1":
                heading = paragraph.text
            elif paragraph.style.name == "normal":
                text = paragraph.text
        return f"Title: {title}\nHeading: {heading}\nTitle: {text}"
And the output from that function (which is correct; the Django view should produce the same result and save it in the database):
- Title: Flask Framework
- Heading: Flask
- Title: Is a Python Web framework.
Can anybody help, please?