Django Elasticsearch - How to improve big data indexing speed in ES
I have an Episode index with only one field, which I use for searching. I have 53 million records in PostgreSQL, and I use the django-elasticsearch-dsl library to integrate with Elasticsearch. The problem is that when I run the command to dump the PostgreSQL episodes table into the Episode index, it takes almost 5-6 hours. How can I overcome this? It is a bottleneck for deploying to production.
documents.py
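from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from elasticsearch_dsl import analyzer, tokenizer

from .models import Episode  # module path assumed; adjust to your app

# Lowercased edge n-grams (1 to 15 characters) for autocomplete-style search
search_analyzer = analyzer(
    'search_analyzer',
    filter=["lowercase"],
    tokenizer=tokenizer('autocomplete', 'edge_ngram', min_gram=1, max_gram=15),
)


@registry.register_document
class EpisodeDocument(Document):
    title = fields.TextField(analyzer=search_analyzer)

    class Index:
        name = 'episode'
        settings = {
            'number_of_shards': 4,
            'number_of_replicas': 2,
            'max_ngram_diff': 10
        }

    class Django:
        model = Episode
        queryset_pagination = 5000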
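To make the analysis cost concrete, here is a minimal pure-Python sketch of roughly what the analyzer above emits per title. It is only an approximation, not Elasticsearch itself: it assumes the edge_ngram tokenizer's default token_chars (none are set above), in which case the whole title is treated as a single token and only its prefixes are emitted. The function name edge_ngrams and the sample titles are hypothetical.

```python
def edge_ngrams(text, min_gram=1, max_gram=15):
    """Approximate the 'autocomplete' edge_ngram tokenizer plus the lowercase filter.

    Assumption: token_chars is left at its default, so the whole input is
    one token and only prefixes of length min_gram..max_gram are produced.
    """
    lowered = text.lower()
    longest = min(max_gram, len(lowered))
    return [lowered[:n] for n in range(min_gram, longest + 1)]


# Hypothetical titles, for illustration only
short_terms = edge_ngrams("Pilot")
long_terms = edge_ngrams("The Long Goodbye Part One")

print(short_terms)
print(len(long_terms))
```

Under these assumptions, each title contributes at most 15 terms to the index, so 53 million titles produce at most 795 million indexed terms.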
Command used to dump the data into ES:
python manage.py search_index --rebuild
ES machine: t3.2xlarge
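
For scale, here is a back-of-the-envelope calculation using only the numbers stated above (53 million records, 5-6 hours, queryset_pagination of 5000, 4 shards, 2 replicas). The helper names are hypothetical, and "pages" here means database queryset pages, not Elasticsearch bulk requests.

```python
TOTAL_RECORDS = 53_000_000
PAGE_SIZE = 5000       # queryset_pagination from documents.py
SHARDS = 4             # number_of_shards
REPLICAS = 2           # number_of_replicas


def page_count(total, page_size):
    """Number of database pages needed to read every record (ceiling division)."""
    return -(-total // page_size)


def docs_per_second(total, hours):
    """Average indexing throughput over the whole run."""
    return total / (hours * 3600)


pages = page_count(TOTAL_RECORDS, PAGE_SIZE)
slowest = docs_per_second(TOTAL_RECORDS, 6)
fastest = docs_per_second(TOTAL_RECORDS, 5)

# Each document is indexed on its primary shard and on every replica of it
copies_per_doc = 1 + REPLICAS
shard_copies = SHARDS * copies_per_doc
document_writes = TOTAL_RECORDS * copies_per_doc

print(f"{pages} pages, {slowest:.0f}-{fastest:.0f} docs/s")
print(f"{shard_copies} shard copies, {document_writes} document writes in total")
```

So the current run averages roughly 2,450 to 2,950 documents per second, and with 2 replicas every document is written three times across the cluster.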