Django Elasticsearch - How to improve big data indexing speed in ES

I have an Episode index with a single field that I use for searching. There are 53 million records in PostgreSQL, and I use the django-elasticsearch-dsl library as the Elasticsearch integration. The problem is that when I run the command to dump the PostgreSQL episodes table into the Episode index, it takes almost 5-6 hours. How can I overcome this? It's a bottleneck for deploying to production.

documents.py

from django_elasticsearch_dsl import Document, fields
from django_elasticsearch_dsl.registries import registry
from elasticsearch_dsl import analyzer, tokenizer

from .models import Episode

# Edge-n-gram analyzer for autocomplete-style matching on title
search_analyzer = analyzer('search_analyzer',
                           filter=["lowercase"],
                           tokenizer=tokenizer('autocomplete', 'edge_ngram', min_gram=1, max_gram=15))

@registry.register_document
class EpisodeDocument(Document):
    title = fields.TextField(analyzer=search_analyzer)

    class Index:
        name = 'episode'
        settings = {
            'number_of_shards': 4,
            'number_of_replicas': 2,
            'max_ngram_diff': 10
        }

    class Django:
        model = Episode
        queryset_pagination = 5000
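
For context on what I've looked at so far: instead of going through the management command, I could push the same documents through `parallel_bulk` from the elasticsearch-py helpers myself. This is a rough sketch, not the library's actual internals; `episode_actions` and `bulk_index` are names I made up, and in practice I'd stream rows with `Episode.objects.values_list('id', 'title').iterator()` rather than materialize a list:

```python
def episode_actions(rows, index="episode"):
    """Turn (pk, title) rows into Elasticsearch bulk-index actions."""
    for pk, title in rows:
        yield {"_index": index, "_id": pk, "_source": {"title": title}}

def bulk_index(rows, hosts="http://localhost:9200"):
    # elasticsearch-py imported here so episode_actions stays dependency-free
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import parallel_bulk

    es = Elasticsearch(hosts)
    # parallel_bulk streams chunks of actions through a thread pool
    for ok, item in parallel_bulk(es, episode_actions(rows),
                                  thread_count=4, chunk_size=5000):
        if not ok:
            print("failed:", item)
```

No idea yet whether the thread count or chunk size above is sensible for a t3.2xlarge; they're guesses to experiment with.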

Command to dump the data into ES:

python manage.py search_index --rebuild
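
Two standard bulk-loading tweaks I'm also considering are dropping replicas and disabling the refresh interval for the duration of the rebuild, then restoring them afterwards. A minimal sketch against the elasticsearch-py `put_settings` API; the helper names are my own:

```python
def load_settings():
    # Settings for the duration of the bulk load:
    # no replicas to sync, no per-second refresh.
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}

def steady_state_settings():
    # Restore my normal settings (2 replicas, default 1s refresh).
    return {"index": {"number_of_replicas": 2, "refresh_interval": "1s"}}

def apply_settings(settings, index="episode", hosts="http://localhost:9200"):
    # imported lazily so the dict builders above stay standalone
    from elasticsearch import Elasticsearch
    Elasticsearch(hosts).indices.put_settings(index=index, body=settings)
```

Recent django-elasticsearch-dsl releases also appear to support `python manage.py search_index --rebuild --parallel` (or `ELASTICSEARCH_DSL_PARALLEL = True` in settings), which I plan to test as well, though I haven't measured it yet.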

ES machine: t3.2xlarge
