What will happen if the reported size of a streaming download is inaccurate?
I implemented a download view in my Django project that builds a zip archive and streams it on the fly.
The files included in the archive are 1 tsv file and any number of xml files (from a set of search results) that are organized into a series of directories.
The download works, but there is no progress indicator. We have tested a small download (47 MB) and a large one (3 GB).
I was thinking that it would be nice to have a progress bar to give the user some idea of how long the download will take. However, from what I've read, predicting the size of a zip file is tricky and prone to inaccuracy. Since I'm very inexperienced with zip file generation (let alone streaming downloads), I'm wondering:
- What would happen with a download from a user perspective if I supply an estimated size? Specifically, would a download fail?
- What would happen if the estimated size is too big?
- What would happen if the estimated size is too small?
Are there any alternate solutions for this problem space that I should consider?
To show progress, the browser needs a Content-Length header in the response, and you can't send an accurate one with a streaming response because you don't know the exact size of the response before you start streaming.
OK, so what happens if we estimate Content-Length:
- If it is below the real length, the download will be cut off early because, from the browser's point of view, all the data has been received. The saved archive will be truncated.
- If the value is higher than the real length, the browser will keep waiting for content that never arrives, and will typically report the download as failed or incomplete once the connection closes.
The solution is to do all the work on the server first, so that you can send the file all at once (with the content length set properly). Be aware, though, that this can cause a Gateway Timeout if the compression takes a long time.
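As a rough illustration of the "build first, then send" approach, here is a minimal sketch using only the Python standard library. The function name `build_zip_to_tempfile`, the `files` mapping, and the `results.zip` filename are made up for this example; a real view would read the tsv and xml files from wherever the search results live.

```python
import os
import tempfile
import zipfile


def build_zip_to_tempfile(files):
    """Write the whole archive to a temp file, then report its exact size.

    `files` is a hypothetical mapping of archive path -> bytes content.
    Returns (path_to_zip, response_headers).
    """
    tmp = tempfile.NamedTemporaryFile(suffix=".zip", delete=False)
    # The ZipFile is closed first (writing the central directory),
    # then the temp file itself is closed.
    with tmp, zipfile.ZipFile(tmp, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)

    # The archive is complete on disk, so its size is exact, not an estimate.
    size = os.path.getsize(tmp.name)
    headers = {
        "Content-Type": "application/zip",
        "Content-Length": str(size),
        "Content-Disposition": 'attachment; filename="results.zip"',
    }
    return tmp.name, headers
```

In Django, handing the finished file to a `FileResponse` should have a similar effect, since it can work out the length from an open file. The caller is responsible for deleting the temp file afterwards, and for a 3 GB archive the build step may well be slow enough to hit the timeout mentioned above.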
It is in fact possible to predict the length of a Zip archive ahead of time, if:
- you know the file sizes and names of all the input files,
- you disable compression,
- you know exactly how the Zip library interprets the Zip format.
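To make those three conditions concrete, here is a sketch in Python of what such a prediction could look like. It is not how client-zip does it; it assumes Python's own `zipfile` module writing to a seekable file with `ZIP_STORED`, no Zip64 (fewer than 65,535 entries and under 2 GiB in total), and no extra fields or comments. A library that streams to a non-seekable output usually adds a data descriptor after each file, which would change the numbers. The helper names are made up for the example.

```python
import io
import zipfile

# Fixed record sizes from the Zip format (without Zip64 or extra fields).
LOCAL_HEADER = 30    # local file header, before each file's data
CENTRAL_HEADER = 46  # central directory entry, one per file
END_RECORD = 22      # end-of-central-directory record, once per archive


def predict_stored_zip_size(entries):
    """Predict the archive size from (name, size_in_bytes) pairs.

    Assumes stored (uncompressed) entries written by zipfile to a
    seekable output, with no Zip64, extra fields, or comments.
    """
    total = END_RECORD
    for name, size in entries:
        name_len = len(name.encode("utf-8"))  # the name is stored twice
        total += LOCAL_HEADER + name_len + size
        total += CENTRAL_HEADER + name_len
    return total


def build_stored_zip(files):
    """Build the real archive in memory so the prediction can be checked."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()
```

Comparing `predict_stored_zip_size(...)` with `len(build_stored_zip(...))` on a few sample files is a cheap way to check that the assumptions hold for your Python version before trusting the number as a Content-Length.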
I've done that in client-zip precisely to enable progress bars and remove the need for chunked encoding. I don't know if there's anything equivalent in Python (or anywhere else).