How many keys can I store in Django's file-based cache before it becomes a performance bottleneck?
I'm working with a large number of small-sized data entries (typically 2–3 KB each) and I'm using Django's file-based cache backend for storage. I would like to understand the scalability limits of this approach. Specifically:
Is there a practical or recommended limit to the number of cache keys the file-based backend can handle efficiently?
At what point (number of keys or total cache size) might I start seeing performance degradation or bottlenecks?
Are there any known issues or filesystem-level constraints that I should be aware of when caching tens or hundreds of thousands of small files?
I'm open to alternative caching strategies if the file-based backend is not well-suited for this use case.
What is the most suitable Django cache backend for storing a high volume of small entries (possibly tens or hundreds of thousands)?
TL;DR: Your needs are modest enough that the choice of cache backend hardly matters; any of them will do just fine, including the file-based ones. In theory an in-memory cache is faster, but at this scale you won't notice the difference in practice.
If you're using something like diskcache, you should be able to scale to at least millions of entries before you see any measurable performance impact. The number of entries, not the total cache size, is the significant factor, because it's the lookup scan that gets slower as the cache grows.
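If you stay with Django's built-in file-based backend rather than diskcache, note that its MAX_ENTRIES option defaults to 300, so it will start culling long before you reach tens of thousands of keys unless you raise it. A minimal settings sketch follows; the path, timeout, and entry limit are placeholder assumptions, not recommendations:

```python
# settings.py (sketch): built-in file-based backend sized for many small entries.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",  # placeholder path, pick your own
        "TIMEOUT": 60 * 60,  # assumed 1-hour expiry; None means "never expire"
        "OPTIONS": {
            "MAX_ENTRIES": 200_000,  # Django's default is 300
            "CULL_FREQUENCY": 3,  # evict 1/3 of the entries once the limit is hit
        },
    }
}
```

The limit is counted in entries, not bytes, which matches the point above that key count is what matters.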
With file caches, disk I/O will become the limiting factor before anything else, so the practical limits depend largely on the hardware you're working with.
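Since the answer depends on your hardware, the most reliable way to find your own limit is to measure it. The script below is a rough, standard-library-only sketch: it is not Django's implementation, it just mimics a one-file-per-key layout in a single directory and times a full directory scan plus one read. The function name `time_file_cache` and the 2 KB payload size are my own choices for illustration.

```python
import os
import tempfile
import time


def time_file_cache(num_entries, payload_size=2048):
    """Write num_entries small files, then time a directory scan and one read."""
    payload = os.urandom(payload_size)
    with tempfile.TemporaryDirectory() as cache_dir:
        # Populate the directory with one small file per "key".
        for i in range(num_entries):
            with open(os.path.join(cache_dir, f"{i:08d}.cache"), "wb") as fh:
                fh.write(payload)

        # Cost of listing every entry (grows with the number of files).
        start = time.perf_counter()
        listed = sum(1 for _ in os.scandir(cache_dir))
        scan_seconds = time.perf_counter() - start

        # Cost of reading a single known entry by name.
        last_path = os.path.join(cache_dir, f"{num_entries - 1:08d}.cache")
        start = time.perf_counter()
        with open(last_path, "rb") as fh:
            data = fh.read()
        read_seconds = time.perf_counter() - start

    return {
        "listed": listed,
        "bytes_read": len(data),
        "scan_seconds": scan_seconds,
        "read_seconds": read_seconds,
    }


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, time_file_cache(n))
```

Run it on the disk and filesystem you actually plan to use; timings on a laptop SSD say little about a network volume, and the OS page cache will flatter reads of freshly written files.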
That said, given sufficient resources (RAM is significantly more expensive than disk space), memory caches scale much better performance-wise. The "industry standard", so to speak, is Redis, and Django supports it well. Memcached is also quite capable.
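For reference, switching to either of these is mostly a settings change. The sketch below assumes Django 4.0 or newer (which ships a built-in Redis backend), the `redis` or `pymemcache` client package installed, and a server on localhost at the usual default port; adjust the addresses to your setup.

```python
# settings.py (sketch): two interchangeable in-memory cache configurations.
REDIS_CACHE = {
    "BACKEND": "django.core.cache.backends.redis.RedisCache",
    "LOCATION": "redis://127.0.0.1:6379",  # assumed local Redis server
}

MEMCACHED_CACHE = {
    "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
    "LOCATION": "127.0.0.1:11211",  # assumed local Memcached server
}

# Pick one; application code using django.core.cache stays the same either way.
CACHES = {"default": REDIS_CACHE}
```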
If you're not even running multiple instances, using the default LocMem (local memory) cache also works just fine so long as the host machine has sufficient RAM to run both your application and whatever size of cache you need.
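A sketch of the local-memory setup, with an assumed entry limit. Keep in mind that this cache lives inside each Python process, so every worker holds its own copy and nothing is shared between them; the `LOCATION` value is just an arbitrary name.

```python
# settings.py (sketch): per-process local-memory cache.
ENTRY_LIMIT = 200_000  # assumed upper bound on the number of keys
ENTRY_SIZE_KB = 3  # upper end of the 2-3 KB entries from the question

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "small-entries",  # arbitrary identifier for this cache
        "OPTIONS": {"MAX_ENTRIES": ENTRY_LIMIT},  # default is 300
    }
}

# Rough lower bound on the RAM needed per process, ignoring Python overhead.
ESTIMATED_MB = ENTRY_LIMIT * ENTRY_SIZE_KB / 1000
print(f"~{ESTIMATED_MB:.0f} MB of payload per process at full capacity")
```

At these numbers that is roughly 600 MB of raw payload per worker process, so multiply by your worker count when judging whether the host has enough RAM.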