Choosing and Configuring Cache Backends
The caching framework in this codebase is designed to be pluggable, allowing developers to balance performance, persistence, and infrastructure complexity. By providing a unified interface through BaseCache, the system supports a variety of backends ranging from high-performance distributed stores like Redis and Memcached to simple local memory or file-based storage.
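Selecting a backend is a settings-level decision. As a rough sketch (the alias names and the Redis URL here are hypothetical examples, not values from this codebase), a project might declare:

```python
# Illustrative CACHES setting: a Redis-backed default alias plus a
# local-memory alias. Alias names and the connection URL are arbitrary.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",
    },
    "local": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    },
}
```

Because every backend implements the BaseCache interface, application code using `cache.get`/`cache.set` is unaffected when the BACKEND dotted path changes.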
Redis: High Performance and Scalability
The RedisCache backend, implemented in django/core/cache/backends/redis.py, is the modern standard for production environments. It leverages the redis-py library and introduces a sophisticated RedisCacheClient to manage connections and data serialization.
Leader/Replica Strategy
One of the key design choices in RedisCacheClient is its built-in support for read-replication. When multiple servers are provided in the LOCATION setting, the client implements a specific routing logic in _get_connection_pool_index:
def _get_connection_pool_index(self, write):
    # Write to the first server. Read from other servers if there are more,
    # otherwise read from the first server.
    if write or len(self._servers) == 1:
        return 0
    return random.randint(1, len(self._servers) - 1)
This ensures that all write operations (like set, add, or delete) are directed to the primary server (index 0), while read operations (like get or has_key) are distributed randomly across the remaining replica servers.
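The routing rule can be exercised in isolation. The following standalone sketch mirrors only the index arithmetic; the server URLs are hypothetical:

```python
import random

# Standalone sketch of the leader/replica routing rule described above.
servers = ["redis://leader", "redis://replica-1", "redis://replica-2"]

def get_pool_index(write):
    # Writes (and single-server setups) always resolve to index 0.
    if write or len(servers) == 1:
        return 0
    # Reads are spread randomly across indexes 1..n-1 (the replicas).
    return random.randint(1, len(servers) - 1)
```

With three servers configured, every write resolves to index 0 and every read to index 1 or 2, so the leader never serves read traffic unless it is the only server.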
Serialization
Unlike some other backends that rely solely on Python's pickle, RedisCacheClient uses a dedicated RedisSerializer. This allows for custom serialization logic, though it defaults to pickle with the highest protocol for efficiency.
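The default behavior can be approximated with a small stand-in class. This is a simplification: the real RedisSerializer also leaves plain integers unpickled so Redis can operate on them natively, a detail omitted here.

```python
import pickle

# Simplified stand-in for the default serialization described above:
# pickle at the highest available protocol.
class PickleSerializer:
    def __init__(self, protocol=pickle.HIGHEST_PROTOCOL):
        self.protocol = protocol

    def dumps(self, obj):
        return pickle.dumps(obj, self.protocol)

    def loads(self, data):
        return pickle.loads(data)
```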
Memcached: Distributed Memory Caching
The Memcached implementation in django/core/cache/backends/memcached.py provides two primary flavors: PyMemcacheCache (using pymemcache) and PyLibMCCache (using the C-based pylibmc). Both inherit from BaseMemcachedCache, which handles several protocol-specific quirks.
The 30-Day Timeout Quirk
Memcached treats timeouts greater than 30 days (2,592,000 seconds) as Unix timestamps rather than relative offsets. BaseMemcachedCache.get_backend_timeout transparently handles this by converting long durations into absolute timestamps:
if timeout > 2592000:  # 30 days
    # Memcached interprets values > 30 days as Unix timestamps.
    timeout += int(time.time())
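The effect of this conversion is easy to see in a minimal sketch (the helper name here is illustrative, not the real method signature):

```python
import time

# Minimal sketch of the normalization above: offsets of 30 days or less
# pass through as relative timeouts; longer ones become absolute Unix
# timestamps (30 days = 30 * 24 * 60 * 60 = 2,592,000 seconds).
def normalize_timeout(timeout):
    if timeout > 2592000:
        timeout += int(time.time())
    return timeout
```

A 60-second timeout is sent as-is, while a 31-day timeout is sent as "now plus 31 days" so memcached does not misread it as a date in 1970.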
Handling the 1MB Limit
Memcached has a hard limit of 1MB for any single cached item. If a set operation fails (often due to this limit), the backend proactively deletes the key to prevent stale data from persisting if an older, smaller version of the key already existed:
def set(self, key, value, timeout=DEFAULT_TIMEOUT, version=None):
    key = self.make_and_validate_key(key, version=version)
    if not self._cache.set(key, value, self.get_backend_timeout(timeout)):
        # Make sure the key doesn't keep its old value in case of failure.
        self._cache.delete(key)
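The delete-on-failure pattern can be demonstrated against a stub client that rejects oversized values. All names here are hypothetical; only the control flow mirrors the backend:

```python
# Stub client mimicking memcached's 1MB item limit.
class StubClient:
    LIMIT = 1024 * 1024

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        if len(value) > self.LIMIT:
            return False  # too large: refuse, as memcached would
        self.store[key] = value
        return True

    def delete(self, key):
        self.store.pop(key, None)

def safe_set(client, key, value):
    # Drop any stale smaller value rather than serve it after a
    # failed overwrite.
    if not client.set(key, value):
        client.delete(key)
```

Without the delete, a failed overwrite of an existing key would leave the old (now logically stale) value in the cache.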
Database and File-based Persistence
For environments where external services like Redis are unavailable, DatabaseCache and FileBasedCache provide persistent storage using existing infrastructure.
Database Caching
DatabaseCache (in django/core/cache/backends/db.py) stores data in a dedicated table. It is particularly useful because it respects Django's database routing, allowing cache traffic to be isolated to a specific database.
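As a hedged configuration sketch (the table name and entry limit are hypothetical), a database-backed alias looks like this; the table itself must be created first with `python manage.py createcachetable`:

```python
# Hypothetical settings fragment for a database-backed cache alias.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "my_cache_table",  # name of the cache table
        "OPTIONS": {
            "MAX_ENTRIES": 1000,  # culling threshold for this alias
        },
    },
}
```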
A critical aspect of this backend is its manual culling process. Since databases do not natively expire rows like Redis does, DatabaseCache performs culling during set operations if the MAX_ENTRIES limit is reached. It uses a combination of expiring old keys and a "cull frequency" algorithm to maintain table size:
def _cull(self, db, cursor, now, num):
    if self._cull_frequency == 0:
        self.clear()
    else:
        # First, delete truly expired entries.
        cursor.execute(
            "DELETE FROM %s WHERE %s < %%s" % (table, expires_col), [now]
        )
        # If still over the limit, delete a fraction of the remaining entries.
        if remaining_num > self._max_entries:
            cull_num = remaining_num // self._cull_frequency
            # ... logic to delete oldest keys ...
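The fraction arithmetic is worth making concrete. With the default MAX_ENTRIES of 300 and CULL_FREQUENCY of 3, a table holding 360 live rows after expired entries are purged (a hypothetical count for illustration) sheds a third of them:

```python
# Worked example of the cull fraction.
max_entries = 300
cull_frequency = 3
remaining_num = 360  # hypothetical row count after expired rows are gone

cull_num = 0
if remaining_num > max_entries:
    cull_num = remaining_num // cull_frequency  # 360 // 3 == 120
```

So a single over-limit write triggers the deletion of the 120 oldest keys, bringing the table back well under the threshold rather than culling one row at a time.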
File-based Caching
FileBasedCache (in django/core/cache/backends/filebased.py) stores each cache entry as a separate file. While simple, it can suffer from performance degradation on filesystems with many small files. It uses zlib compression to reduce disk usage and pickle for serialization.
Warning: Because it uses pickle.load(), the LOCATION directory must never be web-accessible to prevent arbitrary code execution vulnerabilities.
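The payload transformation can be sketched as follows. This is a simplification: the real backend also writes an expiry timestamp ahead of the payload, and that framing is omitted here.

```python
import pickle
import zlib

# Sketch of the on-disk transformation described above:
# pickle, then zlib-compress.
def dump_entry(value):
    return zlib.compress(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))

def load_entry(blob):
    return pickle.loads(zlib.decompress(blob))
```

The same security caveat applies to this sketch: `load_entry` must only ever be fed files the application itself wrote.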
Development and Testing Backends
Local Memory Cache
LocMemCache (in django/core/cache/backends/locmem.py) is the default backend if none is configured. It is thread-safe, using a threading.Lock to protect its internal OrderedDict storage. However, it is strictly per-process: in a multi-process production environment (such as Gunicorn with multiple workers), each worker holds its own isolated cache, which makes it unsuitable for production.
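The core design can be sketched in a few lines. This simplified stand-in omits timeouts, key versioning, and culling; only the lock-around-OrderedDict shape mirrors LocMemCache:

```python
import threading
from collections import OrderedDict

# Minimal lock-protected, in-process cache in the spirit of LocMemCache.
class TinyLocMem:
    def __init__(self):
        self._cache = OrderedDict()
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._cache[key] = value
            # Keep recently used keys at the front, as LocMemCache does.
            self._cache.move_to_end(key, last=False)

    def get(self, key, default=None):
        with self._lock:
            return self._cache.get(key, default)
```

Two instances of this class share nothing, which is exactly the per-process isolation problem described above.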
Dummy Cache
DummyCache (in django/core/cache/backends/dummy.py) implements the cache interface but performs no actual storage. It is invaluable for development or testing environments where you want to verify that caching logic is called without actually persisting data or dealing with expiration.
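Its behavior reduces to a few no-op methods. As a sketch (return values here match the general pattern of a backend that stores nothing, though the real class implements the full BaseCache surface):

```python
# Minimal no-op cache in the spirit of DummyCache: writes are
# discarded and reads always miss.
class TinyDummy:
    def set(self, key, value, timeout=None):
        return None  # silently discard the value

    def get(self, key, default=None):
        return default  # every lookup is a miss

    def delete(self, key):
        return False  # nothing stored, so nothing deleted
```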
Summary of Culling Strategies
Backends that manage their own storage limits (LocMemCache, DatabaseCache, FileBasedCache) share a common configuration pattern:
- MAX_ENTRIES: The threshold at which culling begins (default 300).
- CULL_FREQUENCY: The reciprocal of the fraction of entries to remove. For example, a value of 3 means 1/3 of the entries are removed when the limit is hit. A value of 0 results in a full cache wipe.
In contrast, RedisCache and Memcached backends delegate eviction and expiration to the underlying service, which typically uses LRU (Least Recently Used) or similar algorithms.