
Choosing and Configuring Cache Backends

The caching framework in this codebase is designed to be pluggable, allowing developers to balance performance, persistence, and infrastructure complexity. By providing a unified interface through BaseCache, the system supports a variety of backends ranging from high-performance distributed stores like Redis and Memcached to simple local memory or file-based storage.

Redis: High Performance and Scalability

The RedisCache backend, implemented in django/core/cache/backends/redis.py, is the modern standard for production environments. It leverages the redis-py library and introduces a sophisticated RedisCacheClient to manage connections and data serialization.
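Before looking at the internals, it helps to see how this backend is wired up. A typical settings fragment might look like the following; the server addresses are illustrative, and the list form of LOCATION is what activates the leader/replica routing described below:

```python
# settings.py -- illustrative configuration for the Redis backend.
# With a list of servers, the first entry receives all writes and the
# remaining entries serve reads.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": [
            "redis://127.0.0.1:6379",  # leader: all writes
            "redis://127.0.0.1:6380",  # replica: reads
            "redis://127.0.0.1:6381",  # replica: reads
        ],
    }
}
```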

Leader/Replica Strategy

One of the key design choices in RedisCacheClient is its built-in support for read-replication. When multiple servers are provided in the LOCATION setting, the client implements a specific routing logic in _get_connection_pool_index:

def _get_connection_pool_index(self, write):
    # Write to the first server. Read from other servers if there are more,
    # otherwise read from the first server.
    if write or len(self._servers) == 1:
        return 0
    return random.randint(1, len(self._servers) - 1)

This ensures that all write operations (like set, add, or delete) are directed to the primary server (index 0), while read operations (like get or has_key) are distributed randomly across the remaining replica servers.
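The routing rule can be sketched as a standalone function to make its behavior concrete (names here are illustrative, not Django's API):

```python
import random

# Standalone sketch of the routing rule: writes always go to index 0
# (the leader); reads are spread over indices 1..n-1 when replicas exist.
def get_connection_pool_index(servers, write):
    if write or len(servers) == 1:
        return 0
    return random.randint(1, len(servers) - 1)

servers = ["leader", "replica-1", "replica-2"]
assert get_connection_pool_index(servers, write=True) == 0
assert get_connection_pool_index(["leader"], write=False) == 0
# Reads never hit the leader when replicas are configured.
assert all(
    get_connection_pool_index(servers, write=False) in (1, 2)
    for _ in range(100)
)
```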

Serialization

Unlike some other backends that rely solely on Python's pickle, RedisCacheClient uses a dedicated RedisSerializer. This allows for custom serialization logic, though it defaults to pickle with the highest protocol for efficiency.
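A minimal serializer along these lines might look as follows. This is a sketch, not Django's exact class; the real RedisSerializer additionally lets plain integers pass through unpickled so that operations like incr() work natively on the server:

```python
import pickle

class SimpleRedisSerializer:
    """Sketch of a pickle-based cache serializer.

    Defaults to the highest pickle protocol for efficiency, mirroring
    the behavior described above.
    """

    def __init__(self, protocol=None):
        self.protocol = pickle.HIGHEST_PROTOCOL if protocol is None else protocol

    def dumps(self, obj):
        return pickle.dumps(obj, self.protocol)

    def loads(self, data):
        return pickle.loads(data)

serializer = SimpleRedisSerializer()
payload = serializer.dumps({"user_id": 42, "roles": ["admin"]})
assert serializer.loads(payload) == {"user_id": 42, "roles": ["admin"]}
```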

Memcached: Distributed Memory Caching

The Memcached implementation in django/core/cache/backends/memcached.py provides two primary flavors: PyMemcacheCache (using pymemcache) and PyLibMCCache (using the C-based pylibmc). Both inherit from BaseMemcachedCache, which handles several protocol-specific quirks.
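Selecting between the two flavors is purely a matter of configuration; a typical setup for the pymemcache-based backend might be (address illustrative):

```python
# settings.py -- illustrative Memcached configuration using pymemcache.
# Swap the BACKEND for PyLibMCCache to use the pylibmc bindings instead.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```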

The 30-Day Timeout Quirk

Memcached treats timeouts greater than 30 days (2,592,000 seconds) as Unix timestamps rather than relative offsets. BaseMemcachedCache.get_backend_timeout transparently handles this by converting long durations into absolute timestamps:

if timeout > 2592000:  # 30 days
    # Memcached interprets values > 30 days as Unix timestamps.
    timeout += int(time.time())
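The conversion can be sketched as a standalone function (the helper name is illustrative):

```python
import time

MONTH = 60 * 60 * 24 * 30  # 2,592,000 seconds

# Sketch of the conversion: relative timeouts over 30 days become
# absolute Unix timestamps so memcached does not misread them.
def memcached_timeout(timeout):
    timeout = int(timeout)
    if timeout > MONTH:
        timeout += int(time.time())
    return timeout

assert memcached_timeout(300) == 300       # short timeouts pass through
assert memcached_timeout(MONTH) == MONTH   # exactly 30 days is still relative
assert memcached_timeout(MONTH + 1) >= MONTH + 1  # longer: absolute timestamp
```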

Handling the 1MB Limit

Memcached has a hard limit of 1MB for any single cached item. If a set operation fails (often due to this limit), the backend proactively deletes the key to prevent stale data from persisting if an older, smaller version of the key already existed:

def set(self, key, value, timeout=DEFAULT_TIMEOUT, version=None):
    key = self.make_and_validate_key(key, version=version)
    if not self._cache.set(key, value, self.get_backend_timeout(timeout)):
        # Make sure the key doesn't keep its old value in case of failure.
        self._cache.delete(key)
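The effect of this delete-on-failure pattern can be demonstrated with a toy client whose set() rejects oversized values, mimicking the 1MB cap (shrunk here for the demo; all names are illustrative):

```python
class FakeMemcachedClient:
    """Toy client whose set() fails for values over a size limit,
    mimicking memcached's 1MB cap (here shrunk to 10 bytes)."""

    LIMIT = 10

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        if len(value) > self.LIMIT:
            return False
        self.store[key] = value
        return True

    def delete(self, key):
        self.store.pop(key, None)

def safe_set(client, key, value):
    # The pattern from the backend: on a failed set, delete the key so
    # a stale, smaller value cannot survive.
    if not client.set(key, value):
        client.delete(key)

client = FakeMemcachedClient()
safe_set(client, "k", "small")    # fits: stored
safe_set(client, "k", "x" * 100)  # too big: set fails, old value purged
assert "k" not in client.store
```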

Database and File-based Persistence

For environments where external services like Redis are unavailable, DatabaseCache and FileBasedCache provide persistent storage using existing infrastructure.

Database Caching

DatabaseCache (in django/core/cache/backends/db.py) stores data in a dedicated table. It is particularly useful because it respects Django's database routing, allowing cache traffic to be isolated to a specific database.
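Configuration is again a settings fragment; LOCATION names the cache table, which must be created up front with the createcachetable management command (table name illustrative):

```python
# settings.py -- illustrative DatabaseCache configuration.
# Create the table first with: python manage.py createcachetable
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "my_cache_table",
    }
}
```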

A critical aspect of this backend is its manual culling process. Since databases do not natively expire rows like Redis does, DatabaseCache performs culling during set operations if the MAX_ENTRIES limit is reached. It uses a combination of expiring old keys and a "cull frequency" algorithm to maintain table size:

def _cull(self, db, cursor, now, num):
    if self._cull_frequency == 0:
        self.clear()
    else:
        # First, delete truly expired entries.
        cursor.execute(
            "DELETE FROM %s WHERE %s < %%s" % (table, expires_col), [now]
        )
        # If still over the limit, delete a fraction of the remaining entries.
        if remaining_num > self._max_entries:
            cull_num = remaining_num // self._cull_frequency
            # ... logic to delete the oldest keys ...
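The arithmetic of that second step is easy to verify in isolation. A sketch with the default settings (MAX_ENTRIES=300, CULL_FREQUENCY=3; function name illustrative):

```python
# Sketch of the cull arithmetic used when the entry limit is exceeded.
def entries_to_cull(remaining, max_entries=300, cull_frequency=3):
    if cull_frequency == 0:
        return remaining  # a frequency of 0 wipes the whole cache
    if remaining <= max_entries:
        return 0
    return remaining // cull_frequency

assert entries_to_cull(300) == 0    # at the limit: nothing culled
assert entries_to_cull(301) == 100  # over the limit: roughly 1/3 removed
assert entries_to_cull(50, cull_frequency=0) == 50
```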

File-based Caching

FileBasedCache (in django/core/cache/backends/filebased.py) stores each cache entry as a separate file. While simple, it can suffer from performance degradation on filesystems with many small files. It uses zlib compression to reduce disk usage and pickle for serialization.
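The compress-and-pickle round trip can be sketched as follows. This is a simplification: Django's real cache files also store a pickled expiry timestamp ahead of the payload, which is omitted here:

```python
import pickle
import zlib

# Simplified sketch of the on-disk format: a pickled value compressed
# with zlib. (The real files also carry an expiry timestamp.)
def dump_entry(value):
    return zlib.compress(pickle.dumps(value, pickle.HIGHEST_PROTOCOL))

def load_entry(blob):
    return pickle.loads(zlib.decompress(blob))

entry = {"user_id": 42, "roles": ["admin"]}
assert load_entry(dump_entry(entry)) == entry
```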

Warning: Because it uses pickle.load(), the LOCATION directory must never be web-accessible to prevent arbitrary code execution vulnerabilities.

Development and Testing Backends

Local Memory Cache

LocMemCache (in django/core/cache/backends/locmem.py) is the default backend if none is configured. It is thread-safe, using a threading.Lock to protect its internal OrderedDict storage. However, it is per-process. In a multi-process production environment (like Gunicorn with multiple workers), each worker will have its own isolated cache, making it unsuitable for production.
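A minimal cache in this spirit can be sketched in a few lines (a toy illustration, not LocMemCache itself; expiry and culling are omitted):

```python
import threading
from collections import OrderedDict

class TinyLocMemCache:
    """Minimal sketch of a thread-safe, per-process cache in the spirit
    of LocMemCache (no expiry or culling here)."""

    def __init__(self):
        self._cache = OrderedDict()
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._cache[key] = value
            self._cache.move_to_end(key)  # most recently used at the end

    def get(self, key, default=None):
        with self._lock:
            return self._cache.get(key, default)

cache = TinyLocMemCache()
cache.set("greeting", "hello")
assert cache.get("greeting") == "hello"
assert cache.get("missing", "fallback") == "fallback"
```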

Dummy Cache

DummyCache (in django/core/cache/backends/dummy.py) implements the cache interface but performs no actual storage. It is invaluable for development or testing environments where you want to verify that caching logic is called without actually persisting data or dealing with expiration.
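The essence of a no-op cache is easy to see in a sketch (again an illustration, not the actual DummyCache):

```python
class TinyDummyCache:
    """Sketch of a no-op cache: every write is accepted and discarded,
    every read misses."""

    def set(self, key, value, timeout=None):
        pass  # deliberately store nothing

    def get(self, key, default=None):
        return default  # always a cache miss

    def has_key(self, key):
        return False

cache = TinyDummyCache()
cache.set("k", "v")
assert cache.get("k") is None  # nothing was ever stored
assert not cache.has_key("k")
```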

Summary of Culling Strategies

Backends that manage their own storage limits (LocMemCache, DatabaseCache, FileBasedCache) share a common configuration pattern:

  • MAX_ENTRIES: The threshold at which culling begins (default 300).
  • CULL_FREQUENCY: The reciprocal of the fraction of entries to remove. For example, a value of 3 means 1/3 of the entries are removed when the limit is hit. A value of 0 results in a full cache wipe.

In contrast, RedisCache and Memcached backends delegate eviction and expiration to the underlying service, which typically uses LRU (Least Recently Used) or similar algorithms.