Partitioning of Key-Value Data in Cache Systems (Part 1)

18 June 2025, 4 min read

Cache hero image

This is the opening post in a series focused on caching strategies for distributed systems, especially those handling large volumes of data and heavy traffic. In this article, we delve into cache partitioning—what it is, why it matters, and how to implement it efficiently in a Kubernetes setup. In the next post, we’ll walk through a practical example that demonstrates configuring Memcached, Ketama consistent hashing, and Twemproxy in a Kubernetes environment.

When to use cache partitioning

Replicating cache data is a widely used approach to achieve high availability and fault tolerance. It involves storing identical data across multiple cache nodes (replicas), which improves read performance and ensures redundancy in case a node fails. However, for applications with large datasets or high traffic, replication alone may not be enough. That’s where cache partitioning becomes essential.

Cache Replication diagram

Cache partitioning involves breaking the cached data into smaller, more manageable segments (partitions) and distributing them across several cache nodes (shards). Each shard holds only a portion of the entire dataset, enabling the system to scale both read and write operations across multiple instances. Because each shard runs on its own memory, disk, and CPU, resources are used more efficiently and performance improves. The primary goal of cache partitioning is to enhance scalability.

Cache Partitioning diagram
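To make the idea concrete, here is a minimal Python sketch (illustrative only, with in-memory dictionaries standing in for separate cache nodes) that routes every key to one of several shards, so each shard holds just a slice of the data. The routing rule here is a naive modulo on the key's hash; its shortcomings are discussed in the next section.

```python
import hashlib

class PartitionedCache:
    """Toy partitioned cache: each shard holds only a portion of the keys."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate cache node (e.g. a Memcached instance).
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key and map it to one of the shards (naive mod-N routing).
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.shards[digest % len(self.shards)]

    def set(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

cache = PartitionedCache(num_shards=4)
cache.set("user:42", {"name": "Alice"})
print(cache.get("user:42"))  # {'name': 'Alice'}
```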

This approach is often used in distributed systems, where multiple application instances share the cache while remaining stateless and decoupled. However, it can also be valuable in monolithic systems that require high-performance or large-capacity caching within a single application instance.

How to distribute cache keys across multiple instances

In an ideal scenario, each cache instance would manage an equal portion of the total cache data, ensuring efficient use of resources and consistent performance. Theoretically, as you add more cache instances, overall performance should scale linearly—doubling the instances should double the throughput. However, in real-world systems, achieving this perfect balance is difficult. Disparities in data distribution often lead to uneven load across cache nodes, a problem known as cache hotspots.

Cache Hotspots diagram

When cache keys aren't evenly distributed, some instances become overloaded while others remain underutilized. This imbalance causes performance issues such as higher latency, lower throughput, and reduced system efficiency. A common partitioning method, mod-N hashing, which uses the modulo operation to assign each key to one of N instances, can create skewed distributions when the key space isn't uniform, concentrating traffic on certain instances. It also forces nearly all keys to be remapped whenever an instance is added or removed, which effectively invalidates most of the cached data.
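To illustrate the remapping problem, the rough sketch below (using MD5 purely for illustration) counts how many of 10,000 sample keys land on a different instance when a mod-N cluster grows from 4 to 5 nodes; with a uniform hash, roughly 80% of the keys move.

```python
import hashlib

def mod_n_node(key, n):
    # mod-N hashing: hash the key and take the remainder by the node count.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}" for i in range(10_000)]

# Compare assignments before (4 nodes) and after (5 nodes) scaling out.
moved = sum(1 for key in keys if mod_n_node(key, 4) != mod_n_node(key, 5))
print(f"{moved / len(keys):.0%} of keys changed instances")  # roughly 80%
```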

To address this, consistent hashing offers a more effective solution. It arranges cache instances on a virtual hash ring and assigns keys in a way that balances the load more evenly, regardless of the key space. This method significantly reduces the disruption caused by adding or removing cache instances, since only a small number of keys need to be reassigned. You can read more about this technique in the Consistent hashing algorithm article.

Consistent Hashing diagram

Consistent hashing is a key technique used in distributed caching systems like Memcached and Redis, often implemented via tools like Ketama, to support efficient and scalable cache partitioning.
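For intuition, here is a minimal hash ring in Python. It is not Ketama itself, just a sketch of the same underlying idea: every node is placed on the ring at many virtual points, and a key is served by the first node encountered clockwise from the key's hash.

```python
import bisect
import hashlib

def _point(value):
    # Map a string to a position on the ring.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent hash ring with virtual nodes (not a Ketama implementation)."""

    def __init__(self, nodes, vnodes=100):
        # Each node appears at `vnodes` positions to even out the distribution.
        self._ring = sorted((_point(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring position clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, _point(key)) % len(self._ring)
        return self._ring[index][1]

ring = HashRing(["cache-0", "cache-1", "cache-2"])
print(ring.node_for("user:42"))
# Adding or removing a node only remaps the keys that fall on its ring segments.
```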

How to decouple cache partitioning from application logic

The application’s logic should remain decoupled from the cache partitioning mechanism. In other words, the application shouldn’t need to know how the cache is segmented or how keys are distributed among instances. Instead, the caching layer should abstract away these details, presenting a unified interface that makes the cache appear as a single, cohesive system.

Cache Partitioning Abstraction

This abstraction can be achieved using a client library that supports consistent hashing - such as Ketama for Memcached - or by introducing a proxy layer between the application and the cache. The proxy-based approach provides greater flexibility and scalability, as it can dynamically adapt to changes in the cache topology (e.g., scaling nodes up or down) without requiring any modifications to the application code. It also enables switching partitioning strategies seamlessly, making it easier to evolve the system in response to shifting performance demands.
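To illustrate what this decoupling looks like from the application's side, consider the hypothetical interface below: the application depends only on get and set, and the Cache passed in could equally be a single node, a consistent-hashing client such as a Ketama-based one, or a thin wrapper around a proxy endpoint.

```python
from typing import Optional, Protocol

class Cache(Protocol):
    """The only contract the application depends on."""
    def get(self, key: str) -> Optional[bytes]: ...
    def set(self, key: str, value: bytes) -> None: ...

def render_profile(cache: Cache, user_id: str) -> bytes:
    # Application logic: it neither knows nor cares whether `cache` is a single
    # node, a consistent-hashing client, or a proxy in front of many shards.
    cached = cache.get(f"profile:{user_id}")
    if cached is not None:
        return cached
    profile = f"profile for {user_id}".encode()  # stand-in for a real lookup
    cache.set(f"profile:{user_id}", profile)
    return profile
```

Swapping the partitioning strategy then means swapping the Cache implementation (or reconfiguring the proxy), with no change to code like render_profile.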

In containerized environments like Kubernetes, this caching layer can be implemented as a sidecar container that runs alongside the application container. The sidecar manages all partitioning and communication with the cache, exposing a simple and consistent interface to the application. This design pattern, known as the ambassador pattern, allows the sidecar to serve as a dedicated cache handler, enabling the application to remain agnostic of the underlying cache architecture.

Cache Sidecar Pattern
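From the application's point of view, the sidecar reduces the whole cache topology to a single local endpoint. The sketch below is illustrative only: it assumes a Twemproxy sidecar listening on 127.0.0.1:22121 (an example port, not a requirement) and uses the third-party pymemcache client library to talk to it.

```python
from pymemcache.client.base import Client

# The application only ever talks to the sidecar on localhost; the proxy
# decides which Memcached shard actually stores each key.
cache = Client(("127.0.0.1", 22121))  # example Twemproxy listen port

cache.set("session:abc", b"logged-in", expire=300)
print(cache.get("session:abc"))
```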

In the upcoming article, we’ll explore how to build a caching layer using Twemproxy (also known as Nutcracker), a widely-used open-source proxy for Memcached and Redis that enables consistent hashing and cache partitioning. We'll walk through deploying Twemproxy in a Kubernetes environment and configuring it to work seamlessly with Memcached and Ketama hashing.


About the authors

Maciej Laskowski

Maciej Laskowski - a software architect with deep hands-on experience. Continuous Delivery evangelist, architecture trade-offs analyst, cloud-native solutions enthusiast.

Tomasz Michalak

Tomasz Michalak - a hands-on software architect interested in TDD and DDD who translates engineering complexity into the language of trade-offs and goals.

© 2025 HandsOnArchitects.com