Introduction
This post serves as a companion to my repository, which provides an automated setup script for deploying a Redis cluster. Designed to streamline the process of configuring a clustered environment, the script enables users to quickly establish a functional Redis cluster with minimal manual intervention. In this guide, I’ll explain the purpose behind this project, detail the testing process for various configurations, and highlight key Redis capabilities uncovered during development. Additionally, I’ll clarify the differences between standalone and clustered Redis to provide context for when clustering is advantageous.
Purpose of This Project
The motivation for this work stems from my interest in distributed systems and resilient architectures. Redis is renowned for its performance as an in-memory data store, but its clustering feature offers a pathway to scalability and fault tolerance that standalone instances cannot achieve. This project is an experiment in harnessing Redis clustering to distribute data across nodes, manage failures, and simplify deployment through automation. By creating this setup, I aimed to explore Redis’ advanced features—such as sharding, replication, and dynamic scaling—while ensuring the process remains accessible and repeatable.
Standalone vs. Clustered Redis: Key Differences
Redis can operate in two primary modes—standalone and clustered—each suited to different use cases:
Standalone Redis
Description: A single instance hosting the entire dataset in memory.
Advantages: Simple to configure (start with redis-server), offers peak performance for moderate workloads, and avoids the inter-node network hops that a cluster introduces.
Limitations: Constrained by the memory and processing capacity of a single machine. Without replication or persistence, a failure results in data loss or downtime.
Use Case: Ideal for small applications or caching where simplicity and low latency are priorities.
Clustered Redis
Description: Multiple nodes sharing the dataset, with data sharded across 16,384 slots and optional replication for redundancy.
Advantages: Scales horizontally by adding nodes and handles larger datasets and traffic.
Limitations: Introduces complexity in setup and management, along with potential network overhead between nodes.
Use Case: Suited for large-scale applications requiring fault tolerance, distributed data, and sustained performance under heavy load.
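To make the 16,384-slot sharding concrete, here is a minimal Python sketch of how a key maps to a slot: the CRC16 (XMODEM variant) of the key, modulo 16384, with the hash-tag rule that only the text between the first "{" and the next "}" is hashed when that text is non-empty. The function names key_slot and hash_tag are my own for illustration and are not part of the setup script; binascii.crc_hqx from the Python standard library is used here because it computes the same CRC-CCITT polynomial when started from 0.

```python
import binascii

SLOT_COUNT = 16384  # number of hash slots in a Redis cluster


def hash_tag(key: bytes) -> bytes:
    """Return the part of the key that actually gets hashed."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; with "{}" the whole key is hashed.
        if end != -1 and end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """Map a key to its hash slot: CRC16 (XMODEM) modulo 16384."""
    data = hash_tag(key.encode("utf-8"))
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


if __name__ == "__main__":
    # Keys sharing a hash tag land in the same slot, and so on the same node.
    for name in ("foo", "hello", "{user1000}.following", "{user1000}.followers"):
        print(name, key_slot(name))
```

Against a running cluster, the result for a given key can be compared with the output of the CLUSTER KEYSLOT command.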
Testing Configurations and Exploring Redis Capabilities
To assess the automated setup script and probe Redis’ clustering features, I conducted targeted tests comparing throughput and resilience across different configurations. The experiments utilized containerized Redis instances. Below are the setups, methodologies, and findings:
Standalone vs. Clustered Throughput
Configuration: A standalone Redis instance (single container) versus a 6-node clustered setup (three master containers with one replica each).
Test: Executed 1,000,000 write operations (SET keyN valueN) on each setup using a benchmarking script, measuring completion time and throughput (operations per second).
Vert.x:
Standalone mode: 7.22 seconds, 138k ops/sec throughput
Clustered mode: 13.33 seconds, 75k ops/sec throughput
PHP:
Standalone mode: 6.7 seconds, 150k ops/sec throughput
Clustered mode: 13.72 seconds, 73k ops/sec throughput
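The measurement described in the Test step can be sketched roughly as follows. This is not the benchmarking script used for the numbers above (those came from Vert.x and PHP clients); it is a minimal Python illustration of the same idea, with a hypothetical run_write_benchmark helper that takes any SET-like callable so it can be tried without a Redis server.

```python
import time
from typing import Callable


def run_write_benchmark(set_fn: Callable[[str, str], object], total_ops: int) -> dict:
    """Issue total_ops SET-style writes through set_fn and time the whole run."""
    start = time.perf_counter()
    for n in range(total_ops):
        set_fn(f"key{n}", f"value{n}")
    elapsed = time.perf_counter() - start
    return {
        "ops": total_ops,
        "seconds": elapsed,
        # Guard against a zero reading on very coarse timers.
        "ops_per_sec": total_ops / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Stand-in target: a plain dict instead of a live Redis connection.
    store = {}
    result = run_write_benchmark(store.__setitem__, 100_000)
    print(f"{result['ops']} writes in {result['seconds']:.2f}s "
          f"({result['ops_per_sec']:,.0f} ops/sec)")
```

To point this at a real server, one option (assuming the redis-py package is installed) is to pass the set method of a redis.Redis client, or of a redis.cluster.RedisCluster client for the clustered setup, as set_fn. A sequential loop like this measures round-trip latency as much as server capacity, so pipelining or concurrent clients would change the picture.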
Outcome
The standalone instance achieved higher throughput on this small dataset, completing the 1M writes in roughly half the time in both client runs, because it has no slot routing or inter-node communication to pay for. The clustered setup was slower per operation due to slot hashing and inter-node communication, so for this workload it performed worse overall.
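As a quick sanity check on the reported figures, throughput is simply the number of operations divided by the completion time, and the slowdown is the ratio of the two completion times. The small Python sketch below uses only the timings listed above; the helper names are illustrative.

```python
TOTAL_OPS = 1_000_000

# Completion times in seconds, taken from the results listed above.
timings = {
    "Vert.x": {"standalone": 7.22, "clustered": 13.33},
    "PHP": {"standalone": 6.7, "clustered": 13.72},
}


def throughput(seconds: float) -> float:
    """Operations per second for a run of TOTAL_OPS writes."""
    return TOTAL_OPS / seconds


def slowdown(run: dict) -> float:
    """How many times longer the clustered run took than the standalone one."""
    return run["clustered"] / run["standalone"]


if __name__ == "__main__":
    for client, run in timings.items():
        print(
            f"{client}: standalone {throughput(run['standalone']):,.0f} ops/sec, "
            f"clustered {throughput(run['clustered']):,.0f} ops/sec, "
            f"slowdown x{slowdown(run):.2f}"
        )
```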
After investigating the topic a bit further, it makes sense to set up clustered Redis once the volume of operations reaches around 100k or the data no longer fits into the memory of a single instance. Otherwise, standalone Redis provides better performance. These estimates can also be influenced by other factors such as network latency, where a cluster can parallelize operations across nodes. Read load, however, can also be distributed through replication, which is possible in both standalone and cluster setups.
Insight: Standalone Redis excels for low-latency, single-node workloads, whereas clustering provides scalability for larger datasets and higher concurrency.