The Persistent Problem

The number one objection I hear when I suggest Redis as a database is, “Redis isn’t persistent.” Redis is an in-memory database: the dataset lives in RAM, so if the system goes down you can lose whatever hasn’t made it to disk. On top of that, entries in Redis are often given a time to live so that Redis doesn’t run out of memory.

But none of this means you can’t use or trust Redis as a reliable data store, or that the data can’t be persisted. Set up correctly, Redis can be used like any other database and its data can be trusted.

A great example of this is Twitter, which uses Redis as the main database for its timeline feature. With terabytes of memory, Twitter delivers a list of recent tweets to every user who logs on. Although Redis is mostly thought of as a caching mechanism, the timelines are built and stored in a Redis cluster, and that cluster is the system of record for them: it sits downstream from the original postings, but it holds the per-user list that is read at login, making it a “persistent” database for timelines. In the end, I think we may feel more comfortable using Redis as a data store in the areas where it fits.

Data has a time to live

Redis can indeed expire a key after a set period of time to make room for new data, but only if you give the key a TTL: a key written without one sticks around, and you can call PERSIST on an existing key to remove its TTL and keep it in the database indefinitely. Of course, this means Redis won’t evict those keys to make room, so the database will only grow.
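
As a rough sketch of what that looks like in practice (using Python’s redis-py client against a local instance, which is an assumption on my part), setting and removing a TTL is a one-liner each:

```python
import redis

# Assumes a local Redis instance; adjust host/port for your environment.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Write a key with a one-hour TTL.
r.set("session:abc123", "some-value", ex=3600)
print(r.ttl("session:abc123"))   # seconds remaining, roughly 3600

# Remove the TTL so the key is kept until it is explicitly deleted.
r.persist("session:abc123")
print(r.ttl("session:abc123"))   # -1 means no expiration is set
```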

Okay yeah, but what if you run out of memory

This can be solved by clustering your environment, which not only increases the reliability of your system as a whole but also lets you shard the data: break it into pieces and spread them across the different servers in the cluster. At that point the overall memory of the database expands as the cluster grows.
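
For illustration, here’s a minimal sketch of talking to a sharded deployment with redis-py’s cluster client; the node address and key name are assumptions, not a prescription:

```python
from redis.cluster import RedisCluster

# Connecting to any one node is enough; the client discovers the rest
# of the cluster and learns which node owns which hash slot.
rc = RedisCluster(host="localhost", port=7000, decode_responses=True)

# Each key is hashed to a slot and the command is routed to the node
# that owns that slot, so the sharding is transparent to the caller.
rc.set("user:1000:timeline", "[...]")
print(rc.get("user:1000:timeline"))
```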

Alternatively, you can increase the memory of the servers themselves, expanding the overall storage capacity.

What’s important to note here is that capacity planning should be a regular maintenance procedure: check current and expected usage as you go, and in general optimize the dataset before considering expansion.
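
A capacity check like that can be as simple as reading the server’s memory stats; the sketch below runs against a single node (a cluster would need the same check on every node):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

info = r.info("memory")
used = info["used_memory"]
limit = info.get("maxmemory", 0)  # 0 means no explicit limit is configured

if limit and used / limit > 0.8:
    print(f"At {used / limit:.0%} of the memory limit; time to plan expansion")
else:
    print(f"Currently using {info['used_memory_human']}")
```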

But what if the whole cluster goes down

Redis has two built-in ways to rebuild your database if it goes down. This article won’t go into detail, but for reference they are called RDB and AOF: RDB stores point-in-time snapshots of the dataset, while AOF appends every write to a log file.

What this means is that we can rebuild the database and keep our data safe even when a failure occurs, again proving that the system can be reliable and robust if set up properly.
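
Both mechanisms normally live in redis.conf, but as a rough illustration they can also be poked at from a client; this snippet assumes redis-py and a server you’re allowed to reconfigure at runtime:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

# Turn on append-only-file persistence so every write is logged to disk.
r.config_set("appendonly", "yes")

# Ask the server for a background RDB snapshot of the current dataset.
r.bgsave()

# Reports when the most recent snapshot completed successfully.
print(r.lastsave())
```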

No Silver Bullet

Okay, so we should move everything to Redis because of the speed advantages, right? Wrong. There are complexities and tradeoffs that were not addressed here. The point of this article was to overcome the initial bias most of us have against Redis, start fitting it into our architecture where it belongs, and stop saying, “Redis isn’t persistent.”

So let’s look at three scenarios and determine if Redis is a good fit based on our research.

The Good

A great example would be Company Password Policy. This is a heavily read API endpoint whose data changes infrequently. It’s a simple lookup with the company ID as the key, so clustering poses no issues, and the dataset is lightweight and unlikely to grow at an alarming rate.
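
A lookup like that could be as simple as the cache-aside sketch below; the key format and the load_policy callback are hypothetical stand-ins for whatever system of record the policy comes from:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_password_policy(company_id, load_policy):
    """Return the policy from Redis, falling back to the source of truth."""
    key = f"company:{company_id}:password-policy"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    policy = load_policy(company_id)   # hypothetical call to the real source
    r.set(key, json.dumps(policy))     # no TTL: the data changes infrequently
    return policy
```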

The Bad

A bad example would be User Profile, an area of the product where we can see a user’s demographic information along with other features. The first issue is that it’s a very large dataset which can grow rapidly depending on the company. It also feeds many different parts of the product, so it is frequently queried and not accessed by a simple key lookup, which rules out a clustered environment. Given its size and complex access patterns, User Profile is not a good fit for Redis.

And the Edge Case

A difficult example would be Company Licenses. Again, this is a simple data structure that is frequently accessed, yet it is also queried by other systems. The dataset does not grow quickly, so cluster expansion shouldn’t be an issue. However, since this data is integral to many parts of our system, we would need to ensure it is robust or can be reconstructed in some way (an event stream, other databases, etc.). An argument could be made either way, but at the very least leveraging Redis as a cache would be beneficial.