To Shard or not to Shard, that is the question…
A database shard is the horizontal partition of data in a database, where rows of a database table are held separately, rather than columns. Each partition is known as a shard and each shard is held on a separate database server instance, to spread load.
Advantages - The total number of rows in each table is reduced. This reduces the index size and improves performance. Since multiple shards can be placed on separate hardware, this enables distribution of database, spreading out the load and hence greatly improves performance.
Disadvantages - This demands a heavier reliance on the interconnection between servers, which increases latency when querying more than one shard. Due to more complex failure modes of a set of servers, sharding leads to inconsistency and low durability.
What this all boils down to… Sharding allows multiple databases in the same rails application.
We at ThreatSim have recently started utilizing database shards. Using the Octopus Gem for real-world segmentation of the data (
:eu shard for European and
:master shard for American), so it is easier to infer the appropriate shard, and query only the relevant shard.
Database Sharding in Active Records using Octopus
Octopus allows Database Sharding. So we send some queries to specific shards, or pick one shard, and use it as the default database for an application.
- Sharding (with multiple shards, and grouped shards)
- Replication (Master/slave support, with multiple slaves)
- Moving data between shards with migrations
- Tools to manage database configurations
For this blog post we will be talking about database sharding using Octopus.
Installing Octopus is as easy as installing any other gem to your rails application. Just add to your Gemfile:
To enable a shard, we need to create a file named shards.yml in our config directory inside our rails application. To use octopus in a plain ruby application, just create a folder named config, and put the file inside it. The shards.yml file should looks like this:
This is an example of a config file using rails. Here we have one shard for each environment, named shard_one, a shard for development, and another for production. We need to specify what environments octopus will run, with the ‘environments’ tag. All configs should start with the ‘octopus’ tag. Octopus also works with groups of shards. With groups, we can send queries to all members of a group.
So you’re all setup, but what does this actually look like? When you shard your database, it’s like separating all your data in to little silos. These little silos have no knowledge of other silos of data.
“I have no idea how the carrot tastes”, says cucumber.
When using sharding, we need to specify which shard to send the query. Octopus supports selecting the shard inside a controller, or manually in each object.
To send a query to a specific shard:
#using method is also available for instances:
The real power in Octopus for Active Record comes in the form of blocks. What is nice about the Octopus
#using method is the ability to wrap queries inside a block.
Using this kind of block we can modify it to fit a controller for all its actions or even a specific action. Take the following code as an example.
Now all your Controller requests will pull data from the selected shard.
We can also chain Octopus scopes together. Example, say we have two databases. One for storing all our Users and one for storing their blog post. If we put our Users in one shard(
:mastershard) and the Posts in another(
we can chain down to get a users post.
Migrations with Octopus
We were thinking of keeping the posts strictly for sharding but without knowing migrations, we think sharding is some what incomplete. In migrations also we have access to the using method:
Problems we faced
With this being said, if you plan to separate your data you may run into some interesting problems.
Database relationships are vital to your application, however, when you shard your database you may end up with unfavorable effects depending on how you plan to separate you data. These relationships may become problematic for
your separated databases when one object needs another through a
has_many relation between two shards. Let’s look at an Example.
Imagine we have a Users table, this table has a column of creator, which is just another User.
Users.first.creator Let’s assume this creator is stored in our
:us database. If we wanted to add a new user to an
:eu database, the first user in our
:master shard will create this new user.
If we ask for the creator,
User.using(:eu).first.creator we get back
:eu database doesn’t know how to find the correct user from a different database shard. However, if we had a user with the same
:id as the creator in our
:eu database, we would get back the wrong user. To avoid such problems you can add a
:uuid to your Users Table and query each shard until you find the correct user matching the unique id. However, this approach is not very performant. But this doesn’t mean all hope is lost. We are sure we will soon find an efficient solution to this and be back!
Thanks for taking a look, I hope you found this post helpful!