To Shard or not to Shard, that is the question…

A database shard is the horizontal partition of data in a database, where rows of a database table are held separately, rather than columns. Each partition is known as a shard and each shard is held on a separate database server instance, to spread load.

Advantages - The total number of rows in each table is reduced. This reduces the index size and improves performance. Since multiple shards can be placed on separate hardware, this enables distribution of database, spreading out the load and hence greatly improves performance.

Disadvantages - This demands a heavier reliance on the interconnection between servers, which increases latency when querying more than one shard. Due to more complex failure modes of a set of servers, sharding leads to inconsistency and low durability.

What this all boils down to… Sharding allows multiple databases in the same rails application.

We at ThreatSim have recently started utilizing database shards. Using the Octopus Gem for real-world segmentation of the data (:eu shard for European and :master shard for American), so it is easier to infer the appropriate shard, and query only the relevant shard.

Database Sharding in Active Records using Octopus

Octopus allows Database Sharding. So we send some queries to specific shards, or pick one shard, and use it as the default database for an application.

Octopus supports:

  • Sharding (with multiple shards, and grouped shards)
  • Replication (Master/slave support, with multiple slaves)
  • Moving data between shards with migrations
  • Tools to manage database configurations

For this blog post we will be talking about database sharding using Octopus.

Installing Octopus

Installing Octopus is as easy as installing any other gem to your rails application. Just add to your Gemfile:

gem 'ar-octopus'

Config File

To enable a shard, we need to create a file named shards.yml in our config directory inside our rails application. To use octopus in a plain ruby application, just create a folder named config, and put the file inside it. The shards.yml file should looks like this:

octopus:
  environments:
    - development
    - production
  development:
    shard_one:
      host: localhost
      adapter: mysql
      database: app_development
  production:
    shard_one:
      host: localhost
      adapter: mysql
      database: app_production

This is an example of a config file using rails. Here we have one shard for each environment, named shard_one, a shard for development, and another for production. We need to specify what environments octopus will run, with the ‘environments’ tag. All configs should start with the ‘octopus’ tag. Octopus also works with groups of shards. With groups, we can send queries to all members of a group.

So you’re all setup, but what does this actually look like? When you shard your database, it’s like separating all your data in to little silos. These little silos have no knowledge of other silos of data.

“I have no idea how the carrot tastes”, says cucumber.

When using sharding, we need to specify which shard to send the query. Octopus supports selecting the shard inside a controller, or manually in each object.

To send a query to a specific shard:

# This will create a user with name the 'Smith' in `:eu` shard.
User.using(:eu).create!(name: 'Smith')

#using method is also available for instances:

class User < ActiveRecord::Base
  def awesome_queries
    Octopus.using(:eu) do
      User.create(:name => "Mickey")
      Dog.create(:name => "Pluto")
    end
  end
end

# This will create a new user and a new dog in the `:eu` shard:
u = User.new
u.awesome_queries()

The real power in Octopus for Active Record comes in the form of blocks. What is nice about the Octopus #using method is the ability to wrap queries inside a block.

Octopus.using(:posts) do
  Post.create(name: "MyFirstPost")
  # Do more database updating, creating, or destroying
end

Using this kind of block we can modify it to fit a controller for all its actions or even a specific action. Take the following code as an example.

class ApplicationController < ActionController::Base
  around_filter :select_shard      

 def select_shard(&block)
    Octopus.using(current_user.data_location, &block)
  end
end

Now all your Controller requests will pull data from the selected shard.

We can also chain Octopus scopes together. Example, say we have two databases. One for storing all our Users and one for storing their blog post. If we put our Users in one shard(:mastershard) and the Posts in another(:post shard) we can chain down to get a users post.

User.using(:master).first.using(:posts).posts.first`

Migrations with Octopus

We were thinking of keeping the posts strictly for sharding but without knowing migrations, we think sharding is some what incomplete. In migrations also we have access to the using method:

# This will create migration to add 'data_location' column to 'accounts' table in both :master and :eu shards
class AddDataLocation < ActiveRecord::Migration
  using(:master, :eu)
  def change
    add_column :accounts, :data_location, :string, :default => "us"
  end
end

Problems we faced

With this being said, if you plan to separate your data you may run into some interesting problems.

Database relationships are vital to your application, however, when you shard your database you may end up with unfavorable effects depending on how you plan to separate you data. These relationships may become problematic for your separated databases when one object needs another through a belongs_to or has_many relation between two shards. Let’s look at an Example.

Imagine we have a Users table, this table has a column of creator, which is just another User. Users.first.creator Let’s assume this creator is stored in our :master or :us database. If we wanted to add a new user to an :eu database, the first user in our :master shard will create this new user.

def create
  #current user from :master shard
  current_user.create_new_user(:eu)
end

If we ask for the creator, User.using(:eu).first.creator we get back nil.

The :eu database doesn’t know how to find the correct user from a different database shard. However, if we had a user with the same :id as the creator in our :eu database, we would get back the wrong user. To avoid such problems you can add a :uuid to your Users Table and query each shard until you find the correct user matching the unique id. However, this approach is not very performant. But this doesn’t mean all hope is lost. We are sure we will soon find an efficient solution to this and be back!

Thanks for taking a look, I hope you found this post helpful!