Have you ever been working on a feature and come to a point where you just feel barricaded in on all sides? This recently happened to me. Trying to force the implementation onto the current structure of our schema felt dirty, and the code I was writing became increasingly hard to follow. Not to mention, if and when a bug reared its ugly head, untangling that hodgepodge of spaghetti code would be a nightmare. We needed to restructure parts of our schema, with the added complexity of zero downtime for our production environment.

What is Hiding in the Code?

Lurking in the shadows of the code was “technical debt.” Technical debt is a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is chosen over the best overall solution. I was on a highway to more technical debt. Paying down technical debt can be quite a task. For us, it meant restructuring some of the most valuable data tables in our application, along with their related tables, while keeping the servers running and customers happy.

The Implementation

Why not just take a maintenance window? First off, it is not very user-friendly to take down your service periodically when you have the ability to avoid it. Some companies like to take an outage during a weekend or evening; however, if your users are distributed across other countries, that off-hours window is prime time somewhere else. All of your users are important, no matter where they are located. If you have done any research on migrating your application with zero downtime, you know it is no easy task. There are table locks to avoid, indexes that are harder to work with, tables to rename, and data integrity to preserve.

The application I am working on is a Rails app using MySQL, which lends an assortment of gems we could leverage for this job. We researched some gems and narrowed it down to rolling our own implementation or using LHM, the Large Hadron Migrator, though using a gem named after a machine that creates black holes kind of freaked me out. So, what did we choose? We chose to work through this without the use of a gem. When accomplishing a zero downtime migration, we wanted to be sure not to lose data, so we took a number of steps across a few different pull requests (a hypothetical sketch of each step follows this list):

1. Create “new” tables for the schema we want, while continuing to read from and write to our “old” tables. We will not be reading from the newer tables quite yet; the idea is to keep the application running as normal.

2. Write to both the “new” and “old” tables. This was a separate pull request to our jobs code base, and it continues the pattern of keeping the application running as normal, without data becoming corrupt.

3. Sync up the data in each table. Since we are now recording data in both the “new” and “old” tables, we need to copy over any previously recorded data. I created a rake task with a large SQL insert statement to copy any and all previous data into our “new” tables. At this point, everything reads from the “old” tables and writes to both, which keeps a full copy of our data.

4. Flip the switch: point all reads at the “new” tables and remove every reference to the “old” ones.

5. Once things are running smoothly in production against the tables and schema pattern we want, drop the “old” tables.
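Here is a minimal sketch of step 1, assuming a hypothetical orders table being restructured into a new_orders table (our real table and column names differ, as does the shape of the new schema). The legacy_id column and its unique index are there so the later steps can map each new row back to its old one; the Rails 5 migration syntax is also an assumption.

    # Step 1 (sketch): create the "new" table alongside the "old" one.
    # "orders", "new_orders", and every column here are hypothetical.
    class CreateNewOrders < ActiveRecord::Migration[5.2]
      def change
        create_table :new_orders do |t|
          t.bigint :legacy_id              # points back at the old orders row
          t.references :customer           # example of the restructured shape
          t.string :status, null: false
          t.timestamps
          t.index :legacy_id, unique: true # lets the backfill skip duplicates
        end
      end
    end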
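Step 2, the dual writes, could look something like the callback below. Order and NewOrder are hypothetical models (our change actually lived in our jobs code base), but the idea is the same: every write to the “old” table is mirrored into the “new” one, while reads still come from the old table.

    # Step 2 (sketch): mirror every write to the "old" table into the "new" one.
    # Order ("old") and NewOrder ("new") are hypothetical model names.
    class Order < ApplicationRecord
      after_save :mirror_to_new_table

      private

      def mirror_to_new_table
        mirrored = NewOrder.find_or_initialize_by(legacy_id: id)
        mirrored.update!(customer_id: customer_id, status: status)
      end
    end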
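Step 3 is the backfill. A single INSERT ... SELECT copies the historical rows, and thanks to the unique index on legacy_id, MySQL’s INSERT IGNORE skips any rows the dual-write code has already mirrored. Again, the task and table names are placeholders.

    # Step 3 (sketch): copy all historical rows into the "new" table.
    namespace :schema_migration do
      desc "Backfill new_orders from orders (hypothetical table names)"
      task backfill_new_orders: :environment do
        ActiveRecord::Base.connection.execute(<<~SQL)
          INSERT IGNORE INTO new_orders
            (legacy_id, customer_id, status, created_at, updated_at)
          SELECT id, customer_id, status, created_at, updated_at
          FROM orders
        SQL
      end
    end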
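Steps 4 and 5 are the payoff. Once reads point at the “new” tables and production has been stable, the dual-write code comes out and the “old” tables can be dropped, with something like the sketch below (again a placeholder name). Keeping the drop in its own pull request preserves the fire escape discussed in the trade-offs: until it runs, we can switch reads back to the “old” tables at any time.

    # Step 5 (sketch): drop the "old" table once production is stable.
    # Deliberately irreversible -- past this point we rely on the new schema.
    class DropOldOrders < ActiveRecord::Migration[5.2]
      def up
        drop_table :orders
      end

      def down
        raise ActiveRecord::IrreversibleMigration
      end
    end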

Trade-Offs

Wow, that was a bit of work! Was it worth it? Our first and foremost concern was keeping the data intact every step of the way. It is possible to do all this with gems, but we felt more comfortable controlling our own destiny and keeping an eye on all the moving parts. An obvious trade-off is ease of use: our implementation meant a few more pull requests to navigate. While a gem can be pretty “easy” to plug in and play, it may not be as flexible or as fully documented, and it adds another dependency to the system to manage. By not using a gem, we inherited some short-term technical debt from continuing to write to both the “old” and “new” tables; however, I feel this is negligible since we delete the dual-write code as soon as we are stable in production. We were also able to keep a copy of all our data in the “old” tables. If there was a fire, we could quickly switch back to the previous implementation with zero data loss.

What I Learned

When you’re feeling boxed in, take a step back and look for opportunities to make your application more efficient and flexible. You might have some technical debt lurking in your code.