I am not a Regex master by any means, but I do get a little gleeful when I see it. Sometimes it’s downright fun. If you want to see what I mean, try these Regex Crossword puzzles.

Regex’s matching capability is well known. For instance, we can search for patterns within strings.

/gum/.match("gumby")
=> #<MatchData "gum">
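Ruby's core library offers a few related ways to ask the same question. In this sketch, Regexp#match returns nil when nothing matches, =~ returns the index where the match starts, and match? only answers true or false. The sample strings are made up for illustration.

```ruby
# Regexp#match returns a MatchData object, or nil when nothing matches
m = /gum/.match("gumby")
p m[0]                     # => "gum"
p(/gum/.match("sloth"))    # => nil

# =~ returns the index where the first match starts (or nil)
p("gumby" =~ /gum/)        # => 0
p("bubblegum" =~ /gum/)    # => 6

# match? only answers true/false and skips building a MatchData object
p(/gum/.match?("gumby"))   # => true
```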

With gsub, replacing any pattern is easy peasy.

"Easy as pie.".gsub(/as\spie/, "peasy")
=> "Easy peasy."
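gsub has a sibling, sub, that replaces only the first match; sub comes up again later in this post. A small sketch of the difference, using a made-up sentence with two matches:

```ruby
phrase = "Easy as pie. Simple as pie."

# sub replaces only the first match...
p phrase.sub(/as\spie/, "peasy")    # => "Easy peasy. Simple as pie."

# ...while gsub replaces every match
p phrase.gsub(/as\spie/, "peasy")   # => "Easy peasy. Simple peasy."
```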

But what if you want to insert text instead of replacing it? Say we have a variable name that we want to convert from camel case to underscore (snake case) format. Let’s see how to go from babySloth to baby_sloth.

Looking for capital letters makes sense. This first step isn’t strictly necessary, but it demonstrates how to capture a Regex match for later use. To create a capture, wrap the desired pattern in parentheses:

/([A-Z])/.match("babySloth")
=> #<MatchData "S" 1:"S">
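The MatchData object does more than hold the capture. As a sketch, it also knows the text before and after the match, which is enough to perform the insertion by hand (the variable name manual is just for illustration):

```ruby
m = /([A-Z])/.match("babySloth")

p m[0]           # => "S"     (the whole match)
p m[1]           # => "S"     (capture group 1)
p m.pre_match    # => "baby"  (text before the match)
p m.post_match   # => "loth"  (text after the match)
p m.begin(1)     # => 4       (index where capture 1 starts)

# Stitching the pieces back together by hand already performs the insert
manual = m.pre_match + "_" + m[1].downcase + m.post_match
p manual         # => "baby_sloth"
```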

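See how the result has saved the letter S to position 1? Now we can recall that captured value as many times as needed by referring to its capture position. This is called a backreference. We can put the backreference in the replacement portion of a gsub, along with our underscore. In Ruby, write it as '\1' in single quotes (or "\\1" in double quotes), because inside double quotes "\1" is an octal escape for a control character, not a backreference: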

"babySloth".gsub(/([A-Z])/, '_\1').downcase
=> "baby_sloth"
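The quoting detail is easy to trip over, so here is a quick sketch that makes it visible by checking string lengths. It also shows that gsub applies the pattern at every match, so a single capture is reused for each capital letter in the string.

```ruby
# Inside double quotes, "\1" is an octal escape for the control character
# U+0001, not a backreference, so this string is only two characters long.
p "_\1".length    # => 2
p '_\1'.length    # => 3  (underscore, backslash, digit)

# Single quotes, or an escaped backslash in double quotes, hand gsub a
# literal \1, which it expands to the text of capture group 1.
p "babySloth".gsub(/([A-Z])/, '_\1').downcase   # => "baby_sloth"
p "babySloth".gsub(/([A-Z])/, "_\\1").downcase  # => "baby_sloth"

# gsub applies the pattern at every match, so the single capture is
# reused for each capital letter.
p "iLoveBabySloths".gsub(/([A-Z])/, '_\1').downcase  # => "i_love_baby_sloths"
```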

But what if we don’t know how many humps our camel has? Each capture group holds only one value per match, so once a pattern strings several captures together, it depends heavily on the string’s exact format. Suppose we capture more information to pin down where each underscore goes. Notice how multiple captures result in multiple backreferences:

/([A-Z])([a-z]+)([A-Z])/.match("iLoveSloths")
=> #<MatchData "LoveS" 1:"L" 2:"ove" 3:"S">
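When there are several captures, it can help to pull them out together or give them names. This sketch uses MatchData#captures and Ruby's named groups, (?<name>...), which a replacement string can reference as \k<name>. The group names hump, rest and next_hump are made up for illustration.

```ruby
m = /([A-Z])([a-z]+)([A-Z])/.match("iLoveSloths")
p m.captures    # => ["L", "ove", "S"]

# The same pattern with named groups; each capture can be read by name
named_pattern = /(?<hump>[A-Z])(?<rest>[a-z]+)(?<next_hump>[A-Z])/
named = named_pattern.match("iLoveSloths")
p named[:rest]  # => "ove"
p named.names   # => ["hump", "rest", "next_hump"]

# In a replacement string, \k<name> refers to a named capture
p "iLoveSloths".gsub(named_pattern, '_\k<hump>\k<rest>_\k<next_hump>').downcase  # => "i_love_sloths"
```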

Using these backreferences, this is how it looks when we add in our underscores:

"iLoveSloths".gsub(/([A-Z])([a-z]+)([A-Z])/, '_\1\2_\3').downcase
=> "i_love_sloths"
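To see how tightly that three-capture pattern is tied to the string's shape, here is a sketch that feeds it one extra hump. Once gsub consumes "LoveB", the remaining text has no capital-lowercase-capital run left, so the S never gets its underscore.

```ruby
pattern = /([A-Z])([a-z]+)([A-Z])/

# Works when the string has exactly the expected shape...
p "iLoveSloths".gsub(pattern, '_\1\2_\3').downcase      # => "i_love_sloths"

# ...but one extra hump breaks it: the underscore before "sloths" is missing
p "iLoveBabySloths".gsub(pattern, '_\1\2_\3').downcase  # => "i_love_babysloths"
```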

That got ugly quickly, and it’s still not very dynamic. With the String#scan method, we can handle any crazy incoming camel case string you can think of without keeping track of captures and backreferences. Each capital letter that scan finds gets its own replacement, however many the string contains:

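string = "iLoveBabySloths"
# scan walks the original string's capitals in order ("L", "B", "S");
# each sub call swaps the first remaining capital for "_" plus its lowercase form
string.scan(/[A-Z]/) do |capital|
  string = string.sub(/[A-Z]/, "_#{capital.downcase}")
end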

string
=> "i_love_baby_sloths"
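Wrapped in a method, the same scan-and-sub loop can be tried on a few different inputs. This is a sketch; the method name camel_to_snake is hypothetical. Note the last case: a leading capital also gets an underscore, which may or may not be what you want.

```ruby
# Hypothetical helper name; same scan-and-sub technique as above.
def camel_to_snake(camel)
  snake = camel
  camel.scan(/[A-Z]/) do |capital|
    # sub returns a new string, so camel (the string being scanned) is untouched
    snake = snake.sub(/[A-Z]/, "_#{capital.downcase}")
  end
  snake
end

p camel_to_snake("babySloth")        # => "baby_sloth"
p camel_to_snake("iLoveBabySloths")  # => "i_love_baby_sloths"
p camel_to_snake("sloth")            # => "sloth"
p camel_to_snake("BabySloth")        # => "_baby_sloth"
```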

This last example can be done in other ways, too. We could make use of a capture within the block. Or we could call each_with_index on the string’s characters (via chars, since String itself has no each_with_index) to examine each character in relation to the characters around it. Let me know your suggestions!
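Here is one possible reading of those two alternatives, as a sketch rather than a recommendation; both method names are hypothetical. The first passes a block to gsub, where $1 holds the current match's capture. The second walks the characters with each_with_index and uses the index to skip the underscore for a leading capital.

```ruby
# 1. A capture used inside a gsub block: while the block runs,
#    $1 holds capture group 1 of the current match.
def snake_with_capture(camel)
  camel.gsub(/([A-Z])/) { "_#{$1.downcase}" }
end

# 2. Character by character with each_with_index: an underscore goes in
#    front of any capital letter that is not the first character.
def snake_with_index(camel)
  camel.chars.each_with_index.map { |char, i|
    if char.match?(/[A-Z]/)
      i.zero? ? char.downcase : "_#{char.downcase}"
    else
      char
    end
  }.join
end

p snake_with_capture("iLoveBabySloths")  # => "i_love_baby_sloths"
p snake_with_index("iLoveBabySloths")    # => "i_love_baby_sloths"
p snake_with_index("BabySloth")          # => "baby_sloth"
```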