RegExes in Ruby — A Brief Summary

This is a brief overview of the Ruby class Regexp. Regexes in other languages (JavaScript, Python, etc.) are similar.
Regex, RegEx, or Regexp
/ˈɹɛgɛks/
(noun, verb)
A shortening of “Regular Expression,” regexes are used to find patterns in strings. In Ruby documentation, these are called regexp, and calling .class on a regex will return “Regexp.” A Regexp object holds a regular expression.
Chances are we’ve all seen a regular expression. Maybe we didn’t know. They can look friendly, like /[aeiou]/, and they can look more ominous, like /\/{2}(\w+\.(com|org))\//, which looks neither regular nor like an expression. After this short summary of regexes and their uses in Ruby, we will be able to read these much more easily, and maybe even start making our own! What do you think /\/{2}(\w+\.(com|org))\// is trying to match?
There are 3 things we need to recognize to start understanding regexes.
- Syntax
They start and end with /slashes/
- Literal Characters
Most characters are simple. /a/ matches “a”
- Special Characters
Typing + ? . * ^ $ ( ) [ ] { } | or \ does not match any of these characters. /$/ does not match “$”
What if we need to match the special characters?
To tell the regex to accept a special character (#3 above) as a literal (#2), we use the “escape character,” a backslash (\). Our ominous example is already getting nicer. Removing the start/end slashes and the escape backslashes, we can see that we are looking for something that begins with {2} regular slashes (/), then whatever (\w+\.(com|org)) means, and it also ends in a slash. We must type \ first in order for the regex to recognize / as a character to identify, otherwise it will signal the end of our regex.
Moving ahead quickly (we’ll go into depth below), \w represents any letter, digit, or underscore (we can think of “w” as “word”), and the + tells us that we want the \w one or more times. Lastly, we escape the special character . to show we actually want to match it, and the . can only be followed by “com” or “org.” Did you guess right? The ominous example is looking for //any_numbers_or_letters_or_underscores.com/ or .org/, that is, a simple domain name, as opposed to the characters before or after it.
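To check that reading of the pattern, here is a minimal sketch in Ruby. The URLs are made-up inputs for illustration, not something from the original article.

```ruby
# The "ominous" pattern: two slashes, one or more word characters,
# a literal dot, "com" or "org", and a closing slash.
domain = /\/{2}(\w+\.(com|org))\//

url = "https://example.com/about"   # hypothetical input for illustration
match = domain.match(url)

puts match[0]   # whole match:         //example.com/
puts match[1]   # outer capture group: example.com
puts match[2]   # inner capture group: com

# \w covers letters, digits, and underscores, but not hyphens,
# so a hyphenated name is not matched.
hyphenated = domain.match?("https://my-site.com/")
puts hyphenated   # false
```

The second check shows one limit of the pattern: it is a teaching example and is not meant to recognize every valid domain.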
Below is more information on special characters. You can play with regexes to learn how they’re interpreted at regexr.com and rubular.com.
More importantly, what are regexes for?
Matching

The methods .match, .scan, and =~ can be used to match a string and a pattern (which is inside a regex). This can be very helpful for things like validation.
Either order works: pattern.match(card) and card.match(pattern) return the same result. If .match does not find the pattern, nil is returned.
The method =~ returns the index where the pattern was first found, or nil if it was not found. For example, if the first match starts at the fifth character of the string, =~ returns 4. dash_numbers =~ card works the same as card =~ dash_numbers.
To get simple true or false answers, we can use .match? and !!(regex =~ string) (or !!(string =~ regex)).
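The original examples for this section are not shown, so here is a small sketch of what they might look like. The names card and dash_numbers come from the text; the card number and the pattern itself are assumptions chosen so that =~ returns 4.

```ruby
card = "1234-5678-9012-3456"   # hypothetical card number
dash_numbers = /-\d{4}/        # assumed pattern: a dash followed by four digits

result = dash_numbers.match(card)   # card.match(dash_numbers) also works
p result.class                      # MatchData
p result[0]                         # "-5678"

# =~ returns the index of the first match, whichever side the regex is on.
index_a = (card =~ dash_numbers)
index_b = (dash_numbers =~ card)
p index_a   # 4
p index_b   # 4

# No match: .match and =~ both give nil.
missing = "no digits here".match(dash_numbers)
p missing   # nil

# Plain true/false answers.
p card.match?(dash_numbers)     # true
p !!(card =~ dash_numbers)      # true
p !!("abc" =~ dash_numbers)     # false
```

.match? is usually the tidier choice when only a yes or no is needed, since it skips building a MatchData object.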
Changing

The methods .sub and .gsub can replace patterns in strings: .sub replaces the first occurrence, and .gsub replaces all occurrences. Like some other methods in Ruby, they have versions ending in !, and .sub! and .gsub! modify the original string in place, as opposed to just returning a new string.
For example, with the pattern /[aeiou]/ and the replacement “#,” .sub replaces only the first lower-case vowel, while .gsub replaces all lower-case vowels.
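A minimal sketch of that vowel replacement follows. The sample string is an assumption made up for illustration.

```ruby
greeting = "Hello there, Ruby"   # hypothetical sample string
vowel = /[aeiou]/

first_only = greeting.sub(vowel, "#")
all_vowels = greeting.gsub(vowel, "#")
puts first_only   # H#llo there, Ruby
puts all_vowels   # H#ll# th#r#, R#by
puts greeting     # Hello there, Ruby (the original is untouched)

# The ! version changes the string it is called on.
copy = greeting.dup
copy.gsub!(vowel, "#")
puts copy         # H#ll# th#r#, R#by
```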
Regexes can also be used with .split and .scan, but these are String methods, so the regex must go in the parentheses as the argument (string.split(regex)) and not before the period (regex.split(string)), or Ruby will raise a NoMethodError.
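Here is a short sketch of .split and .scan with a regex argument, plus the error you get when the order is reversed. The sample string is hypothetical.

```ruby
list = "apples, pears;plums  figs"   # hypothetical sample string

# Split on any run of commas, semicolons, or whitespace.
parts = list.split(/[,;\s]+/)
p parts   # ["apples", "pears", "plums", "figs"]

# Collect every run of word characters.
words = list.scan(/\w+/)
p words   # ["apples", "pears", "plums", "figs"]

# Regexp has no .scan method, so calling it on the regex fails.
error_class = nil
begin
  /\w+/.scan(list)
rescue NoMethodError => e
  error_class = e.class
end
p error_class   # NoMethodError
```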
Assigning Variables!

Of course, we can assign the result of a match to a variable as usual, as with the return value of any method.
However, regexes also allow us to assign variables from within the regex. I’m nerding out right now! This is exciting!
That easily, with a pattern such as /\d+/, Ruby scans the text and pulls out 1 or more (+) digits (\d) in a row, and we can examine that data like we would data from an array.
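The original example is not shown here, so the following is only a sketch of the idea. The text and variable names are assumptions for illustration.

```ruby
text = "Order 66 shipped 3 boxes on day 12"   # hypothetical text

# Assigning a result to a variable, as with any method call.
numbers = text.scan(/\d+/)
p numbers         # ["66", "3", "12"]
p numbers.first   # "66"

# Parentheses create a capture group. The MatchData that comes back
# can be indexed like an array: 0 is the whole match, 1 the first group.
found = text.match(/(\d+) boxes/)
p found[0]         # "3 boxes"
p found[1]         # "3"
p found.captures   # ["3"]
```

Note that the captured values are strings, so call .to_i on them before doing arithmetic.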
But there’s more! Similar to a hash, we can assign names to the things we’ve matched. For this, we use the syntax /(?<key>regex)/ or /(?'key'regex)/. Ruby refers to these as names within capture groups, but it might be easier to think of them as key/value pairs. Here, “example” is of the MatchData class.
If the regex names one group name, defined as a capital letter [A-Z] followed by 2 or more word characters \w, and another group age, defined as one or more digits, then example[:name] will return the first such capitalized word and example["age"] will return the first digit or run of consecutive digits after it. We can get data from the MatchData instance “example” like a hash (and like an array, as previously mentioned).
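A sketch of named capture groups follows. The name "example" comes from the text; the sample sentence and the exact pattern are assumptions built from the description above.

```ruby
text = "my friend Alice is 30 years old"            # hypothetical sentence
regex = /(?<name>[A-Z]\w{2,}).*?(?<age>\d+)/        # assumed pattern
example = regex.match(text)

p example.class     # MatchData
p example[:name]    # "Alice"  (symbol key)
p example["age"]    # "30"     (string key)
p example[1]        # "Alice"  (array-style index still works)
p example.names     # ["name", "age"]

# When a regex literal with named groups is on the left of =~,
# Ruby also assigns the groups to local variables of the same name.
if /(?<year>\d{4})-(?<month>\d{2})/ =~ "released 2024-05"
  puts year    # 2024
  puts month   # 05
end
```

The local-variable form only works with a regex literal on the left side of =~; a regex stored in a variable does not create the locals.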
Lastly, we can get some interesting information about the most recent match using Ruby’s special global variables, which begin with $.
These variables give you everything before the match ($`), the entire matched text ($&), everything after the match ($'), the first capture group ($1), and the second capture group ($2).
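As a quick illustration, the sketch below runs one match and then reads each of those variables. The sentence and pattern are assumptions for illustration.

```ruby
text = "Today Alice is 30, apparently"   # hypothetical sentence
text =~ /([A-Z]\w+) is (\d+)/

# Copy the special variables right away; the next match overwrites them.
before       = $`
whole        = $&
after        = $'
first_group  = $1
second_group = $2

p before         # "Today "
p whole          # "Alice is 30"
p after          # ", apparently"
p first_group    # "Alice"
p second_group   # "30"

# Regexp.last_match is a more readable way to reach the same data.
same_group = Regexp.last_match(1)
p same_group     # "Alice"
```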
How else can regexes be useful?
- Checking email formats
/.+@.+\..+/ matches any character 1 or more times, followed by @, followed by any character 1 or more times, followed by a period, followed by any character 1 or more times
- Ensuring names are typed correctly
/([A-Z].+){2,}/ matches any capital character followed by 1 or more of any character, and all of that 2 or more times
- Checking price data
/\$(?<dollars>\d+)\.(?<cents>\d+)/ first matches any digits after the $ sign and assigns them to [:dollars], and then matches any digits after the period and assigns them to [:cents]
- Making password regulations
/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{6,12}$/ between the beginning (^) and the end ($), there must be a capital letter, a lower case letter, and a digit. All of these can come in any order (?=), and the whole thing must be 6 to 12 characters long
- Searching in Atom and other text editors
i*'*m*: with “case insensitive” on, Atom would find any instance of 0-∞ i or I, 0-∞ ', and 0-∞ m or M. Keep in mind, this would also pick up any words with i, I, ', m, or M in them, but it would at least help you find a very badly mistyped “I’m.”
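The patterns in the list above can be tried out with a few made-up inputs. This is only a sketch: the sample strings are assumptions, and the \A...\z variant at the end is an addition that is not in the original list.

```ruby
email = /.+@.+\..+/
p email.match?("dev@example.com")   # true
p email.match?("dev@example")       # false (no period after the @)

full_name = /([A-Z].+){2,}/
p full_name.match?("Ada Lovelace")  # true
p full_name.match?("ada lovelace")  # false

price = /\$(?<dollars>\d+)\.(?<cents>\d+)/
cost = price.match("Total: $19.99")
p cost[:dollars]   # "19"
p cost[:cents]     # "99"

password = /^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{6,12}$/
p password.match?("Secret42")   # true
p password.match?("secret42")   # false (no capital letter)
p password.match?("Ab1")        # false (too short)

# In Ruby, ^ and $ match at the start and end of each LINE, so a
# multi-line string passes if any single line qualifies.
p password.match?("x\nSecret42")   # true

# Assumed stricter variant: \A and \z anchor to the whole string.
strict = /\A(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{6,12}\z/
p strict.match?("x\nSecret42")     # false
p strict.match?("Secret42")        # true
```

For real validation, \A and \z are usually the safer anchors, because user input can contain newlines.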

Actually, we can put this “case insensitivity” into our regexes outside of Atom, as well. Options come after the regex’s closing slash and modify it. The most common ones are
- /Ruby/i
ignores case, so ruby, Ruby, and RuBy all match.
- /Ruby/x
ignores whitespace (\s) and comments inside the pattern, so a long regex can be spread out and annotated
- /Ruby/m
lets . match a newline, so a match can span multiple lines, and [enter] is just a character (\n)
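The three options can be seen side by side in a short sketch. The date pattern and sample strings are assumptions for illustration.

```ruby
# i: ignore case
p "RuBy".match?(/Ruby/i)   # true
p "RuBy".match?(/Ruby/)    # false

# x: whitespace and # comments inside the pattern are ignored,
# which lets a pattern be laid out over several lines.
date = /
  (\d{4})   # year
  -
  (\d{2})   # month
/x
parsed = date.match("2024-05")
p parsed[1]   # "2024"
p parsed[2]   # "05"

# m: the dot also matches a newline
p "a\nb".match?(/a.b/)    # false
p "a\nb".match?(/a.b/m)   # true
```

Options can be combined, for example /ruby/im.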
There are tons of cool things you can do with regexes, like even checking that all of the letters in a string are from the Latin alphabet (/^\p{Latin}+$/). There are so many things you can do that entire books have been written about regular expressions. Hopefully this short summary has helped us become comfortable enough with the idea to understand some common regex patterns in code.
What can regexes help you do?
Potential Problems
- Sometimes we see /[A-Za-z0-9]/, which means any character in A-Z, a-z, or 0-9. It is written this way because a character range is evaluated by character code (the Unicode code point, which for these characters is the same as ASCII), where 0 is 0030, 9 is 0039, A is 0041, Z is 005a, a is 0061, and z is 007a. The regex /[0-z]/ would include all of those letters and digits AND :, ;, <, =, >, ?, @ (the characters between 9 and A) AND [, \, ], ^, _, and ` (the characters between Z and a). Also, putting the endpoints in the wrong order raises an error, because /[z-0]/ asks for the range 007a to 0030, counting backwards, which is empty. As in everyday speech and standard writing, ranges are defined like [1-9] or “one through nine,” and not [9-1] or “nine through one.”
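Both problems can be checked directly. The sketch below lists the extra characters that /[0-z]/ lets through and shows the error from a reversed range. It builds the reversed pattern with Regexp.new so the failure happens at run time and can be rescued.

```ruby
# Character codes, in hexadecimal, for the range endpoints.
codes = %w[0 9 A Z a z].map { |c| c.ord.to_s(16) }
p codes   # ["30", "39", "41", "5a", "61", "7a"]

# Every character from "0" through "z", by code point.
chars = ("0".ord.."z".ord).map(&:chr)

# All of them match /[0-z]/ ...
all_match = chars.all? { |c| c.match?(/[0-z]/) }
p all_match   # true

# ... including these, which are neither letters nor digits.
extras = chars.grep_v(/[A-Za-z0-9]/)
puts extras.join(" ")   # : ; < = > ? @ [ \ ] ^ _ `

# A reversed range is rejected as an empty range.
range_error = nil
begin
  Regexp.new("[z-0]")
rescue RegexpError => e
  range_error = e
end
p range_error.class   # RegexpError
```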