RegExes in Ruby — A Brief Summary


This is a brief overview of Ruby's Regexp class. Regexes in other languages (JavaScript, Python, etc.) are similar.

There are 3 things we need to recognize to start understanding regexes.

  1. Syntax
    They start and end with /slashes/
  2. Literal Characters
    Most characters simply match themselves: /a/ matches “a”.
  3. Special Characters
    The characters + ? . * ^ $ ( ) [ ] { } | and \ have special meanings, so typing one on its own does not match that character. For example, /$/ does not match “$”.
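Here is a small sketch of those three points in Ruby. The constant names and sample strings are made up for illustration.

```ruby
# 1. Syntax: a regex literal sits between slashes.
# 2. Literal characters: /a/ matches the letter "a".
LITERAL_A = /a/

# 3. Special characters: an unescaped $ is an end-of-line anchor,
#    and an unescaped . stands for any single character except a newline.
END_ANCHOR = /$/
ANY_CHAR = /./

puts LITERAL_A.match?('cat')                  # true, "cat" contains "a"
puts LITERAL_A.match?('dog')                  # false
puts END_ANCHOR.match?('no dollar sign here') # true, every string has an end
puts '$5'.sub(END_ANCHOR, '!')                # $5! (the "$" itself is untouched)
puts ANY_CHAR.match?('$')                     # true, . accepts any character
```

Notice that /$/ happily "matches" a string with no dollar sign in it, because it is looking for a position and not for a character.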

What if we need to match the special characters?

To tell the regex to treat a special character (#3 above) as a literal (#2), we put the “escape character,” a backslash (\), in front of it. Our ominous example is already getting nicer. Ignoring the start/end slashes and the escape backslashes, we can see that we are looking for something that begins with two ({2}) regular slashes (/), followed by whatever \w+\.(com|org) means, and ends in a slash. We must type \ before each / so the regex treats it as a character to match; otherwise the / would signal the end of our regex.
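As a rough sketch, the pattern described above might look like the one below. This is my reconstruction from the description, not necessarily the exact original, and the sample URLs are hypothetical.

```ruby
# Assumed reconstruction: two slashes, one or more word characters,
# a literal period, "com" or "org", and a closing slash.
URL_PART = /\/{2}\w+\.(com|org)\//

# Ruby also offers %r{...} literals, where / needs no escaping at all.
URL_PART_ALT = %r{/{2}\w+\.(com|org)/}

# Escaping works the same way for any special character.
DOLLAR = /\$/

puts URL_PART.match?('https://example.com/about') # true
puts URL_PART.match?('https://example.net/about') # false, .net is not in (com|org)
puts URL_PART_ALT.match?('http://ruby.org/')      # true
puts DOLLAR.match?('$5')                          # true, \$ is a literal dollar sign
puts DOLLAR.match?('5')                           # false
```

If a pattern is full of slashes, the %r{...} form tends to be easier on the eyes.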

Matching

We want {4} digits (\d) followed by a dash (-), all of that {3} times, ending with {4} more digits (\d).
dash_numbers looks for a dash followed by {4} digits (\d).
.match? tries to find {16} digits (\d) in a row, and just returns true or false.
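The three descriptions above could be sketched like this. The sample number is invented, and DASH_NUMBERS stands in for the dash_numbers variable mentioned above (a constant here only so the example is easy to reuse).

```ruby
# Hypothetical sample data in a 4-4-4-4 layout.
CARD = '1234-5678-9012-3456'

# Four digits followed by a dash, three times, then four more digits.
CARD_FORMAT = /(\d{4}-){3}\d{4}/
# A dash followed by four digits.
DASH_NUMBERS = /-\d{4}/
# Sixteen digits in a row.
SIXTEEN_DIGITS = /\d{16}/

p CARD.match?(CARD_FORMAT)                # true
p CARD[DASH_NUMBERS]                      # "-5678", the first dash-plus-digits chunk
p CARD.match?(SIXTEEN_DIGITS)             # false, the dashes break up the digits
p CARD.delete('-').match?(SIXTEEN_DIGITS) # true once the dashes are gone
```

.match returns a MatchData object with the details of the match, while .match? only answers yes or no.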

Changing

chamelion.gsub('green', new_color)
In both lines 3 and 5, we are just trying to match any lower-case vowels
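A minimal sketch of both kinds of change, assuming a sample sentence of my own. The helper names recolor and hide_vowels are hypothetical; only the gsub call and the lower-case vowel idea come from the text above.

```ruby
# Any single lower-case vowel.
LOWER_VOWELS = /[aeiou]/

# gsub with a plain string swaps every occurrence of that string.
def recolor(sentence, new_color)
  sentence.gsub('green', new_color)
end

# gsub with a regex swaps every match of the pattern.
def hide_vowels(text)
  text.gsub(LOWER_VOWELS, '*')
end

chamelion = 'I am a green chameleon'
puts recolor(chamelion, 'purple') # I am a purple chameleon
puts hide_vowels('Ruby Regexp')   # R*by R*g*xp
puts hide_vowels('AEIOU aeiou')   # AEIOU *****, capitals are left alone
```

gsub returns a new string, so chamelion itself is unchanged unless you use gsub! instead.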

Assigning Variables!

This is quite exciting
.scan returns every chunk of {4} digits \d
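Here is one way this might look, using a made-up number. The method name four_digit_chunks and the variable names are my own.

```ruby
# Hypothetical sample data.
CARD_NUMBER = '1234-5678-9012-3456'

# scan returns an array holding every match of the pattern.
def four_digit_chunks(text)
  text.scan(/\d{4}/)
end

chunks = four_digit_chunks(CARD_NUMBER)
p chunks # ["1234", "5678", "9012", "3456"]

# Because scan returns an array, each chunk can go straight into its own variable.
first, second, third, fourth = chunks
puts "#{first} ... #{fourth}" # 1234 ... 3456
puts second + third           # 56789012
```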

How else can regexes be useful?

  • Checking email formats
    /.+@.+\..+/ matches any character 1 or more times, followed by @, followed by any character 1 or more times, followed by a period, followed by any character 1 or more times
  • Ensuring names are typed correctly
    /([A-Z].+){2,}/ matches any capital character followed by 1 or more of any character, and all of that 2 or more times
  • Checking price data
    /\$(?<dollars>\d+)\.(?<cents>\d+)/ first matches any digits after the $ sign and assigns them to [:dollars], and then matches any digits after the period and assigns them to [:cents]
  • Making password regulations
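    /^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{6,12}$/ between the beginning (^) and the end ($), there must be a capital letter, a lower-case letter, and a digit. Each (?=...) is a lookahead that checks for its character without consuming any text, so the three can come in any order, and the whole thing must be 6 to 12 characters long. In Ruby, ^ and $ anchor to a line; \A and \z anchor to the whole string.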
  • Searching in Atom and other text editors
    i*’*m*: with “case insensitive” on, Atom would find any run of zero or more i or I, followed by zero or more ’, followed by zero or more m or M. Keep in mind that this would also pick up any word containing i, I, ’, m, or M, but it would at least help you find a very badly mistyped “I’m.”
  • /Ruby/x
    ignores whitespace (\s) and comments inside the pattern, so a long regex can be spread out and annotated
  • /Ruby/m
    lets . match across multiple lines by treating a line break ([enter]) as just another character (\n)
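To make a few of these concrete, here is a sketch that runs the patterns from the list against some invented inputs. The constant names and sample strings are assumptions of mine; the patterns themselves are the ones listed above.

```ruby
EMAIL = /.+@.+\..+/
FULL_NAME = /([A-Z].+){2,}/
PRICE = /\$(?<dollars>\d+)\.(?<cents>\d+)/
PASSWORD = /^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9]).{6,12}$/

p EMAIL.match?('me@example.com')     # true
p EMAIL.match?('me.example.com')     # false, no @
p FULL_NAME.match?('Ada Lovelace')   # true, two capitalized chunks
p FULL_NAME.match?('Ada')            # false, only one

# Named groups are read back from the MatchData by name.
price = PRICE.match('Total: $12.99')
p price[:dollars]                    # "12"
p price[:cents]                      # "99"

p PASSWORD.match?('Secret12')        # true
p PASSWORD.match?('secret12')        # false, no capital letter
p PASSWORD.match?('Sh0rt')           # false, only 5 characters

# The x flag ignores whitespace in the pattern; the m flag lets . match "\n".
p 'Ruby'.match?(/R u b y/x)          # true
p "a\nb".match?(/a.b/)               # false
p "a\nb".match?(/a.b/m)              # true
```

These are loose checks: the email pattern, for instance, accepts plenty of strings that are not real addresses, so treat it as a first filter and not as full validation.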

What can regexes help you do?

Potential Problems

  • Sometimes we see /[A-Za-z0-9]/, which means any character in A-Z, a-z, or 0-9. It is spelled out this way because a range follows the order of Unicode code points (the numbering behind UTF-8), where 0 is 0030, 9 is 0039, A is 0041, Z is 005A, a is 0061, and z is 007A. The regex /[0-z]/ would include all of those letters and digits AND : ; < = > ? @ (the characters between 9 and A) AND [ \ ] ^ _ ` (the characters between Z and a). Also, putting the endpoints in the wrong order throws a syntax error, because /[z-0]/ asks for the range 007A down to 0030, counting backwards. As in everyday speech and standard writing, ranges are written like [1-9], or “one through nine,” and not [9-1], or “nine through one.”
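A quick sketch of the range problem. The constant names and the helper backwards_range_error are my own; the backwards pattern is built with Regexp.new so the error can be rescued at runtime, instead of stopping the file from loading as a literal /[z-0]/ would.

```ruby
ALNUM = /[A-Za-z0-9]/
TOO_WIDE = /[0-z]/

p ALNUM.match?(':')    # false
p TOO_WIDE.match?(':') # true, ":" sits between 9 and A
p TOO_WIDE.match?('_') # true, "_" sits between Z and a

# The code points behind the range endpoints, in hexadecimal.
p '0'.ord.to_s(16)     # "30"
p 'z'.ord.to_s(16)     # "7a"

# Returns the error message for a backwards range, or nil if none is raised.
def backwards_range_error
  Regexp.new('[z-0]')
  nil
rescue RegexpError => e
  e.message
end

puts backwards_range_error
```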


