Updated: Jun 03, 2019
When I first started out as an SEO, I had never heard of a Regular Expression. I’ve since learned that, while intimidating at first, RegEx are an invaluable tool and a little understanding of how they work goes a long way. What follows is a quick guide for beginners to start to understand Regular Expressions and how they can be used in day-to-day search engine optimization work.
The Challenge
I was working with a company called Bryers’ Widgets (names have been changed to protect the innocent). As part of my keyword research, I wanted to see what non-branded (discovery) keywords were generating impressions and clicks for the site. Using Google Analytics I viewed the list of queries that generated impressions in the past 12 months.
This was a good start, but the list showed ALL keywords, both branded and non-branded. I needed to filter out any queries that contained the client’s name “Bryers’ Widgets”. Easy enough, right? I clicked on the advanced filter and chose “Exclude Query Containing Bryers’” as seen here:
However, when I reviewed the results, a problem became very evident. People were misspelling the company name in at least 10 different ways:
Briers’ | Breyers’ |
Briar’s | Bryers |
Buyers | Bryars’ |
Breyars | Breyar’s |
Breir’s | Bryer |
Now, I could have kept adding filters for each misspelling (after all, I needed to exclude all branded terms whether spelled correctly or not), but in the end I would have needed a minimum of 10 separate filters, a laborious task at best.
The Solution: A Simple Regular Expression
Rather than filtering each query individually, it’s possible to use one Regular Expression to accomplish the same task!
So What Is a RegEx?
A Regular Expression is a way of searching text for a specified pattern, using a string of characters that can include variable parts, such as the spelling variations in the example above. Regular expressions use characters and meta characters to accomplish this.
A character is any letter, number, or symbol (or group of letters, numbers, or symbols) that you would like to match. In this case, I want to match all of the different ways in which people have spelled “Bryers’.”
A meta character is a special set of characters that you will use to define what you’re looking for. There are a ton of meta characters that you can use. A great quick reference of meta characters used in RegEx can be found here: https://www.regular-expressions.info/refcharacters.html.
For this example, we will use only 3 of the most basic meta characters to accomplish our goal:
- () – Parentheses indicate a subexpression. Anything within the parentheses is seen as a group. (EXAMPLE)
- | – A pipe represents “Or.” So (B|b)ryers’ would match Bryers’ OR bryers’.
- ? – A question mark means that the previous character occurs 1 or 0 times. So the expression: Bryers? would match both Bryers and Bryer.
Note: One key meta character that we will not use but is important to know is the backslash (\). You may have asked yourself, “What if I need to match a question mark? Because it’s a meta character, won’t it simply modify the previous character?” The answer is yes. BUT a \ exempts the following character from being meta. So \? would be seen simply as a question mark and not as a meta character.
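The three meta characters and the backslash escape can be tried out in a few lines. This is a minimal sketch using Python's built-in re module, which is an assumption on my part (the article itself works in Google Analytics); the sample string about widgets is made up for illustration.

```python
import re

# Parentheses group the alternatives; the pipe means "or".
either_case = re.compile(r"(B|b)ryers’")
print(bool(either_case.fullmatch("Bryers’")))  # True
print(bool(either_case.fullmatch("bryers’")))  # True

# The question mark makes the preceding character optional.
optional_s = re.compile(r"Bryers?")
print(bool(optional_s.fullmatch("Bryers")))  # True
print(bool(optional_s.fullmatch("Bryer")))   # True

# A backslash turns the question mark into a literal character.
literal_q = re.compile(r"widgets\?")
print(bool(literal_q.search("who sells widgets?")))  # True
print(bool(literal_q.search("who sells widgets")))   # False
```

Basic features like these behave the same way in most regex engines, though more advanced syntax can differ from tool to tool.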
Building a Simple RegEx
Looking back at our list of keywords, we can take this piece by piece:
Briers’ | Breyers’ |
Briar’s | Bryers |
Buyers | Bryars’ |
Breyars | Breyar’s |
Breir’s | Bryer |
Luckily every word starts off with a b! But, as mentioned in our examples, that b may or may not be capitalized. So again, we’ll use the parentheses to group them together and add the pipe meta character to indicate that either one OR the other is acceptable:
(B|b)
The second letter is a similar situation. But in this case, some people use the r, others have used a u, and still others have used neither and moved straight to the y. So we need to account for all 3 cases. Once again, we’ll use the parentheses and pipe combo to capture the r and the u, but we’ll add the question mark after the group to indicate that it may or may not occur at all:
(B|b)(r|u)?
Next, I see there is either an i or an e but sometimes neither of these:
(B|b)(r|u)?(i|e)?
Then some people have added a y, but not all of them, so we’ll use the question mark again:
(B|b)(r|u)?(i|e)?y?
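To see what the expression built so far is doing, it can help to check how much of each spelling it consumes. The sketch below again assumes Python's re module; the helper name consumed_prefix is hypothetical and only there for illustration.

```python
import re

# The partial expression built up to this point in the walkthrough.
partial = re.compile(r"(B|b)(r|u)?(i|e)?y?")

def consumed_prefix(spelling: str) -> str:
    """Return the leading part of the spelling that the partial pattern matches."""
    match = partial.match(spelling)
    return match.group(0) if match else ""

for spelling in ["Briers’", "Breyers’", "Bryers", "Buyers"]:
    print(spelling, "->", consumed_prefix(spelling))
# Briers’ -> Bri
# Breyers’ -> Brey
# Bryers -> Bry
# Buyers -> Buy
```

Each optional piece is simply skipped when the spelling does not contain it, which is why one expression can follow several different spellings.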
Then we have either an e or an a:
(B|b)(r|u)?(i|e)?y?(e|a)
Then a bloody miracle occurs! Everyone remembers the r:
(B|b)(r|u)?(i|e)?y?(e|a)r
Then we have to deal with the ending. Nearly everyone has used an s at the end, but some put the apostrophe before the s, some put it after, and some don’t use it at all. Again, we’ll go back to the trusty question mark to indicate that the previous character may occur 0 or 1 times:
(B|b)(r|u)?(i|e)?y?(e|a)r’?s’?
And there you have it! We have built a simple RegEx that captures 8 of the 10 versions of the keyword in one line. The two it misses are Breir’s, where the i before the r is not covered by (e|a), and Bryer, which has no s. Adding i to that group and making the s optional, as in (B|b)(r|u)?(i|e)?y?(e|a|i)r’?s?’?, picks up both. Now I can go back into Google Analytics, and instead of creating 10 filters each using “Exclude Query Containing”, I can create one simple filter using “Exclude Query Matching RegExp” as seen here:
The above filter will leave us with nothing but non-branded queries for our keyword research.
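Before relying on a pattern like this, it is worth running it against the list of misspellings. The sketch below does that in Python, assuming the filter behaves like a partial match (as re.search does). The function names missed and non_branded, the sample queries, and the widened variant of the pattern are illustrative assumptions rather than a tested production filter.

```python
import re

# Spellings copied from the list above, typographic apostrophes included.
SPELLINGS = [
    "Briers’", "Breyers’", "Briar’s", "Bryers", "Buyers",
    "Bryars’", "Breyars", "Breyar’s", "Breir’s", "Bryer",
]

# The expression as built step by step in the walkthrough.
ARTICLE_PATTERN = r"(B|b)(r|u)?(i|e)?y?(e|a)r’?s’?"
# Illustrative variant: also allows an i before the r and makes the s optional.
WIDENED_PATTERN = r"(B|b)(r|u)?(i|e)?y?(e|a|i)r’?s?’?"

def missed(pattern: str, spellings: list[str]) -> list[str]:
    """Return the spellings that the pattern fails to match anywhere."""
    rx = re.compile(pattern)
    return [s for s in spellings if rx.search(s) is None]

def non_branded(pattern: str, queries: list[str]) -> list[str]:
    """Keep only the queries that do not contain a match (an exclude filter)."""
    rx = re.compile(pattern)
    return [q for q in queries if rx.search(q) is None]

print(missed(ARTICLE_PATTERN, SPELLINGS))  # ['Breir’s', 'Bryer']
print(missed(WIDENED_PATTERN, SPELLINGS))  # []

queries = ["breyers’ widgets", "blue widget price", "Bryars’ widget reviews"]
print(non_branded(ARTICLE_PATTERN, queries))  # ['blue widget price']
```

Keep in mind that every optional piece added to a pattern also widens what it can match, so it pays to spot-check the excluded queries for non-branded terms that were caught by accident.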
A Final Word on RegEx:
Hopefully you now have a basic understanding of what a RegEx is, how it works, and an example of how it might be used. As I mentioned in the beginning, Regular Expressions can be far more complex than this example, and there may even be simpler ways to accomplish the above using more complex language. That’s what makes RegEx so incredibly useful! With some study, you’ll be able to build a RegEx to match any query string imaginable. This tool also comes in handy when doing redirects, canonicals, and much more, but I’ll show you that in another post.