Jacob Ruiz

Product Designer
Mastering Javascript Fundamentals: Lookaheads and backreferences

May 22, 2018

Get the fundamentals down and the level of everything you do will rise. - Michael Jordan

As stated in my original post, I do 1 hour of video lessons from Watch and Code every day. If you're interested in learning Javascript in a way that goes beyond basic tutorials and gives you a foundational, practical knowledge without relying on frameworks - I'd highly recommend it. If you're reading these posts, please keep in mind that these are just my notes, and I'm not an expert (yet!). If your goal is also to master the fundamentals of Javascript, please head over to Watch and Code and start your journey there!

All screenshots were annotated using Shotty.


Lookaheads and backreferences

Match "w" only if what comes after "w" is "w":

/w(?=w)/g

This is the syntax for a "positive lookahead".

Here it is shown in RegExr:

Screen Shot 2018-05-23 at 10.17.21 AM.png
Screen Shot 2018-05-23 at 10.28.21 AM.png

The (?=w) piece won't be included in the result; it is only there to specify the condition.
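To check this outside of RegExr, the same pattern can be run in plain JavaScript. This is just a minimal sketch; the sample string "wwww" is my own assumption, not the text used in the screenshots.

```javascript
// Sample input is an assumption for illustration.
const sample = "wwww";

// Each "w" matches only when another "w" comes right after it.
// The lookahead checks the next character but does not consume it.
const matches = sample.match(/w(?=w)/g);

console.log(matches);        // [ 'w', 'w', 'w' ]
console.log(matches.length); // 3 (the last "w" has nothing after it)
```

Each match is a single "w", which shows that the character inside the lookahead is never part of the result.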

We can run some more examples:

Match "w" only if it's followed by an "h".

Only match "w" if it's followed by "oot".

Screen Shot 2018-05-23 at 10.31.44 AM.png
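Here is a rough sketch of those two examples in JavaScript. The sample strings are assumptions of mine, and I'm uppercasing the matched "w" so it's easy to see which ones were picked up.

```javascript
// Sample strings are assumptions for illustration.
// Replacing each matched "w" with "W" shows exactly which ones matched.
const followedByH = "what will we do".replace(/w(?=h)/g, "W");
const followedByOot = "woot wow".replace(/w(?=oot)/g, "W");

console.log(followedByH);   // "What will we do"
console.log(followedByOot); // "Woot wow"
```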

Looking back at our original example: "match every w except the last one".

Screen Shot 2018-05-23 at 10.33.19 AM.png

There's a pretty big problem with our current implementation: what if we have non-consecutive w's? Everything breaks.

Screen Shot 2018-05-23 at 10.35.29 AM.png

The reason is that our regular expression isn't allowing for characters to be in between the w's. 
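A small sketch of the failure, assuming a sample string with a dash between each w (the string is mine, not the one from the screenshot):

```javascript
// "w-w-w" is an assumed sample with a character between each "w".
const spaced = "w-w-w";

// No "w" is immediately followed by another "w", so nothing matches.
// With the g flag, match() returns null when there are no matches.
const result = spaced.match(/w(?=w)/g);

console.log(result); // null
```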

We need a way to say that between one w and the next, we can see any character, any number of times.

Well, to say "any character", we can use the metacharacter: .

And to say "any number of times" we can use the quantifier {0,}, which means "zero or more times".

So we can write:

/w(?=.{0,}w)/g

Here it is in RegExr:

Screen Shot 2018-05-23 at 10.57.30 AM.png
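As a quick sketch in JavaScript, assuming a sample string with a dash between each w, the updated pattern now handles non-consecutive w's:

```javascript
// "w-w-w" is an assumed sample with a character between each "w".
const spaced = "w-w-w";

// Match a "w" only if any characters (zero or more) and then another "w" follow.
// Uppercasing the matches shows that every "w" except the last one is found.
const marked = spaced.replace(/w(?=.{0,}w)/g, "W");

console.log(marked); // "W-W-w"
```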

There's actually a shorter quantifier for "zero or more".

We know that one or more is a plus sign, +.

Zero or more is a star, *.

So this:

/w(?=.{0,}w)/g

Is the same as this:

/w(?=.*w)/g
Screen Shot 2018-05-23 at 11.08.34 AM.png
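A quick way to convince yourself that the two quantifiers behave the same is to run both forms on the same input, with the star form written as .* inside the lookahead. The sample sentence here is an assumption of mine.

```javascript
// Sample text is an assumption for illustration.
const text = "wow, what a week";

// {0,} and * both mean "zero or more", so these two patterns are interchangeable.
const withBraces = text.match(/w(?=.{0,}w)/g);
const withStar = text.match(/w(?=.*w)/g);

console.log(withBraces.length); // 3
console.log(withStar.length);   // 3
```

There are four w's in the sample, and both patterns match every one except the last.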

We saw parentheses, (), before when we looked at capture groups. Capture groups allow us to refer to a portion of a match using $1, $2, etc.

It's important to note that those parentheses are totally different from (?= ), which is what we use here for lookaheads. A lookahead only checks what comes next; it doesn't capture anything and isn't part of the match.
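To make the difference concrete, here is a small sketch comparing the two kinds of parentheses on the same assumed input, "ab":

```javascript
// A capture group consumes "b" and exposes it as group 1.
const captured = "ab".match(/a(b)/);
console.log(captured[0]); // "ab" (the full match includes the "b")
console.log(captured[1]); // "b"  (the captured group)

// A lookahead only checks for "b"; it adds nothing to the match and creates no group.
const looked = "ab".match(/a(?=b)/);
console.log(looked[0]);     // "a"
console.log(looked.length); // 1 (just the full match, no groups)
```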

So far we've been able to get all w's except the last one. What if we want to do the opposite?

What if we want only the last w? We can do that with a very small change. All we have to do is change the equals sign, =, to an exclamation mark (or bang), !.

Screen Shot 2018-05-23 at 11.16.09 AM.png

The way to read this is, "match w if what follows is not this pattern (any character, zero or more times, followed by w)".
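Sketched in JavaScript with an assumed sample string, swapping = for ! flips which w's get matched:

```javascript
// "w-w-w" is an assumed sample with a character between each "w".
const spaced = "w-w-w";

// Negative lookahead: match a "w" only if no other "w" appears anywhere after it.
const lastOnly = spaced.replace(/w(?!.*w)/g, "W");

console.log(lastOnly); // "w-w-W"
```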

Now we know about two types of lookaheads:

Positive lookaheads: (?= )
Negative lookaheads: (?! )

Another quick example of negative lookaheads. Let's match grey only if it's not followed by " hound":

Screen Shot 2018-05-23 at 11.20.19 AM.png

If we want to do a positive lookahead, we switch the ! to a = and we will match grey only if it is followed by " hound":

Screen Shot 2018-05-23 at 11.22.29 AM.png
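Both versions can be tried side by side in JavaScript. The sample sentence is an assumption of mine, and the matched "grey" is uppercased so the difference is visible.

```javascript
// Sample text is an assumption for illustration.
const text = "grey hound, grey cat";

// Negative lookahead: "grey" NOT followed by " hound".
const notHound = text.replace(/grey(?! hound)/g, "GREY");

// Positive lookahead: "grey" followed by " hound".
const isHound = text.replace(/grey(?= hound)/g, "GREY");

console.log(notHound); // "grey hound, GREY cat"
console.log(isHound);  // "GREY hound, grey cat"
```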

What about a more generalized case of our original example?

What if we wanted the last instance of different characters?

Screen Shot 2018-05-23 at 12.09.34 PM.png

Our expression gets the last w, but what if we also wanted the last a?

Overall we'd like to extend this to get the last value of every letter. Grab the last b, the last a, the last c, etc.

To start, let's put the w in a capture group and refer to it inside of our lookahead. To do this, we use \1.

Screen Shot 2018-05-23 at 12.17.56 PM.png
Screen Shot 2018-05-23 at 12.19.08 PM.png
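Here is a sketch of what that step might look like in JavaScript. I can't reproduce the exact text from the screenshots, so the sample string "wawa" is an assumption.

```javascript
// Sample input is an assumption for illustration.
const sample = "wawa";

// (w) captures the "w"; \1 inside the lookahead refers back to that captured text.
// So this reads: match "w" if no later "w" follows.
const lastW = sample.replace(/(w)(?!.*\1)/g, (match) => match.toUpperCase());

console.log(lastW); // "waWa"
```

At this point the behavior is the same as before; the capture group just sets us up for the next step.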

Now let's use the pipe to say "a or w".

Screen Shot 2018-05-23 at 12.21.44 PM.png

This gives us the last instance of a, and the last instance of w.

The way you want to read this is:

Match a or w if it's not followed by zero or more characters, followed by that same letter again (\1 refers to whichever letter the capture group matched).

Because in order for an "a" to be the last "a", it must not be followed by any number of characters with an "a" at the end.
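A small sketch of the "a or w" version, again using an assumed sample string. The key detail is that \1 refers to whichever letter the group actually captured, so each letter is only compared against later copies of itself.

```javascript
// Sample input is an assumption for illustration.
const sample = "wawa";

// (a|w) captures either letter; the lookahead rejects the match
// if that same letter shows up again later in the string.
const lastOfEach = sample.replace(/(a|w)(?!.*\1)/g, (match) => match.toUpperCase());

console.log(lastOfEach); // "waWA"
```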

So this gives us a nice working example for a and w, but what about other letters? Well, we could just add them each by hand, separated by pipes:

Screen Shot 2018-05-23 at 12.29.35 PM.png

But this is obviously a tedious and error-prone approach. 

Instead, we can just use the dot (.) to represent "any character", and it works exactly the same way:

Screen Shot 2018-05-23 at 12.30.33 PM.png

 

This regular expression matches the last instance of every character.
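As a sketch in JavaScript, with a sample string I made up for illustration:

```javascript
// Sample input is an assumption for illustration.
const sample = "abcab";

// (.) captures any single character; the lookahead rejects it
// if the same character appears again later in the string.
const lastInstances = sample.match(/(.)(?!.*\1)/g);

console.log(lastInstances); // [ 'c', 'a', 'b' ]
```

One thing to keep in mind: by default the dot does not match line breaks, so on multi-line input the lookahead only scans to the end of the current line.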

Let's look at another simple example:

Let's write a capture group and a backreference that refers to that capture group:

Screen Shot 2018-05-23 at 12.32.42 PM.png

Then let's put just the letter "r" in the capture group:

Screen Shot 2018-05-23 at 12.34.55 PM.png

This is exactly equivalent to "rr".

Screen Shot 2018-05-23 at 12.35.49 PM.png
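My best guess at what this looks like in code, assuming the pattern in the screenshots is (r)\1, is the sketch below. The test words are my own.

```javascript
// Assumed pattern: capture "r", then require the captured text again.
const withBackreference = /(r)\1/;
// The literal equivalent.
const literal = /rr/;

// Both patterns agree on every input.
console.log(withBackreference.test("arrow"), literal.test("arrow")); // true true
console.log(withBackreference.test("hero"), literal.test("hero"));   // false false
```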

We can add quantifiers to this. Imagine we want "r" followed by two more "r"s.

Screen Shot 2018-05-23 at 12.37.10 PM.png

We can match "rarr" by adding an "a" in the middle:

Screen Shot 2018-05-23 at 12.38.06 PM.png
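Here is a sketch of how those last two steps could be written, assuming the quantifier is applied directly to the backreference as \1{2}. The exact patterns in the screenshots may differ, and the test words are my own.

```javascript
// Assumed pattern: "r", then the captured "r" repeated exactly twice more.
const tripleR = /(r)\1{2}/;
console.log(tripleR.test("brrr"));  // true  (contains "rrr")
console.log(tripleR.test("arrow")); // false (only "rr")

// Adding an "a" between the group and the backreference matches "rarr".
const rarr = /(r)a\1{2}/;
console.log(rarr.test("rarr")); // true
console.log(rarr.test("rar"));  // false (only one "r" after the "a")
```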

Summary

  • Positive lookaheads: (?=)
  • Negative lookaheads: (?!)
  • Backreferences: ()\1