WEBVTT

0
00:00.110 --> 00:05.580
Let me start off this regular expression discussion with a famous quote from Jamie Zawinski.

1
00:05.600 --> 00:07.640
He's just a famous programmer, by the way.

2
00:07.970 --> 00:15.230
He said "Some people, when confronted with a problem, think 'I know, I'll use regular expressions'. But

3
00:15.230 --> 00:16.670
now they have two problems."

4
00:17.690 --> 00:18.230
Haha 🤣.

5
00:18.710 --> 00:24.770
Oh, I guess what this is trying to say is that regular expressions are both terribly awkward and extremely

6
00:24.770 --> 00:27.310
useful at exactly the same time.

7
00:27.320 --> 00:33.290
They are awkward because their syntax is kind of cryptic and the programming interface JavaScript provides

8
00:33.290 --> 00:36.010
us is kind of a little bit clumsy, but they're useful.

9
00:36.020 --> 00:40.310
They are a powerful tool for inspecting and processing strings.

10
00:40.310 --> 00:46.010
So properly understanding regular expressions will make you a more effective programmer.

11
00:46.010 --> 00:51.560
And that's why this course wouldn't be complete if I didn't just mention it, at least at a high level.

12
00:51.570 --> 00:54.420
So what are they and how do they work?

13
00:54.440 --> 01:00.240
Well, firstly, a regular expression is just a way to create and read patterns. At the crux,

14
01:00.240 --> 01:03.270
that's all a regular expression is.

15
01:03.300 --> 01:10.650
And regex forms part of the standard library of many programming languages, including Java and Python.

16
01:10.650 --> 01:15.060
And it's built into the syntax of others, including Perl and ECMAScript.

17
01:15.060 --> 01:20.870
But unfortunately, regular expression syntax varies slightly across languages.

18
01:20.880 --> 01:24.960
The good news, though, is that the core details are the same.

19
01:24.960 --> 01:29.970
Now let's take a look at how each of these different languages match and extract data using regular

20
01:29.970 --> 01:30.930
expressions.

21
01:30.930 --> 01:37.540
In Java, the String class has a method called matches, and it'll return true if the whole string matches the pattern.

22
01:37.560 --> 01:44.670
Java also has a few helper classes, namely the java.util.regex.Pattern helper, which helps to create

23
01:44.670 --> 01:45.510
the patterns.

24
01:45.510 --> 01:51.450
And there's a Matcher class, which is a helper for navigating and keeping the state of various

25
01:51.450 --> 01:52.220
matches.

26
01:52.230 --> 01:57.900
Python provides regular expressions too, through the re module, and provides a way to avoid the double

27
01:57.900 --> 02:05.850
escaping by prefixing your string with an r for "raw". Ruby allows you to create a regular expression object

28
02:05.850 --> 02:08.850
by surrounding a pattern with forward slashes.

29
02:09.030 --> 02:12.180
This is known as a regular expression literal,

30
02:12.210 --> 02:17.910
by the way. PHP has a function called preg_match and despite what the name sounds like, it's not

31
02:17.910 --> 02:25.380
a paternity test result function 🤣! But, especially if you're anything like me, JavaScript is what we

32
02:25.380 --> 02:27.120
really, really care about.

33
02:27.120 --> 02:34.050
And in JavaScript a regular expression is a type of object. And we can create a regular expression in

34
02:34.050 --> 02:35.430
a few ways.

35
02:35.460 --> 02:41.910
It can either be constructed with the RegExp constructor or it can be written as a literal value by

36
02:41.910 --> 02:47.760
enclosing a pattern in forward slash characters, the same as in Ruby that we just saw.

37
02:47.880 --> 02:51.300
But let's look at this in a bit more detail and see how it works.

38
02:51.720 --> 02:54.090
Okay, well, let's see both of these methods in more detail.

39
02:54.090 --> 02:55.530
Let's see them in action.

40
02:55.530 --> 02:56.280
So there we go.

41
02:56.310 --> 02:59.490
The first variable we define here is "regex1".

42
02:59.520 --> 03:05.870
We've used the constructor, the RegExp() constructor. And of course in the second case we've literally

43
03:05.870 --> 03:08.060
just used the literal approach.

44
03:08.060 --> 03:15.170
But very importantly, my dear students, both of these regular expression objects represent the same

45
03:15.170 --> 03:21.950
pattern - an "a" character followed by a "b" character, followed by a "c" character.
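
NOTE
A minimal sketch of the two approaches just described. The names regex1 and
regex2 follow the lecture; the checks at the end are only an illustration.
```javascript
// Method 1: the RegExp() constructor takes the pattern as a string.
const regex1 = new RegExp("abc");
// Method 2: the literal form wraps the pattern in forward slashes.
const regex2 = /abc/;
// Both are RegExp objects that describe the same pattern.
console.log(regex1 instanceof RegExp); // true
console.log(regex2 instanceof RegExp); // true
console.log(regex1.source === regex2.source); // true
```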

46
03:21.950 --> 03:24.530
Here we are defining a pattern.

47
03:24.800 --> 03:25.550
Got it.

48
03:25.550 --> 03:26.510
Okay, cool.

49
03:26.510 --> 03:31.610
You might be thinking we've created a pattern and this pattern is an object.

50
03:31.610 --> 03:33.020
It's a regular expression object.

51
03:33.020 --> 03:34.580
But why does it need to be in an object?

52
03:34.580 --> 03:36.050
What's the benefit of that?

53
03:36.080 --> 03:42.680
Well, the main benefit of having it as an object is that with objects we get a number of useful methods.

54
03:42.800 --> 03:48.140
And the simplest, and one of the most useful methods, is test().

55
03:48.530 --> 03:52.370
And this test() is a method on the object provided to us by JavaScript.

56
03:52.370 --> 03:54.590
It's not something we've written ourselves.

57
03:54.620 --> 04:00.470
And if you pass this method a string, it's going to return a boolean telling you whether the string

58
04:00.470 --> 04:04.680
contains a match for the pattern in your regular expression.

59
04:04.920 --> 04:05.280
Okay.

60
04:05.280 --> 04:05.850
Okay.

61
04:05.850 --> 04:07.710
I know ... it's just words on a screen.

62
04:07.710 --> 04:11.520
Let me jump over to Visual Studio Code and show you an example.

63
04:11.970 --> 04:14.610
Actions speak louder than a thousand words.

64
04:14.610 --> 04:15.390
So here we are.

65
04:15.390 --> 04:17.670
I've just got a JavaScript file open.

66
04:17.670 --> 04:20.460
I just happened to call it app.js.

67
04:20.490 --> 04:23.310
Now, let me just write a comment here.

68
04:23.310 --> 04:33.120
Let's say the user inputs data that we're saving into a variable called data.

69
04:33.240 --> 04:33.530
Okay?

70
04:33.540 --> 04:38.850
So let's just pretend this is a real life scenario, and we get the data from the form.

71
04:38.850 --> 04:44.820
Okay, here we're just going to have a whole lot of characters and then "abc" right,

72
04:44.820 --> 04:47.790
and then we can have a whole lot of, you know, x's following that.

73
04:47.790 --> 04:48.690
So there we go.

74
04:48.690 --> 04:53.400
That's data that the user's typed in whatever form widget we're dealing with.

75
04:53.520 --> 04:55.890
And now I want to show you regex.

76
04:55.890 --> 04:58.350
So let's start off with method 1. 

77
04:58.830 --> 04:59.730
And this can

78
04:59.780 --> 05:03.230
be using the constructor function.

79
05:05.800 --> 05:06.400
Very simple.

80
05:06.400 --> 05:08.290
And we've seen already how to do this.

81
05:08.320 --> 05:11.150
We're just going to define a variable called "regex1".

82
05:11.170 --> 05:17.680
We're going to use the new keyword in JavaScript and they have a constructor function called RegExp(),

83
05:18.190 --> 05:19.820
a regular expression.

84
05:19.840 --> 05:22.900
And what is it that we're going to create?

85
05:22.930 --> 05:24.670
What pattern do we want to create?

86
05:24.670 --> 05:30.340
The pattern we want to create is the character "a" followed by the character "b" followed by the character

87
05:30.370 --> 05:37.240
"c". Now what I want to do is I want to test whether our pattern is matched within the data variable,

88
05:37.240 --> 05:39.970
whatever the user typed in the form or the widget.

89
05:40.480 --> 05:41.530
How can we do that?

90
05:41.530 --> 05:46.930
Well, all we need to do is call that test() function that I told you about in the lecture, and then console

91
05:46.930 --> 05:52.450
log it out. And I don't want to go to the browser, so I want to use Quokka.

92
05:52.840 --> 05:54.670
So I'm going to start Quokka.

93
05:54.940 --> 05:55.890
That's how you spell it.

94
05:55.900 --> 05:58.030
Quokka on the current file.

95
05:58.030 --> 06:01.780
And if you've done my JavaScript courses, you'll know a lot about Quokka.

96
06:01.840 --> 06:05.390
It's basically just a runtime JavaScript interpreter.

97
06:05.390 --> 06:10.790
Okay, so we can see the results of JavaScript in real time -  very, very handy and saves a lot of time.

98
06:10.790 --> 06:17.810
So here we just want to console log out the result and here we're going to call our variable.

99
06:17.810 --> 06:20.780
And remember how I said that there's lots of properties and methods.

100
06:20.780 --> 06:30.230
The one that we want is test(). And we are wanting to test whether our data matches the A, B, C pattern.

101
06:30.440 --> 06:33.020
And here you can see what's being logged out:

102
06:33.050 --> 06:33.620
true ✅.

103
06:33.620 --> 06:34.970
So it passes the test.

104
06:34.970 --> 06:39.440
If, for example, we didn't have an "a", it's going to be false.
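
NOTE
Roughly what method 1 looks like in app.js. The exact contents of data are an
assumption (the lecture only says "abc" surrounded by other characters), and
dataWithoutA is a hypothetical name for the edited input.
```javascript
// Assumed user input: "abc" with filler characters on either side.
const data = "xxxxxxxabcxxxxxxx";
// Method 1: build the pattern with the RegExp() constructor.
const regex1 = new RegExp("abc");
// test() returns a boolean: does data contain a match for the pattern?
console.log(regex1.test(data)); // true
// Remove the "a" and the pattern no longer matches anywhere.
const dataWithoutA = "xxxxxxxbcxxxxxxx";
console.log(regex1.test(dataWithoutA)); // false
```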

105
06:40.130 --> 06:42.920
So you can see how easy it is to work with regex.

106
06:42.920 --> 06:46.550
And method two is just as easy.

107
06:46.550 --> 06:49.280
We're going to be using the literal approach.

108
06:50.470 --> 06:53.080
Let's define a new variable called "regex2". 

109
06:53.290 --> 06:55.870
Here, we just need to use forward slashes, right?

110
06:55.870 --> 06:59.590
We don't need that constructor function.

111
06:59.600 --> 07:01.210
It's just a shortcut, if you will.

112
07:01.210 --> 07:06.400
And again, we're going to define the pattern as "a" followed by the character "b", followed by the character

113
07:06.430 --> 07:13.870
"c", and we can do exactly the same thing. We can console log, we can call our regular expression. On there, we have

114
07:13.870 --> 07:19.270
a method called test() and we're wanting to test the data. And there we go.

115
07:19.270 --> 07:20.740
We literally have true ✅.

116
07:22.760 --> 07:24.410
It's passed the test.

117
07:24.710 --> 07:29.000
If we put the "b" in front of the "a", of course it's all going to be false.

118
07:29.000 --> 07:31.940
It has to be A followed by B, followed by C.
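
NOTE
The same check with the literal approach, as a sketch. As before, the contents
of data are assumed, and swapped is a hypothetical name for the input with the
"b" moved in front of the "a".
```javascript
// Assumed user input, same shape as in method 1.
const data = "xxxxxxxabcxxxxxxx";
// Method 2: the literal form, no constructor needed.
const regex2 = /abc/;
console.log(regex2.test(data)); // true
// Order matters: "bac" is not "abc", so there is no match.
const swapped = "xxxxxxxbacxxxxxxx";
console.log(regex2.test(swapped)); // false
```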

119
07:32.390 --> 07:33.980
Is this starting to make more sense?

120
07:34.310 --> 07:35.060
I hope so.

121
07:35.210 --> 07:40.430
But you may be thinking, "why did we need to use a regex in this example?

122
07:40.430 --> 07:40.910
Clyde?"

123
07:40.940 --> 07:44.930
Couldn't we have just used JavaScript's indexOf() method, for example?

124
07:45.350 --> 07:46.400
Let me show you what I mean.

125
07:46.400 --> 07:50.150
So method 3:  using indexOf.

126
07:51.540 --> 07:52.950
What is indexOf()?

127
07:52.980 --> 07:53.780
Don't stress.

128
07:53.790 --> 07:57.700
It's just a method on the String prototype in JavaScript.

129
07:57.720 --> 08:03.690
What's important to us is that it will return -1 if the value is not found within the string.

130
08:03.720 --> 08:04.820
Let me show you what I mean.

131
08:04.830 --> 08:06.810
So all I want to do is console log.

132
08:06.810 --> 08:10.470
I want to grab our string, which is data.

133
08:10.470 --> 08:19.620
We've defined it above. On this string, we have a method called indexOf() and this is provided to us by

134
08:19.620 --> 08:29.160
JavaScript. And we want to search for the string "abc", and here it returns the number 7 because

135
08:29.160 --> 08:34.360
it's telling us where in that sequence of characters the pattern starts.

136
08:34.380 --> 08:41.010
In other words, we've got 7 x's before the "abc". Of course, if we don't have an "a", every test() call

137
08:41.010 --> 08:44.950
is going to return false, and the indexOf() method is going to return -1.
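
NOTE
A sketch of method 3. The seven leading x's match what the lecture describes;
the trailing x's and the names found and dataWithoutA are assumptions.
```javascript
// Seven x's, then "abc", then more filler.
const data = "xxxxxxxabcxxxxxxx";
// indexOf() returns the zero-based position where "abc" starts.
console.log(data.indexOf("abc")); // 7
// With no "a" there is nothing to find, so indexOf() returns -1.
const dataWithoutA = "xxxxxxxbcxxxxxxx";
console.log(dataWithoutA.indexOf("abc")); // -1
// To get a boolean like test() gives us, compare against -1.
const found = data.indexOf("abc") !== -1;
console.log(found); // true
```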

138
08:45.240 --> 08:49.950
I know we've covered a lot in a very short space of time, but let's jump back to the lecture.

139
08:50.460 --> 08:57.160
Finding out whether a string contains "abc" could just as well have been done with a call to

140
08:57.160 --> 08:57.850
indexOf.

141
08:57.850 --> 08:59.440
But ... drumroll ...

142
09:00.760 --> 09:06.820
regular expressions allow us to express more complicated patterns than just the one we looked at

143
09:06.850 --> 09:13.990
now. We saw when we were looking at MailChimp, we've got uppercase, lowercase, we've got numbers, minimum

144
09:13.990 --> 09:14.950
of eight characters,

145
09:14.950 --> 09:15.460
etc

146
09:15.460 --> 09:15.940
etc. 

147
09:15.940 --> 09:20.740
So with regex, we can really, really define the exact requirements we need.
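
NOTE
To give a feel for a more complicated pattern, here is one possible sketch of
the kind of rule mentioned above: at least one lowercase letter, one uppercase
letter, one digit, and a minimum of eight characters. The name passwordPattern
and the use of lookaheads are assumptions for illustration, not the pattern
MailChimp actually uses.
```javascript
// Each (?=...) is a lookahead that checks one requirement without consuming text;
// .{8,} then requires at least eight characters between the ^ and $ anchors.
const passwordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;
console.log(passwordPattern.test("Passw0rdOK")); // true
// No uppercase letter and no digit.
console.log(passwordPattern.test("password")); // false
// Has all three character types but is too short.
console.log(passwordPattern.test("Ab1")); // false
```
This is something indexOf() cannot express in a single call.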

148
09:20.770 --> 09:23.230
And we're going to be seeing more examples of this throughout the course.

149
09:23.230 --> 09:28.130
But for now, I just wanted to stop and give you a high level overview of what

150
09:28.610 --> 09:29.450
a regular expression is.

151
09:29.450 --> 09:33.350
Hopefully it's helped you, and I'll see you in the next lecture.