1
00:00:02,050 --> 00:00:04,760
So how can we now protect against

2
00:00:04,760 --> 00:00:07,440
cross-site scripting attacks?

3
00:00:07,440 --> 00:00:09,900
Well, to protect against those attacks

4
00:00:09,900 --> 00:00:12,810
we've got two main building blocks

5
00:00:12,810 --> 00:00:14,790
that you should be aware of.

6
00:00:14,790 --> 00:00:17,280
The first very important building block

7
00:00:18,253 --> 00:00:22,493
is that you should try to output escaped user content.

8
00:00:23,710 --> 00:00:25,610
So if you're outputting user content

9
00:00:25,610 --> 00:00:27,500
somewhere on your page,

10
00:00:27,500 --> 00:00:30,550
you should not take the raw HTML code

11
00:00:30,550 --> 00:00:32,700
and parse it as such

12
00:00:32,700 --> 00:00:36,060
or hand it to the browser to parse it as such,

13
00:00:36,060 --> 00:00:38,290
instead you should escape it.

14
00:00:38,290 --> 00:00:42,520
And escaping the content simply means that

15
00:00:43,964 --> 00:00:47,770
certain features of HTML will be disabled

16
00:00:47,770 --> 00:00:50,500
or the entire HTML block

17
00:00:50,500 --> 00:00:55,030
will actually be treated as raw text instead of HTML.

18
00:00:55,030 --> 00:00:57,930
So you might have all the HTML tags in there,

19
00:00:57,930 --> 00:01:00,720
but instead of handing them to the browser

20
00:01:00,720 --> 00:01:02,630
for the browser to parse them,

21
00:01:02,630 --> 00:01:05,269
you might output them as plain text

22
00:01:05,269 --> 00:01:08,060
and not let the browser parse them.

23
00:01:08,060 --> 00:01:11,160
And then the script tag which I entered before

24
00:01:11,160 --> 00:01:14,120
would be output as plain text on the screen

25
00:01:14,120 --> 00:01:17,060
instead of being executed by the browser.

26
00:01:17,060 --> 00:01:19,480
I will show you how you can escape user input

27
00:01:19,480 --> 00:01:20,540
in just a second,

28
00:01:20,540 --> 00:01:23,020
but that's the first building block.

29
00:01:23,020 --> 00:01:24,810
The second building block,

30
00:01:24,810 --> 00:01:28,460
which would be an alternative is that you sanitize,

31
00:01:28,460 --> 00:01:30,360
that you clean the user input

32
00:01:30,360 --> 00:01:32,563
before you process or store it.

33
00:01:33,540 --> 00:01:35,880
Now, if you do escape the output,

34
00:01:35,880 --> 00:01:37,400
which I'll show you in a second,

35
00:01:37,400 --> 00:01:39,810
you don't need to sanitize and clean

36
00:01:39,810 --> 00:01:43,320
because escaping is enough protection.

37
00:01:43,320 --> 00:01:46,920
If you do output raw, unescaped content though,

38
00:01:46,920 --> 00:01:49,950
then you should really try to sanitize it first

39
00:01:49,950 --> 00:01:51,530
so that you do ensure

40
00:01:51,530 --> 00:01:54,500
that you don't work with raw user input

41
00:01:54,500 --> 00:01:56,810
but instead it was cleaned first.

42
00:01:56,810 --> 00:01:59,420
Because sanitizing means that you take a look

43
00:01:59,420 --> 00:02:01,490
at that user-generated content

44
00:02:01,490 --> 00:02:05,180
and you look for certain suspicious things

45
00:02:05,180 --> 00:02:06,640
like script tags

46
00:02:06,640 --> 00:02:09,780
and you then remove those script tags

47
00:02:09,780 --> 00:02:11,650
or you convert them to something

48
00:02:11,650 --> 00:02:13,750
that can't do any harm.

49
00:02:13,750 --> 00:02:16,040
So sanitizing is important

50
00:02:16,040 --> 00:02:19,820
if you plan to output unescaped user input.

51
00:02:19,820 --> 00:02:22,630
If you do escape, you don't need it.

52
00:02:22,630 --> 00:02:24,350
But we're going to take a look at both

53
00:02:24,350 --> 00:02:26,290
and therefore let's start by taking a look at

54
00:02:26,290 --> 00:02:31,290
escaping user input when we output it on a page.

55
00:02:31,300 --> 00:02:33,380
But let's start with the first one

56
00:02:33,380 --> 00:02:36,550
with escaping the content

57
00:02:36,550 --> 00:02:39,360
so that instead of injecting a script like this

58
00:02:39,360 --> 00:02:41,040
into the rendered template,

59
00:02:41,040 --> 00:02:45,100
we actually just output it as plain text.

60
00:02:45,100 --> 00:02:47,430
And that is possible.

61
00:02:47,430 --> 00:02:51,000
Back here on the backend where we have this

62
00:02:51,000 --> 00:02:53,810
injected content in the database right now,

63
00:02:53,810 --> 00:02:57,500
we can go to the discussion EJS file.

64
00:02:57,500 --> 00:02:59,790
And here there is something wrong

65
00:02:59,790 --> 00:03:03,740
with how I'm outputting this comment text.

66
00:03:03,740 --> 00:03:06,637
I'm using this EJS tag with a dash here

67
00:03:06,637 --> 00:03:10,020
after the EJS opening tag.

68
00:03:10,020 --> 00:03:11,240
Now it's been some time

69
00:03:11,240 --> 00:03:14,840
since I introduced you to the EJS templating language,

70
00:03:14,840 --> 00:03:18,587
but you might recall that we typically output text

71
00:03:18,587 --> 00:03:23,460
and so on with the equal sign instead of the dash here.

72
00:03:23,460 --> 00:03:25,180
The reason for that is that

73
00:03:25,180 --> 00:03:29,470
when you use this EJS tag with the equal sign,

74
00:03:29,470 --> 00:03:33,100
the content that will be output will be escaped,

75
00:03:33,100 --> 00:03:36,160
it will be treated as plain text in the end.
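
To make the idea of escaping concrete, here is a minimal sketch of what an escaping output tag roughly does with a string. The `escapeHtml` helper and its entity table are hand-written assumptions for illustration, not the templating engine's actual source, and the sample comment is made up.

```javascript
// Hand-written stand-in for the escaping an output tag performs.
// This is NOT the templating engine's real code; the entity table is an assumption.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text) {
  // Replace every HTML-significant character with its entity.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

// Hypothetical malicious comment submitted by a user.
const comment = '<script>alert("Hacked!")</script>';

// Roughly what an unescaped tag emits: the markup reaches the browser unchanged.
const rawOutput = comment;

// Roughly what an escaped tag emits: the browser only sees text.
const escapedOutput = escapeHtml(comment);

console.log(rawOutput);
console.log(escapedOutput);
```

The second line printed contains no real angle brackets anymore, so the browser has nothing it could parse as a script element.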

76
00:03:36,160 --> 00:03:38,190
And therefore you should always use

77
00:03:38,190 --> 00:03:40,320
this equal sign instead of the dash

78
00:03:40,320 --> 00:03:43,020
when outputting user-generated content

79
00:03:43,020 --> 00:03:45,670
unless you really, really, really know

80
00:03:45,670 --> 00:03:50,260
that you must parse it as actual HTML content.

81
00:03:50,260 --> 00:03:51,093
And in that case,

82
00:03:51,093 --> 00:03:54,430
you should have made sure that it was sanitized first.

83
00:03:54,430 --> 00:03:57,840
But typically you wanna use this tag with the equal sign

84
00:03:57,840 --> 00:03:59,650
and that is the tag we used

85
00:03:59,650 --> 00:04:02,040
throughout this course therefore.

86
00:04:02,040 --> 00:04:04,500
The tag with the dash can for example,

87
00:04:04,500 --> 00:04:07,140
be seen when we use the include function

88
00:04:07,140 --> 00:04:08,730
because there indeed

89
00:04:08,730 --> 00:04:12,030
I'm about to include a bunch of HTML code

90
00:04:12,030 --> 00:04:14,290
that's defined in a different file

91
00:04:14,290 --> 00:04:15,790
and it should be parsed

92
00:04:15,790 --> 00:04:18,930
and used as HTML here.

93
00:04:18,930 --> 00:04:21,350
But here I'm including HTML code,

94
00:04:21,350 --> 00:04:25,080
which I, the developer wrote, therefore it's safe,

95
00:04:25,080 --> 00:04:27,660
it's not user-generated content.

96
00:04:27,660 --> 00:04:29,480
User-generated content instead

97
00:04:29,480 --> 00:04:31,700
should typically be output like this,

98
00:04:31,700 --> 00:04:33,233
with the equal sign tag.

99
00:04:34,870 --> 00:04:38,100
So if you save this now and reload this page,

100
00:04:38,100 --> 00:04:39,990
now you see this as plain text

101
00:04:39,990 --> 00:04:42,490
and you didn't get that alert.

102
00:04:42,490 --> 00:04:45,620
And if we inspect this in the developer tools,

103
00:04:45,620 --> 00:04:49,350
we indeed see here it still looks like HTML,

104
00:04:49,350 --> 00:04:52,570
but it was passed to the browser as plain text

105
00:04:52,570 --> 00:04:55,060
and not as HTML.

106
00:04:55,060 --> 00:04:56,830
So therefore actually here,

107
00:04:56,830 --> 00:04:59,630
the browser did not execute this script

108
00:04:59,630 --> 00:05:02,590
because to the browser it's just regular text

109
00:05:02,590 --> 00:05:05,200
and hence now we have this injected code

110
00:05:05,200 --> 00:05:08,170
but it's not doing anything bad.

111
00:05:08,170 --> 00:05:09,110
And that therefore

112
00:05:09,110 --> 00:05:11,290
is something you should do.

113
00:05:11,290 --> 00:05:13,750
Always output user-generated content

114
00:05:13,750 --> 00:05:15,820
with that equal sign tag here

115
00:05:15,820 --> 00:05:19,270
or with any other technique that escapes the content.

116
00:05:19,270 --> 00:05:21,840
If you're using a different templating engine,

117
00:05:21,840 --> 00:05:25,030
all those engines typically have escape tags

118
00:05:25,030 --> 00:05:26,880
and you should always escape

119
00:05:26,880 --> 00:05:28,620
that user-generated content

120
00:05:28,620 --> 00:05:30,113
before you output it.

121
00:05:31,180 --> 00:05:33,330
Now that was escaping.

122
00:05:33,330 --> 00:05:36,390
Before, I mentioned that an alternative to that

123
00:05:36,390 --> 00:05:39,740
would be to sanitize the user input.

124
00:05:39,740 --> 00:05:41,520
Again, you don't need to do both

125
00:05:41,520 --> 00:05:44,510
and typically you shouldn't sanitize,

126
00:05:44,510 --> 00:05:47,850
you should instead prioritize escaping.

127
00:05:47,850 --> 00:05:51,040
But it is worth knowing about sanitizing as well

128
00:05:51,040 --> 00:05:52,960
especially if you have scenarios

129
00:05:52,960 --> 00:05:54,440
where you can't escape

130
00:05:54,440 --> 00:05:57,800
but where you need to output some raw HTML content

131
00:05:57,800 --> 00:06:01,260
and where you need to parse it as HTML.

132
00:06:01,260 --> 00:06:03,760
In such cases you might at least consider

133
00:06:03,760 --> 00:06:06,223
sanitizing it before you output it.

134
00:06:07,480 --> 00:06:09,070
Now sanitizing means cleaning

135
00:06:09,070 --> 00:06:10,280
as I mentioned before,

136
00:06:10,280 --> 00:06:12,430
it means that you take a look at that content

137
00:06:12,430 --> 00:06:16,520
and you get rid of any dangerous things in that content.

138
00:06:16,520 --> 00:06:18,830
And whilst you could write your own code,

139
00:06:18,830 --> 00:06:21,000
which takes a look at the user content

140
00:06:21,000 --> 00:06:22,640
and tries to clean it,

141
00:06:22,640 --> 00:06:24,660
very often it's more convenient

142
00:06:24,660 --> 00:06:27,403
to instead use third-party packages again.

143
00:06:28,240 --> 00:06:29,700
For Express.js,

144
00:06:29,700 --> 00:06:33,270
if you search for express sanitize user input,

145
00:06:33,270 --> 00:06:35,080
you will find discussions

146
00:06:35,080 --> 00:06:37,650
that share different approaches that can be used

147
00:06:37,650 --> 00:06:40,560
and different packages that can be used.

148
00:06:40,560 --> 00:06:44,220
And you will also find packages themselves

149
00:06:44,220 --> 00:06:46,720
like the express-validator package,

150
00:06:46,720 --> 00:06:50,550
which can also be used for general input validation.

151
00:06:50,550 --> 00:06:51,670
So that's also a package

152
00:06:51,670 --> 00:06:53,600
you might want to take a look at.

153
00:06:53,600 --> 00:06:56,660
Now for cross-site scripting attacks specifically,

154
00:06:56,660 --> 00:06:59,780
you can also search for express-XSS.

155
00:06:59,780 --> 00:07:02,830
You will also find more information

156
00:07:02,830 --> 00:07:06,540
on how you can add cross-site scripting protection

157
00:07:06,540 --> 00:07:07,760
in your express app.

158
00:07:07,760 --> 00:07:10,840
And for example this XSS package here

159
00:07:10,840 --> 00:07:12,250
could be used for that.

160
00:07:13,100 --> 00:07:15,270
This is a package which you can install

161
00:07:15,270 --> 00:07:18,200
into any Node.js project.

162
00:07:18,200 --> 00:07:22,490
And there you can use it to take some input, for example,

163
00:07:22,490 --> 00:07:23,340
some user input,

164
00:07:23,340 --> 00:07:25,950
which you received in one of your routes

165
00:07:25,950 --> 00:07:28,200
and let the package parse

166
00:07:28,200 --> 00:07:31,300
and convert it into a safe alternative.

167
00:07:31,300 --> 00:07:34,610
So this package will then strip certain content,

168
00:07:34,610 --> 00:07:37,450
like, for example, such script tags.

169
00:07:37,450 --> 00:07:39,490
Now to use this package in our project,

170
00:07:39,490 --> 00:07:40,830
we can stop the server

171
00:07:40,830 --> 00:07:43,320
and npm install xss,

172
00:07:43,320 --> 00:07:45,460
so install this package.

173
00:07:45,460 --> 00:07:47,860
And then in the routes, for example,

174
00:07:47,860 --> 00:07:51,400
we can import this package here

175
00:07:51,400 --> 00:07:54,620
by requiring it as we always do.

176
00:07:54,620 --> 00:07:56,260
And then in the post route

177
00:07:56,260 --> 00:07:57,870
where we receive content

178
00:07:57,870 --> 00:07:59,490
that might be malicious,

179
00:07:59,490 --> 00:08:02,430
so where we receive user-generated content

180
00:08:02,430 --> 00:08:04,990
that we plan on outputting on the page

181
00:08:04,990 --> 00:08:09,990
there we can actually wrap this with a function call.

182
00:08:10,350 --> 00:08:12,600
So we pass request body comment,

183
00:08:12,600 --> 00:08:14,380
to this xss function

184
00:08:14,380 --> 00:08:17,160
and this function will then sanitize and clean it

185
00:08:17,160 --> 00:08:20,060
and return the clean value.

186
00:08:20,060 --> 00:08:22,990
And it's then the cleaned sanitized value

187
00:08:22,990 --> 00:08:25,433
that is stored here in the database.

188
00:08:27,640 --> 00:08:30,603
So if we now start this server again,

189
00:08:32,309 --> 00:08:34,559
and I go to the MongoDB shell,

190
00:08:34,559 --> 00:08:37,210
if I have a look at my comments,

191
00:08:37,210 --> 00:08:39,789
I see this one comment here,

192
00:08:39,789 --> 00:08:43,370
which we added before with the script in there.

193
00:08:43,370 --> 00:08:46,110
But if I now go back to my main page

194
00:08:46,110 --> 00:08:49,860
and I reload and try to do the same thing again,

195
00:08:49,860 --> 00:08:52,910
so I try to add another script

196
00:08:52,910 --> 00:08:55,600
where I say hacked again,

197
00:08:55,600 --> 00:08:57,513
you'll see a different result now.

198
00:08:58,480 --> 00:08:59,880
If I click send,

199
00:08:59,880 --> 00:09:01,800
we see a different output here.

200
00:09:01,800 --> 00:09:03,880
And we see a different output here

201
00:09:03,880 --> 00:09:06,330
because the user input

202
00:09:06,330 --> 00:09:10,490
was translated to this output here.

203
00:09:10,490 --> 00:09:14,030
The script tag was basically replaced

204
00:09:15,218 --> 00:09:19,460
with a non HTML version of that content.

205
00:09:20,750 --> 00:09:22,680
If we have a look at the database,

206
00:09:22,680 --> 00:09:25,860
we also see that this data was stored like that

207
00:09:25,860 --> 00:09:26,883
in the database.

208
00:09:27,870 --> 00:09:29,770
Now in case you're wondering what this

209
00:09:31,000 --> 00:09:32,830
&lt; and &gt; means,

210
00:09:32,830 --> 00:09:36,330
these are special character descriptions

211
00:09:36,330 --> 00:09:38,290
you could say in HTML

212
00:09:39,499 --> 00:09:43,590
where &lt; stands for the less-than sign

213
00:09:43,590 --> 00:09:48,190
and &gt; stands for the greater-than sign.

214
00:09:48,190 --> 00:09:50,403
So that is how this was translated.

215
00:09:51,810 --> 00:09:54,180
If you would switch back to outputting it

216
00:09:54,180 --> 00:09:56,100
as raw HTML again,

217
00:09:56,100 --> 00:09:57,010
then if you reload

218
00:09:57,010 --> 00:09:59,730
you of course get the alert from the first try,

219
00:09:59,730 --> 00:10:01,960
but then here you see the

220
00:10:01,960 --> 00:10:03,670
second comment which I added,

221
00:10:03,670 --> 00:10:06,630
which was sanitized before it was stored.

222
00:10:06,630 --> 00:10:09,760
And here the strange lt and gt parts

223
00:10:09,760 --> 00:10:11,770
are translated back into less-than

224
00:10:11,770 --> 00:10:13,180
and greater than signs,

225
00:10:13,180 --> 00:10:16,800
but this overall is not executed as JavaScript

226
00:10:16,800 --> 00:10:19,900
but instead just output as plain text,

227
00:10:19,900 --> 00:10:21,860
because it was already transformed

228
00:10:21,860 --> 00:10:24,074
from HTML to text

229
00:10:24,074 --> 00:10:26,640
in the sanitization step

230
00:10:26,640 --> 00:10:28,630
and now the browser just translates it

231
00:10:28,630 --> 00:10:30,943
into a more user-friendly version.

232
00:10:31,870 --> 00:10:34,291
But of course here I do escape

233
00:10:34,291 --> 00:10:36,430
in addition to sanitizing it

234
00:10:36,430 --> 00:10:39,340
and therefore we output the sanitized version

235
00:10:39,340 --> 00:10:40,450
as plain text

236
00:10:40,450 --> 00:10:42,393
and hence we have these ugly

237
00:10:42,393 --> 00:10:46,330
lt and gt placeholders here.
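
Why the placeholders show up can be sketched in a few lines. This is a rough illustration under assumptions: `escapeHtml` is a hand-written helper, and `storedComment` is a made-up example of what an already-sanitized value might look like in the database.

```javascript
// Hand-written escaping helper (an assumption, not a library function).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical value as it might sit in the database after sanitizing.
const storedComment = "&lt;script&gt;alert(1)&lt;/script&gt;";

// Escaping it again on output turns every "&" into "&amp;".
const renderedHtml = escapeHtml(storedComment);

console.log(renderedHtml);
```

The browser decodes `&amp;` back to a plain `&`, so the visitor ends up reading the entity names themselves instead of angle brackets.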

238
00:10:46,330 --> 00:10:48,580
That's why you typically shouldn't do both,

239
00:10:48,580 --> 00:10:51,720
you should not sanitize and escape

240
00:10:51,720 --> 00:10:53,570
instead you should do either of the two.

241
00:10:53,570 --> 00:10:56,470
And typically just escaping is preferred,

242
00:10:56,470 --> 00:10:58,460
but you should sanitize if you plan

243
00:10:58,460 --> 00:11:00,780
on outputting unescaped content

244
00:11:00,780 --> 00:11:02,440
so that you cleaned it first,

245
00:11:02,440 --> 00:11:03,660
that's the idea.

246
00:11:03,660 --> 00:11:04,820
Here we're doing both

247
00:11:04,820 --> 00:11:07,350
because I wanted to show both.

248
00:11:07,350 --> 00:11:10,730
Now you can also change the configuration

249
00:11:10,730 --> 00:11:12,890
for this XSS package

250
00:11:12,890 --> 00:11:15,710
to change how things are getting sanitized,

251
00:11:15,710 --> 00:11:19,730
or, as mentioned before, you can check out express-validator,

252
00:11:19,730 --> 00:11:21,410
which is a great alternative,

253
00:11:21,410 --> 00:11:23,920
which also offers sanitization,

254
00:11:23,920 --> 00:11:27,120
but in addition also offers validation

255
00:11:27,120 --> 00:11:30,120
where you can use a middleware-based approach

256
00:11:30,120 --> 00:11:32,630
to validate incoming user data

257
00:11:32,630 --> 00:11:36,960
and for example check if something is an email and so on.

258
00:11:36,960 --> 00:11:39,440
So what we did manually before in the course,

259
00:11:39,440 --> 00:11:42,040
this package could also do it for you

260
00:11:42,040 --> 00:11:45,733
and it could also add sanitization in addition.

261
00:11:46,860 --> 00:11:48,860
Of course you could also add logic

262
00:11:48,860 --> 00:11:51,840
that for example checks if some user input

263
00:11:51,840 --> 00:11:53,660
that's about to be saved

264
00:11:53,660 --> 00:11:56,210
contains things like script tags

265
00:11:56,210 --> 00:11:59,140
and then block that input altogether.

266
00:11:59,140 --> 00:12:02,600
It will always depend on how your website should behave

267
00:12:02,600 --> 00:12:05,130
and what your overall goal is.
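
The "block it altogether" idea could look roughly like this. It is only a naive sketch: `containsScriptTag` and `validateComment` are hypothetical names, and a pattern like this misses other attack vectors such as inline event handler attributes, so it is no replacement for escaping.

```javascript
// Naive check for an opening or closing script tag (hypothetical helper).
function containsScriptTag(input) {
  return /<\s*\/?\s*script\b/i.test(String(input));
}

// Hypothetical validation step that could run before saving a comment.
function validateComment(input) {
  if (containsScriptTag(input)) {
    return { ok: false, reason: "Script tags are not allowed." };
  }
  return { ok: true, value: input };
}

console.log(validateComment('<script>alert("Hacked again!")</script>'));
console.log(validateComment("I like JavaScript"));
```

A route handler could then refuse to store the comment whenever `ok` is false and show the reason to the user instead.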

268
00:12:05,130 --> 00:12:07,830
But this is sanitization in action

269
00:12:07,830 --> 00:12:10,830
and together with escaping user output,

270
00:12:10,830 --> 00:12:12,660
you can now really protect

271
00:12:12,660 --> 00:12:14,750
against cross-site scripting attacks

272
00:12:14,750 --> 00:12:16,920
and you can make sure that this code,

273
00:12:16,920 --> 00:12:19,030
this malicious JavaScript code

274
00:12:19,030 --> 00:12:21,183
can't be injected into your site.

