1
00:00:00,000 --> 00:00:03,600
Here's the comparison
of accuracies between

2
00:00:03,600 --> 00:00:07,485
the one-layer LSTM and
the two-layer one over 10 epochs.

3
00:00:07,485 --> 00:00:09,330
There's not much of
a difference except

4
00:00:09,330 --> 00:00:12,030
the nosedive in
the validation accuracy.

5
00:00:12,030 --> 00:00:14,760
But notice how the training
curve is smoother.

6
00:00:14,760 --> 00:00:17,190
I found from
training networks that

7
00:00:17,190 --> 00:00:18,929
jaggedness can be an indication

8
00:00:18,929 --> 00:00:20,700
that your model
needs improvement,

9
00:00:20,700 --> 00:00:22,530
and the single LSTM that you can

10
00:00:22,530 --> 00:00:24,330
see here is not the smoothest.

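One rough way to put a number on the "jaggedness" described here is to measure how much a metric curve changes direction from one epoch to the next. The sketch below is a hypothetical heuristic, not something used in this course. It averages the absolute second difference of the curve and lists the epochs where the metric drops by more than a chosen threshold. The function names, the 0.02 threshold, and the sample values are all made up for illustration.

```python
from typing import List


def roughness(curve: List[float]) -> float:
    """Mean absolute second difference: 0.0 for a straight line, larger for zig-zags."""
    if len(curve) < 3:
        return 0.0
    second_diffs = [
        curve[i + 1] - 2 * curve[i] + curve[i - 1] for i in range(1, len(curve) - 1)
    ]
    return sum(abs(d) for d in second_diffs) / len(second_diffs)


def sharp_dips(curve: List[float], threshold: float = 0.02) -> List[int]:
    """Indices of epochs where the metric fell by more than `threshold` from the previous epoch."""
    return [i for i in range(1, len(curve)) if curve[i - 1] - curve[i] > threshold]


# Made-up accuracy curves, only to exercise the heuristic (not results from the video).
smooth = [0.70, 0.74, 0.77, 0.79, 0.80, 0.81]
jagged = [0.70, 0.76, 0.71, 0.79, 0.74, 0.81]
print("smooth:", roughness(smooth), sharp_dips(smooth))
print("jagged:", roughness(jagged), sharp_dips(jagged))
```

A higher roughness score or a longer list of dips would flag a curve worth a second look. It is a prompt to investigate, not a verdict on the model.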
11
00:00:24,330 --> 00:00:26,700
If you look at the loss

12
00:00:26,700 --> 00:00:28,110
over the first 10 epochs,

13
00:00:28,110 --> 00:00:30,135
we can see similar results.

14
00:00:30,135 --> 00:00:32,280
But look what happens
when we increase

15
00:00:32,280 --> 00:00:34,290
to 50 epochs of training.

16
00:00:34,290 --> 00:00:35,940
Our one-layer LSTM,

17
00:00:35,940 --> 00:00:37,785
while climbing in accuracy,

18
00:00:37,785 --> 00:00:40,740
is also prone to
some pretty sharp dips.

19
00:00:40,740 --> 00:00:42,700
The final result might be good,

20
00:00:42,700 --> 00:00:43,880
but those dips make me

21
00:00:43,880 --> 00:00:47,015
suspicious about the overall
accuracy of the model.

22
00:00:47,015 --> 00:00:49,925
Our two-layer one
looks much smoother,

23
00:00:49,925 --> 00:00:53,405
and as such makes me much more
confident in its results.

24
00:00:53,405 --> 00:00:56,130
Note also the
validation accuracy.

25
00:00:56,130 --> 00:00:59,025
Considering it levels
out at about 80 percent,

26
00:00:59,025 --> 00:01:01,580
it's not bad given that
the training set and

27
00:01:01,580 --> 00:01:04,685
the test set were
both 25,000 reviews.

28
00:01:04,685 --> 00:01:06,379
But we're using 8,000

29
00:01:06,379 --> 00:01:09,275
sub-words taken only
from the training set.

30
00:01:09,275 --> 00:01:11,210
So there would be many tokens in

31
00:01:11,210 --> 00:01:13,865
the test set that would
be out of vocabulary.

32
00:01:13,865 --> 00:01:15,570
Yet despite that, we are still

33
00:01:15,570 --> 00:01:17,805
at about 80 percent accuracy.

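To see why a vocabulary built only from the training set leaves gaps at test time, here is a small word-level sketch in plain Python. It is a simplification under stated assumptions: it splits on whitespace instead of using the 8,000-subword encoder from the video, and the `build_vocab` and `oov_rate` helpers and the toy sentences are hypothetical. A subword encoder behaves differently, since it can often split an unseen word into known pieces.

```python
from collections import Counter
from typing import Iterable, List, Set


def build_vocab(texts: Iterable[str], max_size: int) -> Set[str]:
    """Keep the `max_size` most frequent lowercase words seen in the training texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return {word for word, _ in counts.most_common(max_size)}


def oov_rate(texts: Iterable[str], vocab: Set[str]) -> float:
    """Fraction of tokens in `texts` that are missing from `vocab`."""
    tokens: List[str] = [word for text in texts for word in text.lower().split()]
    if not tokens:
        return 0.0
    missing = sum(1 for word in tokens if word not in vocab)
    return missing / len(tokens)


# Toy data standing in for train/test reviews (not the real IMDB split).
train = ["a great movie", "a dull movie with great acting"]
test = ["a brilliant movie", "dreadful acting"]
vocab = build_vocab(train, max_size=8000)  # 8000 only mirrors the size mentioned in the video
print(f"test OOV rate: {oov_rate(test, vocab):.2f}")
```

Here "brilliant" and "dreadful" never appear in the training texts, so two of the five test tokens fall outside the vocabulary.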
34
00:01:17,805 --> 00:01:20,360
Our loss results are similar, with

35
00:01:20,360 --> 00:01:22,960
the two-layer model having
a much smoother curve.

36
00:01:22,960 --> 00:01:25,815
The validation loss is increasing
epoch by epoch.

37
00:01:25,815 --> 00:01:27,710
So that's worth
monitoring to see if it

38
00:01:27,710 --> 00:01:31,280
flattens out in later epochs
as would be desired.

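Watching for the loss to flatten out can be automated. Keras provides a `tf.keras.callbacks.EarlyStopping` callback for this, configured for example with `monitor="val_loss"` and a `patience` value. The plain-Python sketch below only mimics the patience idea so that it runs without TensorFlow. The `should_stop` helper, its default patience of 3, and the loss values are hypothetical.

```python
from typing import List


def should_stop(val_losses: List[float], patience: int = 3, min_delta: float = 0.0) -> bool:
    """True if the lowest loss so far was reached at least `patience` epochs ago.

    Mirrors the idea behind patience-based early stopping: give up once the
    loss has not improved by more than `min_delta` for `patience` epochs.
    """
    if not val_losses:
        return False
    best, best_epoch = val_losses[0], 0
    for epoch, loss in enumerate(val_losses[1:], start=1):
        if loss < best - min_delta:  # only a real improvement resets the counter
            best, best_epoch = loss, epoch
    epochs_without_improvement = len(val_losses) - 1 - best_epoch
    return epochs_without_improvement >= patience


# Hypothetical validation-loss histories.
rising = [0.40, 0.38, 0.39, 0.42, 0.45]
still_improving = [0.50, 0.45, 0.41, 0.39]
print(should_stop(rising), should_stop(still_improving))
```

Stopping is only one response. If the goal is simply to see whether the curve levels off, logging this check each epoch may be enough.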
39
00:01:31,280 --> 00:01:34,205
I hope this was
a good introduction to how

40
00:01:34,205 --> 00:01:37,745
RNNs and LSTMs can help you
with text classification.

41
00:01:37,745 --> 00:01:40,610
Their inherent
sequencing is great for

42
00:01:40,610 --> 00:01:43,520
predicting unseen text if
you want to generate some,

43
00:01:43,520 --> 00:01:44,870
and we'll see that next week.

44
00:01:44,870 --> 00:01:48,230
But first, I'd like to
explore some other RNN types,

45
00:01:48,230 --> 00:01:50,700
and you'll see those
in the next video.