1
00:00:00,210 --> 00:00:03,700
The very first thing we'll start with will be to import roc_curve.

2
00:00:03,720 --> 00:00:08,880
So we have roc_curve, and this is part of sklearn.metrics.

3
00:00:08,880 --> 00:00:10,740
So we'll run this again.

4
00:00:11,100 --> 00:00:14,010
And then just right here, we'll make use of our method.

5
00:00:14,490 --> 00:00:20,130
It's going to output the false positive rates, the true positive rates,

6
00:00:20,130 --> 00:00:23,580
and then the thresholds, which we'll use in coming up with our ROC plot.

7
00:00:23,580 --> 00:00:31,350
So we have that: we have roc_curve, and this takes in the labels

8
00:00:31,650 --> 00:00:33,130
and then the predicted values.

9
00:00:33,180 --> 00:00:35,320
So we have this predicted value.

10
00:00:35,340 --> 00:00:45,330
Now if we print the lengths out, the length of fp, the length of tp, and the len of thresholds, we have

11
00:00:45,330 --> 00:00:49,740
exactly the same number of thresholds here.

12
00:00:50,040 --> 00:00:51,900
We should have exactly the same number.

13
00:00:51,900 --> 00:00:54,870
Here we have 330, 330, 330.

14
00:00:55,050 --> 00:00:57,500
Okay, so now we have this.
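
The call described above can be sketched roughly as follows. The `labels` and `predicted` arrays here are small made-up stand-ins, not the real dataset, and the names `fp`, `tp`, and `thresholds` are assumed from the narration. Note that `roc_curve` returns rates rather than raw counts.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy stand-ins for the real labels and model scores (assumed values).
labels = np.array([0, 0, 1, 1, 0, 1, 1, 0])
predicted = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.65, 0.55])

# roc_curve returns false positive rates, true positive rates,
# and the thresholds that produce each (fp, tp) pair.
fp, tp, thresholds = roc_curve(labels, predicted)

# The three arrays always have the same length: one entry per ROC point.
print(len(fp), len(tp), len(thresholds))
```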

15
00:00:57,510 --> 00:01:04,260
Now note that the reason why we need this is because when coming up with our ROC plot, like, say

16
00:01:04,260 --> 00:01:13,560
we have this ROC plot, we want to have, for each and every point, the corresponding TP, FP, and then

17
00:01:13,560 --> 00:01:20,550
the threshold which leads to that given TP, FP pair.

18
00:01:20,640 --> 00:01:27,270
So that said, we are going to make use of this data now and then plot out the ROC curve.

19
00:01:27,570 --> 00:01:29,010
So let's get straight into that.

20
00:01:29,010 --> 00:01:32,790
We have plt.plot, and then we do the plotting.

21
00:01:32,790 --> 00:01:37,170
We pass in the false positives and true positives, just like x, y.

22
00:01:37,170 --> 00:01:41,310
So here we have x and then here is y. Let's get back.

23
00:01:41,310 --> 00:01:42,030
That's fine.

24
00:01:42,030 --> 00:01:44,250
And then from here we have the labels.

25
00:01:44,250 --> 00:01:55,890
So we have the xlabel, which is our false positive rate, and then the ylabel, our true positive rate.

26
00:01:56,130 --> 00:01:57,680
So we've seen this already.

27
00:01:57,690 --> 00:01:59,400
Now we have this.

28
00:01:59,400 --> 00:02:06,180
We could include the grid, so we have this grid, and then we show.

29
00:02:06,180 --> 00:02:11,460
So that's it, we run that and here is what we get.

30
00:02:11,460 --> 00:02:18,530
So this is our ROC plot right here, based on this FP and TP we got from here.

31
00:02:18,540 --> 00:02:20,970
Now, how do we include these thresholds?

32
00:02:20,970 --> 00:02:26,280
In order to include these thresholds, we are going to make use of Matplotlib's text method.

33
00:02:26,280 --> 00:02:38,340
So we have plt.text, and then here we're going to have the TP, or rather the FP, TP, and then what the

34
00:02:38,340 --> 00:02:41,820
actual text we'll be putting in here will be the thresholds.

35
00:02:41,820 --> 00:02:43,770
So we'll be passing the thresholds.

36
00:02:43,770 --> 00:02:51,030
But now note that we would actually have to do this for each and every point, which is impractical since there

37
00:02:51,030 --> 00:02:55,080
would be too many text labels put out here and it's going to be cluttered.

38
00:02:55,080 --> 00:02:58,170
So what we could do is we could skip some values.

39
00:02:58,170 --> 00:03:05,250
So we say for i in range, we're going to start from zero right up to the length.

40
00:03:05,250 --> 00:03:08,400
That's actually going to be 330, the length of thresholds.

41
00:03:08,400 --> 00:03:10,470
And then we are going to be skipping some values.

42
00:03:10,470 --> 00:03:13,230
So skip and then we'll define the skip.

43
00:03:13,230 --> 00:03:15,180
So let's start with this skip of 20.

44
00:03:15,180 --> 00:03:22,380
So initially we're going to skip 20 values, and then once we skip these values, let's pass this in here.

45
00:03:22,380 --> 00:03:30,060
Once we skip these values, we pick a given i, pick that given i here too, and then we do the same with

46
00:03:30,060 --> 00:03:31,140
the thresholds.

47
00:03:31,500 --> 00:03:36,660
So we get the corresponding false positive rate, the corresponding true positive rate, and then the corresponding

48
00:03:36,690 --> 00:03:37,470
threshold.

49
00:03:37,650 --> 00:03:40,230
Now that's done, we could run this.

50
00:03:40,230 --> 00:03:43,860
So here is what we get.
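
A minimal sketch of the plotting and labelling steps just described, assuming toy `fp`, `tp`, and `thresholds` lists in place of the real `roc_curve` output. The `Agg` backend and the in-memory `savefig` are only there so the sketch runs without a display; in a notebook you would call `plt.show()` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical ROC points standing in for the roc_curve output.
n = 101
fp = [i / (n - 1) for i in range(n)]
tp = [x ** 0.5 for x in fp]
thresholds = [1 - x for x in fp]

skip = 20  # label only every 20th point to keep the plot readable

plt.figure(figsize=(8, 8))
plt.plot(fp, tp)

annotated = []
for i in range(0, len(thresholds), skip):
    # Write the threshold value at its (fp, tp) position on the curve.
    plt.text(fp[i], tp[i], f"{thresholds[i]:.2f}")
    annotated.append(i)

plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.grid()
plt.savefig(io.BytesIO(), format="png")  # render without opening a window
```

A smaller skip shows more thresholds but crowds the labels together.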

51
00:03:43,860 --> 00:03:51,750
We see this plot now; we can see the ROC plot with the different thresholds.

52
00:03:52,530 --> 00:04:02,160
I think we could leave it like this, this is fine. Let's increase the size, and then we try to focus

53
00:04:02,160 --> 00:04:09,270
just on this portion, which actually matters the most, because we wouldn't want to get into these regions,

54
00:04:09,270 --> 00:04:15,780
because in these regions our false positive rate is going to be too high, and these regions below this

55
00:04:15,780 --> 00:04:18,450
would have a very small true positive rate.

56
00:04:18,450 --> 00:04:22,580
So generally we try to focus on this zone right here.

57
00:04:22,590 --> 00:04:29,550
Now, depending on the problem you are trying to solve, if your false positive rate is what matters

58
00:04:29,550 --> 00:04:35,970
the most, that is, if you cannot afford to have a high false positive rate, then you tend to pick

59
00:04:35,970 --> 00:04:36,480
values.

60
00:04:36,480 --> 00:04:38,370
Let's say, let's break it up like this.

61
00:04:38,370 --> 00:04:40,770
So here is like the midpoint.

62
00:04:40,770 --> 00:04:42,690
We have some sort of midpoint here.

63
00:04:42,870 --> 00:04:45,840
Let's draw this line.

64
00:04:46,230 --> 00:04:46,800
Okay.

65
00:04:46,800 --> 00:04:48,780
So we have some sort of midpoint here.

66
00:04:48,780 --> 00:04:53,610
So this is like 0.5, 0.46, 0.62 and all of that.

67
00:04:53,610 --> 00:04:55,650
So we're breaking it up like this.

68
00:04:55,650 --> 00:04:59,880
And then if you want to ensure that you minimize as much as possible

69
00:04:59,990 --> 00:05:05,390
your false positive rate without reducing your true positive rate too much, then you tend to take

70
00:05:05,390 --> 00:05:06,660
values around this.

71
00:05:06,680 --> 00:05:15,080
But if you want to make sure your true positive rate remains quite high, even to the detriment of the

72
00:05:15,080 --> 00:05:19,160
false positive rate, then you tend to pick out values in this zone.

73
00:05:19,160 --> 00:05:23,720
So you have these two zones to pick your threshold from.

74
00:05:24,080 --> 00:05:25,880
This is the main zone.

75
00:05:25,880 --> 00:05:28,850
And then you have this zone right here.

76
00:05:28,880 --> 00:05:32,240
This other zone and this other zone.

77
00:05:32,240 --> 00:05:37,370
So we have zone one and zone two, from which you have to pick.

78
00:05:37,760 --> 00:05:45,320
One other quick note is that if you have a problem like the one we're trying to solve, where parasitic

79
00:05:46,070 --> 00:05:50,600
is zero and then uninfected is one.

80
00:05:50,600 --> 00:05:52,640
So that is how the data set was created.

81
00:05:52,640 --> 00:05:55,130
And based on this, we built our model.

82
00:05:55,130 --> 00:06:00,020
Then this means that this will be considered as negative samples.

83
00:06:00,140 --> 00:06:06,500
While this will be considered as positive samples, whereas in the real world we will tend to look at

84
00:06:06,500 --> 00:06:10,820
uninfected as negative and parasitic as positive.

85
00:06:10,820 --> 00:06:16,910
So you have to be very careful with these terms and know exactly how your data and models have been

86
00:06:16,910 --> 00:06:17,630
built.

87
00:06:18,020 --> 00:06:23,840
And that's why, in our case, we're trying to avoid situations where our model predicts a fake

88
00:06:23,840 --> 00:06:29,390
uninfected output, that is, a patient who actually has a parasite, but the model predicts that they're

89
00:06:29,570 --> 00:06:32,140
uninfected. That's actually a fake uninfected.

90
00:06:32,150 --> 00:06:37,580
This is a fake or false positive in our case.

91
00:06:37,580 --> 00:06:43,520
So we'll tend to minimize this number of false positives.

92
00:06:43,550 --> 00:06:53,300
Now, if your model was built such that parasitic is one and uninfected is zero, then it's clear

93
00:06:53,300 --> 00:07:01,310
that you will try to instead minimize the number of false negatives since uninfected is considered as

94
00:07:01,310 --> 00:07:02,090
negative.

95
00:07:03,740 --> 00:07:09,530
That said, coming back to our problem, since our dataset was constructed in this way, we're trying

96
00:07:09,530 --> 00:07:12,770
to minimize the number of false positives at all cost.

97
00:07:12,860 --> 00:07:19,340
But while doing this, we have to ensure that the true positive rate remains at a reasonable level.

98
00:07:20,750 --> 00:07:27,110
And so we could pick out a threshold of like 0.6265 given right here.

99
00:07:28,130 --> 00:07:32,210
Getting back to this, let's take 0.6265.

100
00:07:32,210 --> 00:07:40,070
We run that and we have the number of false positives to be 87, which is smaller than when we're

101
00:07:40,070 --> 00:07:42,860
having a threshold of, say, 0.5.

102
00:07:43,460 --> 00:07:44,570
Run that again.

103
00:07:44,840 --> 00:07:50,330
You see, 87 is smaller than this 99, the value of 99 we're getting now.
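
To illustrate why raising the threshold lowers the false positive count, here is a small sketch with made-up labels and scores (not the real model outputs, so the counts differ from the 87 and 99 above). The helper `count_false_positives` is a hypothetical name. It follows this dataset's convention, where label 0 is parasitic and label 1 is uninfected, so a false positive is a parasitic sample predicted as uninfected.

```python
def count_false_positives(labels, scores, threshold):
    """Count samples whose true label is 0 but whose score exceeds the threshold."""
    return sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)

# Made-up data: 0 = parasitic (negative), 1 = uninfected (positive).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.55, 0.60, 0.70, 0.58, 0.64, 0.80, 0.95]

fp_at_050 = count_false_positives(labels, scores, 0.5)
fp_at_06265 = count_false_positives(labels, scores, 0.6265)

print(fp_at_050, fp_at_06265)
```

In this toy data the higher threshold also turns the positive sample scored 0.58 into a miss, which is the true positive rate cost mentioned earlier.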

104
00:07:51,170 --> 00:07:53,570
And so that's it for this section on metrics.

105
00:07:53,570 --> 00:07:57,050
Thank you for following up to this point and see you next time.
