﻿1
00:00:02,560 --> 00:00:03,660
‫Hello everyone.

2
00:00:03,820 --> 00:00:10,530
In this video we will learn how to split our available data into test and train sets.

3
00:00:13,650 --> 00:00:23,500
Then we will train our model on the training set and find out its score on our test set. To create

4
00:00:23,580 --> 00:00:24,960
the train-test split,

5
00:00:24,970 --> 00:00:29,700
first we need to import the function train_test_split from sklearn.

6
00:00:30,390 --> 00:00:34,060
We will write from sklearn.model_selection

7
00:00:43,640 --> 00:00:44,430
import

8
00:00:48,050 --> 00:00:49,610
train_test_split.

9
00:00:58,470 --> 00:01:00,750
Now, to use this function,

10
00:01:00,750 --> 00:01:04,470
we first need to define our four variables.

11
00:01:04,470 --> 00:01:10,900
That is, our independent train variable, our independent test variable,

12
00:01:10,920 --> 00:01:16,640
then our dependent train variable, and then our dependent test variable.

13
00:01:17,040 --> 00:01:24,210
So we'll create these four variables. We'll write X underscore train,

14
00:01:28,110 --> 00:01:29,600
we'll write X underscore

15
00:01:29,600 --> 00:01:30,030
test,

16
00:01:34,420 --> 00:01:35,700
y underscore train,

17
00:01:38,880 --> 00:01:41,520
and y underscore test.

18
00:01:41,990 --> 00:01:51,930
These are our four data frames, and they will get the values of the output of our train_test_split function.

19
00:01:51,970 --> 00:01:55,420
We'll write train_test_split,

20
00:02:03,080 --> 00:02:09,200
and in brackets we'll first mention our independent variable, which is X_multi,

21
00:02:13,460 --> 00:02:22,190
then our dependent variable, which is y_multi. If you remember, we created these variables in the last lecture.

22
00:02:22,190 --> 00:02:29,760
Then the next parameter is test_size. As mentioned in our theory lecture,

23
00:02:29,970 --> 00:02:34,790
we are splitting our data in an 80:20 ratio.

24
00:02:34,980 --> 00:02:42,740
So 80 percent of the data will go into the training set and 20 percent of the data will go into the test set.

25
00:02:42,900 --> 00:02:51,910
That's why here we will mention 0.2. This 0.2 means 20 percent of the data will go

26
00:02:51,910 --> 00:02:58,570
into the test set. And the last parameter is random_state.

27
00:02:58,660 --> 00:02:59,780
This is a random number,

28
00:02:59,790 --> 00:03:08,100
but you can give any integer value. We are providing this number just to get the same sample every time.

29
00:03:08,610 --> 00:03:16,700
So if I mention random_state equal to 1, then every time I run this train_test_split,

30
00:03:16,930 --> 00:03:20,830
my training and test sets will remain the same.

31
00:03:20,830 --> 00:03:28,180
So if you are using the same random_state as we are using, you will also get the same training

32
00:03:28,180 --> 00:03:29,160
set.

33
00:03:29,560 --> 00:03:37,740
So we will just mention random_state as 0. If you want to get the same training set,

34
00:03:38,490 --> 00:03:41,070
you should also mention random_state as 0.
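The call described so far can be sketched roughly as follows. X_multi and y_multi come from the previous lecture, so the arrays here are made-up stand-ins with an assumed 506 rows and 3 columns; only the train_test_split call itself mirrors what is typed in the video.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for X_multi and y_multi (synthetic data, 506 rows).
rng = np.random.default_rng(42)
X_multi = rng.normal(size=(506, 3))
y_multi = X_multi @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=506)

# 20 percent of the rows go to the test set; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=0
)

# Repeating the call with the same random_state reproduces the same split.
X_train_again, _, _, _ = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_train, X_train_again))
```

Note the order of the four outputs: both X pieces come first, then both y pieces.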

35
00:03:44,050 --> 00:03:53,870
Now, just to check the number of rows and columns in our training and test sets, we'll write print X underscore

36
00:03:53,870 --> 00:04:00,480
train dot shape. Dot shape will give me the number of rows and columns.

37
00:04:20,520 --> 00:04:26,250
Now you can see 80 percent of our data is in the training set.

38
00:04:26,470 --> 00:04:39,760
So 404 observations are in our training set and the remaining 102 observations are in our test set.
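The 404 and 102 counts can be checked with a little arithmetic. This is a sketch under the assumption that the test count is rounded up and the training set receives the remaining rows; the total of 506 is simply 404 plus 102.

```python
import math

n_samples = 506   # 404 + 102 observations, as reported in the video
test_size = 0.2

# Assumption: the test count is rounded up, the training set gets the rest.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

Because 20 percent of 506 is not a whole number, the split is only approximately 80:20.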

39
00:04:39,900 --> 00:04:48,130
Now we will follow the standard process of creating a linear regression model. We'll first create an object.

40
00:04:48,210 --> 00:04:49,650
This time we will name it

41
00:04:49,680 --> 00:04:51,150
lm underscore a

42
00:04:56,640 --> 00:05:10,070
and we'll equate it to LinearRegression.

43
00:05:10,100 --> 00:05:16,040
Now we will train our model on our training set, that is, X_train

44
00:05:16,060 --> 00:05:17,960
and y_train. We'll write

45
00:05:21,030 --> 00:05:25,260
lm underscore a dot fit.

46
00:05:25,500 --> 00:05:39,980
Then we will mention our training set, which is X underscore train and y underscore train.

47
00:05:40,070 --> 00:05:46,390
This statement will fit our model on our training set.

48
00:05:46,610 --> 00:05:52,360
Now let's create the predicted values of y using this model.

49
00:05:53,240 --> 00:05:55,670
So we will write y underscore test

50
00:06:00,000 --> 00:06:07,770
underscore a equal to lm_a dot predict.

51
00:06:12,730 --> 00:06:13,130
Here

52
00:06:13,130 --> 00:06:20,830
I'm predicting my test dependent variable, so I will give my test independent variable. So I will write

53
00:06:23,840 --> 00:06:25,340
X underscore test.

54
00:06:28,970 --> 00:06:29,780
If I run this,

55
00:06:33,460 --> 00:06:40,270
I have my predicted values for the test set in y underscore test underscore a. Similarly, we will create y

56
00:06:40,300 --> 00:06:46,450
underscore train underscore a to get the predicted values for our training set.

57
00:06:55,830 --> 00:06:56,330
This time

58
00:06:56,340 --> 00:06:58,830
we will use X underscore train.
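The fit and predict steps can be sketched as below. The names lm_a, y_test_a and y_train_a follow the video; the data is a hypothetical synthetic stand-in for X_multi and y_multi, so the numbers it produces are not the course's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for X_multi and y_multi.
rng = np.random.default_rng(0)
X_multi = rng.normal(size=(200, 2))
y_multi = 3.0 * X_multi[:, 0] - 1.0 * X_multi[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X_multi, y_multi, test_size=0.2, random_state=0
)

lm_a = LinearRegression()
lm_a.fit(X_train, y_train)          # learn coefficients from the training set only

y_test_a = lm_a.predict(X_test)     # predictions for the held-out test rows
y_train_a = lm_a.predict(X_train)   # predictions for the rows used in fitting

print(y_test_a.shape, y_train_a.shape)
```

The model never sees X_test or y_test during fitting, which is what makes the test score a fair check.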

59
00:07:06,080 --> 00:07:11,770
Now, to check the R-squared value for our training and test data,

60
00:07:12,080 --> 00:07:13,820
we will import another function.

61
00:07:16,650 --> 00:07:20,410
We will write from sklearn dot metrics

62
00:07:20,440 --> 00:07:21,010
import

63
00:07:31,030 --> 00:07:33,670
import r2 underscore score.

64
00:07:37,800 --> 00:07:40,910
You don't have to memorize all this syntax.

65
00:07:40,980 --> 00:07:46,040
You can just save a copy of this notebook or you can search online.

66
00:07:46,080 --> 00:07:53,190
The syntax is readily available online. Now, to get the R-squared value,

67
00:07:53,270 --> 00:07:58,100
we just need to mention r2 underscore score.

68
00:07:58,130 --> 00:07:59,910
This is the function we imported.

69
00:08:02,640 --> 00:08:06,230
If we want to get more details about this function,

70
00:08:06,270 --> 00:08:13,130
we can look it up in the help by using the question mark operator. We can just write

71
00:08:13,130 --> 00:08:19,550
a question mark, and if we execute it we will get all the details.

72
00:08:19,700 --> 00:08:20,800
So here,

73
00:08:20,870 --> 00:08:30,970
if you see, in the syntax we need to mention y underscore true, which is the original y values,

74
00:08:31,450 --> 00:08:36,820
then we need to mention the y predicted values. We'll do the same.

75
00:08:37,070 --> 00:08:42,730
I'll just close this and write r2 underscore score,

76
00:08:46,090 --> 00:08:50,090
and in brackets we'll first write y underscore

77
00:08:54,370 --> 00:08:54,670
test.

78
00:09:00,120 --> 00:09:06,450
So y underscore test is our original value and y underscore test underscore a is the predicted

79
00:09:06,450 --> 00:09:07,810
value of the y variable.

80
00:09:07,900 --> 00:09:09,270
So we'll just run this.
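To make the argument order concrete, here is a hand-written sketch of the standard R-squared definition, one minus the residual sum of squares over the total sum of squares. The helper name r_squared and the four sample values are made up for illustration and are not from the course data.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with the true values as the first argument."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Small made-up example values (not from the course data).
y_test = [3.0, 5.0, 7.0, 9.0]
y_test_a = [2.5, 5.5, 6.5, 9.5]

score = r_squared(y_test, y_test_a)
print(score)
```

The total sum of squares is computed from the true values only, so swapping the two arguments generally gives a different number. That is why the original values go first and the predictions second.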

81
00:09:12,720 --> 00:09:22,070
So the R-squared value is 0.54. Now let's calculate the R-squared value for our training set.

82
00:09:35,100 --> 00:09:40,220
If I run this, the R-squared value for our training set is 0.75.

83
00:09:41,710 --> 00:09:49,180
You can also see that the R-squared value for our test data is less than the R-squared value for

84
00:09:49,180 --> 00:09:59,030
our training set. As we discussed in our theory lectures, the test R-squared value is of more importance

85
00:09:59,120 --> 00:10:06,350
as compared to the training one, and we should always look at our test R-squared instead of training R-squared

86
00:10:06,390 --> 00:10:10,300
values to evaluate the performance of our model.

87
00:10:13,080 --> 00:10:17,850
That's how you split your data into test and train sets in Python.

