WEBVTT

00:00.330 --> 00:04.440
Hi, everyone, and welcome back to the Knowledge Portal Video series.

00:04.440 --> 00:05.220
This is Zeal.

00:05.460 --> 00:09.240
And in today's topic, we'll be talking about hashing.

00:09.930 --> 00:13.920
Hashing is one of the very, very important functions that is available.

00:13.920 --> 00:19.560
And as an information security person, hashing is one of the things that will be always useful throughout

00:19.560 --> 00:20.160
your life.

00:20.160 --> 00:22.350
So let's see what hashing does.

00:24.400 --> 00:28.840
So here we have a sample document.

00:28.840 --> 00:31.800
So let's consider this as a notepad file.

00:31.810 --> 00:36.160
And inside the notepad file there is some text: Secret Text.

00:36.940 --> 00:42.440
Now, once we pass this notepad file to a hashing algorithm,

00:42.460 --> 00:46.030
the hashing algorithm will generate an output that is effectively unique to that content.

00:47.260 --> 00:52.660
Now, this output depends upon the text which is written inside the file.

00:53.080 --> 00:55.840
So here, if we see it is secret.

00:56.170 --> 00:57.970
Secret text.

00:58.510 --> 01:01.060
Now let's look into another example.

01:02.680 --> 01:05.350
Where we have changed the contents.

01:05.350 --> 01:09.820
So instead of secret text, we have written secret text one.

01:09.820 --> 01:12.610
So there is one number that is added to the file.

01:13.360 --> 01:18.730
We pass that file to the hashing algorithm and we get an output.

01:18.760 --> 01:27.220
Now compare this output, which starts with a 71 and ends with D6, to the original output, which starts

01:27.220 --> 01:29.650
with 64 and ends with F.

01:29.890 --> 01:37.270
So here you see that even if you change one character or one number in a file,

01:37.300 --> 01:42.730
the whole output changes once you pass the file to the hashing algorithm.
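
NOTE
A minimal sketch of the comparison on the slide, assuming Python's standard hashlib module. The helper name md5_hex is hypothetical, and the digests printed here need not match the ones on the slide, since the exact bytes of the slide's file are not known.
```python
import hashlib
def md5_hex(text: str) -> str:
    # Hash the UTF-8 bytes of the text and return the digest as hex.
    return hashlib.md5(text.encode("utf-8")).hexdigest()
original = md5_hex("Secret Text")
changed = md5_hex("Secret Text1")
# Both digests have the same length, but they differ in most positions.
differing = sum(a != b for a, b in zip(original, changed))
print(original)
print(changed)
print(f"{differing} of {len(original)} hex characters differ")
```
Adding a single character to the input is enough to produce an unrelated-looking digest of the same length.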

01:43.180 --> 01:45.640
So what does a hashing algorithm do?

01:46.460 --> 01:51.320
Well, a hashing algorithm will tell us whether the file has been modified or not.

01:52.430 --> 01:55.100
So if we talk about viruses.

01:55.130 --> 01:59.960
Viruses basically attach to the files and they change the structure of the file.

02:00.260 --> 02:03.890
So with the help of a hashing algorithm,

02:03.890 --> 02:08.120
it is possible for us to know whether a file has been modified or not.

02:10.390 --> 02:16.240
So before we go into this slide, I'll give you a small demonstration on how hashing works.

02:18.350 --> 02:19.070
I'll minimize this.

02:21.780 --> 02:22.920
And.

02:22.920 --> 02:23.520
Yeah.

02:28.580 --> 02:29.810
And maximize the shell.

02:32.010 --> 02:37.620
So here, if we see, this is my Kali Linux machine, which I'm connected to through PuTTY.

02:37.830 --> 02:41.610
It is running in the background as a virtual machine.

02:42.240 --> 02:48.070
So here there is one document, a text file called document.

02:48.120 --> 02:54.010
And if I do a cat, I'll see the contents of the file.

02:54.030 --> 02:58.470
So the content of the file is secret space text.

02:58.920 --> 03:03.630
Now let me find the MD5 sum of this

03:04.350 --> 03:05.120
document.

03:05.160 --> 03:07.170
I'll run md5sum.

03:07.200 --> 03:13.470
This will give me the hash value, followed by the name of the file, which is document.

03:14.750 --> 03:16.640
So here it gave me the

03:17.870 --> 03:19.610
hash value of the document.

03:20.400 --> 03:20.970
Now.

03:22.110 --> 03:29.610
Let me open the document and let me add a full stop over here.

03:30.050 --> 03:30.450
Right.

03:30.660 --> 03:32.010
And I'll save it.

03:32.490 --> 03:34.470
So now we have modified the file.

03:36.010 --> 03:39.040
Let me again take the hash value.

03:41.450 --> 03:43.990
And if you compare this hash

03:45.680 --> 03:50.860
with the hash below, you can easily see that the file has changed.

03:50.900 --> 03:51.430
Right.

03:51.440 --> 03:56.930
So hashing helps us to identify whether a file is modified or not.
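
NOTE
The terminal demo can be approximated in Python. This is a sketch under assumptions: file_md5 is a hypothetical helper, the file is recreated in a temporary directory, and its exact contents (including the trailing newline) are guesses at what the demo file held.
```python
import hashlib
import os
import tempfile
def file_md5(path: str) -> str:
    # Read the file in chunks so large files do not need to fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
# Recreate the demo: hash the file, add a full stop, hash it again.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "document")
    with open(path, "w") as handle:
        handle.write("secret text\n")
    before = file_md5(path)
    with open(path, "a") as handle:
        handle.write(".")
    after = file_md5(path)
print(before)
print(after)
print("modified" if before != after else "unchanged")
```
Comparing the two digests is all that is needed to detect the edit; the file contents never have to be compared directly.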

03:56.960 --> 04:02.130
So if you look into a production environment, generally files would not be modified.

04:02.150 --> 04:08.450
So in production environment for all the important files, there is a program which monitors whether

04:08.450 --> 04:15.710
the file is changed or not, and if the file is changed, the program will immediately send an email to the

04:15.710 --> 04:19.040
system administrator saying this file is modified.

04:21.700 --> 04:22.570
So.

04:24.290 --> 04:26.210
Coming back to the presentation.

04:33.830 --> 04:41.590
Now, if we ask what the difference is between encryption and hashing:

04:41.600 --> 04:45.490
So basically encryption is a two way function.

04:45.500 --> 04:47.270
So what does two way function mean?

04:47.510 --> 04:53.170
So here you see, first, this is the plaintext data, Knowledge Portal.

04:53.180 --> 04:57.410
We encrypt it and we get some encrypted value.

04:57.440 --> 05:04.690
Now, by applying the key in a decryption algorithm, we get the plaintext back.

05:04.700 --> 05:12.980
So here we are applying the key once on the encryption side and applying the key a second time on the decryption

05:12.980 --> 05:13.310
side.

05:13.760 --> 05:14.750
So.

05:16.120 --> 05:23.770
If we have this encrypted value, we can always recover the plaintext from this encrypted value by applying

05:23.770 --> 05:24.160
the key.

05:24.760 --> 05:26.830
But hashing does not work this way.

05:28.150 --> 05:30.310
Hashing is a one way function.

05:30.310 --> 05:33.370
So that means if this is a document.

05:34.440 --> 05:36.900
And you pass it to the hashing algorithm.

05:37.050 --> 05:40.200
Then you get this output value.

05:40.290 --> 05:47.220
Now, from this output value, you will not be able to recover the document back.
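
NOTE
A small sketch of the two-way versus one-way contrast. Assumptions: a toy repeating-key XOR stands in for a real encryption algorithm (it is not secure and is used only because it is reversible with the key), SHA-256 stands in for "the hashing algorithm", and the key value is made up.
```python
import hashlib
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR each byte with the repeating key; applying it twice restores the input.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
key = b"demo-key"
plaintext = b"Knowledge Portal"
# Two-way: the same key takes us to the ciphertext and back again.
ciphertext = xor_cipher(plaintext, key)
recovered = xor_cipher(ciphertext, key)
# One-way: hashlib offers no function that turns a digest back into the input.
digest = hashlib.sha256(plaintext).hexdigest()
print(recovered == plaintext)
print(digest)
```
The key is applied once in each direction for the cipher, while the digest has no corresponding reverse step at all.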

05:48.190 --> 05:51.160
Now you will ask, then what is the use of a hashing function?

05:51.520 --> 05:56.850
So let's explore what the benefits could be of this kind of

05:57.740 --> 05:58.070
function.

05:59.590 --> 06:04.510
The first benefit is this: your system files do not normally change.

06:04.870 --> 06:09.690
If one has changed, then there is a high probability that a virus has done it, right?

06:09.700 --> 06:17.590
Most of the system files will not change at any time, unless some person is manually changing

06:17.590 --> 06:22.240
the file, which is highly unlikely, or a virus is doing it.

06:22.420 --> 06:29.200
So in a production environment all the time, there is a file integrity monitoring software that runs

06:29.200 --> 06:32.980
which will email us as soon as some file is modified.

06:34.380 --> 06:36.660
And how does it know if the file is modified?

06:36.670 --> 06:38.040
It compares the hash.
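
NOTE
A minimal sketch of how a file integrity monitor could compare hashes. Everything named here (sha256_of, take_baseline, find_modified, the sample file) is hypothetical, SHA-256 is an assumed choice of algorithm, and the email step is replaced by a print.
```python
import hashlib
import os
import tempfile
def sha256_of(path):
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()
def take_baseline(paths):
    # Record a known-good hash for every monitored file.
    return {path: sha256_of(path) for path in paths}
def find_modified(baseline):
    # A file counts as modified when its current hash differs from the baseline.
    return [path for path, known in baseline.items() if sha256_of(path) != known]
with tempfile.TemporaryDirectory() as folder:
    watched = os.path.join(folder, "hosts.conf")
    with open(watched, "w") as handle:
        handle.write("10.0.0.1 app\n")
    baseline = take_baseline([watched])
    clean_run = find_modified(baseline)
    # Simulate an unexpected change to the monitored file.
    with open(watched, "a") as handle:
        handle.write("10.6.6.6 unexpected\n")
    alert_run = find_modified(baseline)
print("first check:", clean_run)
print("second check:", alert_run)
```
A real monitor would run the check on a schedule and notify the administrator for every path in the returned list.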

06:39.390 --> 06:47.370
Now, the second use of hashing is that you can store your password as a hash value instead of an encrypted

06:47.370 --> 06:47.940
value.

06:48.270 --> 06:52.170
So let's go to OneNote and understand this.

06:54.600 --> 06:58.500
I'll choose my favorite pen and go back to full screen.

06:59.530 --> 07:00.160
So.

07:01.700 --> 07:06.290
So whenever you log in to, let's say, a Windows machine, it will ask you for a

07:07.600 --> 07:11.290
username and it will ask you for a password.

07:13.210 --> 07:15.160
This is a Windows box.

07:17.190 --> 07:21.510
So in order to login, it will ask you for your username and it will ask you for a password.

07:21.750 --> 07:27.630
Now how will Windows verify that the credentials that you have supplied are correct or not?

07:27.660 --> 07:29.670
Well, certainly it would have some file,

07:29.670 --> 07:29.940
right,

07:29.940 --> 07:30.630
where

07:32.670 --> 07:35.850
it would check whether this username and password are right.

07:35.850 --> 07:39.990
So there has to be a file where the username and password must be stored.

07:40.990 --> 07:47.080
So whenever you type username password, your Windows operating system will verify it from the file.

07:47.260 --> 07:48.730
If it is correct.

07:49.340 --> 07:52.340
Then you will be granted access.

07:52.640 --> 07:55.640
Now let's explore about this file a bit.

07:56.610 --> 08:00.600
Let's say your username and password is stored in plain text data.

08:00.630 --> 08:01.060
Right?

08:01.080 --> 08:04.390
So if my username is Zeal

08:06.010 --> 08:07.870
and the password is Motorola,

08:10.880 --> 08:18.380
So if the plaintext username and password is stored over here, that means any person can open this

08:18.380 --> 08:21.560
file in notepad and can see the contents of it.

08:22.320 --> 08:22.770
Right.

08:22.770 --> 08:29.310
So we were discussing this is why encryption is used so that even if the person gets hold of the data,

08:29.310 --> 08:31.410
he will not be able to read the data.

08:31.470 --> 08:32.220
Right.

08:32.220 --> 08:36.000
So there can be two scenarios which can be used over here.

08:36.180 --> 08:38.790
The first scenario is encryption.

08:41.230 --> 08:43.900
And the second scenario is hashing.

08:48.040 --> 08:49.720
Let's look at the first scenario first.

08:51.250 --> 09:01.000
So if we are using encryption, that means that Windows has to store some secret key somewhere

09:01.000 --> 09:01.810
in the file.

09:02.590 --> 09:03.040
Right?

09:03.640 --> 09:09.520
So the secret key has to be stored somewhere so that Windows can actually decrypt the file and see the

09:09.520 --> 09:10.600
contents of it.

09:10.630 --> 09:13.030
Now, what happens if the secret key gets leaked?

09:13.930 --> 09:17.410
All the usernames and passwords are leaked, right?

09:17.530 --> 09:21.550
So this is not a very, I would say, optimal way.

09:23.940 --> 09:25.710
Second way is hashing.

09:26.100 --> 09:29.610
So what happens in hashing is.

09:31.620 --> 09:32.700
Windows

09:33.920 --> 09:36.890
will take a hash value of this password, Motorola.

09:37.340 --> 09:37.820
Right.

09:37.850 --> 09:39.730
And it will have some hash value.

09:39.770 --> 09:41.660
C52N1.

09:41.810 --> 09:43.360
This is some random value, right?

09:43.370 --> 09:48.260
And it will store the username as well as the hash value in the

09:49.540 --> 09:50.470
system file.

09:50.650 --> 09:54.670
So if I open the system file, it would be something like, say,

09:58.120 --> 10:03.130
Zeal followed by the hash value C52N1.

10:04.240 --> 10:12.910
So let's say a hacker gets access to the system file and is able to read Zeal followed by this.

10:12.940 --> 10:16.150
Now, as we have seen, hashing is a one way function.

10:16.150 --> 10:18.250
So from this value.

10:18.280 --> 10:23.440
So from this hash value, we cannot recover the password back.

10:24.010 --> 10:27.330
It is not a two way function, it is only a one way function.

10:27.340 --> 10:33.130
So even if the hacker sees this file, he will not be able to find the password.

10:34.410 --> 10:41.040
So this is the reason why hashing is used in almost all operating systems to store passwords.

10:42.220 --> 10:44.440
So let me show you an example.

10:46.500 --> 10:51.510
So in Linux, the passwords are stored in the shadow file.

10:51.870 --> 10:53.970
So let's open /etc/shadow.

10:56.750 --> 10:59.540
So here you see first is the username.

10:59.690 --> 11:03.140
The username is root followed by.

11:05.490 --> 11:07.200
The hashed value of the password.

11:07.650 --> 11:11.100
So this is not encrypted data.

11:11.130 --> 11:12.940
This is the hash value.

11:12.960 --> 11:15.930
So hash means one way function.

11:15.930 --> 11:21.360
So from this we cannot recover the original data back.
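
NOTE
A sketch of how one line of /etc/shadow breaks down, assuming the common colon-separated layout with a crypt-style password field of the form $id$salt$hash. The sample line is made up (it is not a real hash), and parse_shadow_line is a hypothetical helper. An id of 6 conventionally denotes SHA-512 crypt.
```python
SAMPLE = "root:$6$examplesalt$examplehash:19000:0:99999:7:::"
def parse_shadow_line(line):
    # Fields are colon-separated: the first is the username, the second the password field.
    fields = line.strip().split(":")
    username, password_field = fields[0], fields[1]
    # A crypt-style password field looks like $id$salt$hash.
    parts = password_field.split("$")
    if len(parts) == 4 and parts[0] == "":
        scheme, salt, digest = parts[1], parts[2], parts[3]
    else:
        # Other layouts (locked accounts, newer schemes) are returned unparsed.
        scheme, salt, digest = None, None, password_field
    return {"username": username, "scheme": scheme, "salt": salt, "hash": digest}
entry = parse_shadow_line(SAMPLE)
print(entry)
```
The username is readable, but the password appears only as a salted hash, which is why reading the file does not directly reveal the password.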

11:23.000 --> 11:32.960
So now the point is, how will the system know that this password, the password that you write, matches

11:32.960 --> 11:35.900
the password which is there in your system.

11:36.380 --> 11:43.730
So what the system does is this: whenever you log in with a username and password, say Motorola,

11:45.990 --> 11:49.890
Your system will first take the hashed value of this password.

11:50.220 --> 11:55.470
It will hash this password and it will get some hashed value.

11:57.330 --> 12:03.600
It will compare this hash value with a hash which is stored in your system file.

12:03.840 --> 12:08.710
And if the two hashes match, and the username also matches,

12:08.730 --> 12:10.530
It will grant you access.

12:11.220 --> 12:16.290
So here, if I type Motorola,

12:18.000 --> 12:19.140
Motorola1,

12:19.710 --> 12:22.280
That means your system will first take a hash value.

12:22.290 --> 12:28.020
Now, as we have added one at the end, the hash will completely change, right?

12:29.730 --> 12:30.810
This is the hash.

12:30.810 --> 12:34.980
It will compare hash with the hash in the system file.

12:35.550 --> 12:38.490
And now, as the hash does not match, that means.

12:39.250 --> 12:43.160
The password is wrong and this is how your system verifies it.
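
NOTE
The login check described above can be sketched as follows. Assumptions: SHA-256 stands in for whatever hash the operating system uses, the dictionary stands in for the system file, and the names hash_password, stored and login are hypothetical. Real systems such as /etc/shadow also mix a per-user salt into the hash; this sketch mirrors only the steps in the lecture.
```python
import hashlib
import hmac
def hash_password(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()
# The "system file": usernames mapped to password hashes, never plaintext.
stored = {"zeal": hash_password("Motorola")}
def login(username: str, password: str) -> bool:
    known = stored.get(username)
    if known is None:
        return False
    # Hash the typed password and compare it with the stored hash.
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(known, hash_password(password))
print(login("zeal", "Motorola"))
print(login("zeal", "Motorola1"))
```
The system never needs the original password after enrolment; it only ever compares hashes.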

12:43.970 --> 12:44.690
Now.

12:47.080 --> 12:50.610
There is one more use of hashing.

12:50.620 --> 12:52.150
Let's see how it is used.

12:53.500 --> 12:55.000
I erase everything.

12:56.700 --> 12:57.510
And.

12:58.430 --> 12:58.850
Okay.

13:00.340 --> 13:04.570
Let me create a new document and let's view it in the full screen.

13:04.870 --> 13:05.650
So.

13:08.350 --> 13:12.880
So this is the third use of hashing.

13:14.730 --> 13:16.470
So let's say this is a server.

13:18.890 --> 13:20.120
And this is your.

13:21.290 --> 13:21.920
Laptop.

13:23.840 --> 13:26.940
This server contains a lot of software.

13:31.920 --> 13:33.900
And this is the Internet.

13:36.050 --> 13:36.730
So.

13:37.430 --> 13:37.890
Whoops.

13:40.070 --> 13:40.580
Uh.

13:42.510 --> 13:45.270
So as this is the Internet, that means.

13:46.590 --> 13:48.300
There'll always be a hacker somewhere.

13:50.120 --> 13:53.090
Okay, so let's say you are downloading.

13:57.750 --> 14:00.060
This is Microsoft Office.

14:01.630 --> 14:05.650
Okay, you're downloading Microsoft Office from this server to your computer.

14:06.220 --> 14:09.580
Now, this file will come.

14:10.790 --> 14:11.810
Through the network.

14:21.370 --> 14:22.660
To your computer.

14:23.110 --> 14:25.750
Now there can be a possibility

14:25.780 --> 14:26.410
where

14:26.440 --> 14:27.430
a hacker

14:30.000 --> 14:37.050
would, say, block this file and insert his virus-infected version of Microsoft Office.

14:37.080 --> 14:38.640
It is always possible, right?

14:38.640 --> 14:42.480
You never know, because this server might be in the US and your

14:42.720 --> 14:43.950
laptop might be in India.

14:43.950 --> 14:48.270
So there is a very big communication channel between these two countries.

14:48.300 --> 14:55.350
So how will you make sure that your software that you have downloaded is the exact version which is

14:55.350 --> 14:56.130
there on the server?

14:57.660 --> 15:02.640
The thing that is used is MD5, that is, MD5 hashing.

15:05.070 --> 15:11.310
So what happens is, whenever you download software from a server, they generally give a hash value

15:11.310 --> 15:12.330
along with the software.

15:12.450 --> 15:18.900
So this is our Microsoft Office, and along with Microsoft Office, they'll give you

15:21.320 --> 15:22.370
A hash value.

15:22.940 --> 15:30.800
So once you download Microsoft Office from the server to your laptop, your job is to take a

15:30.800 --> 15:32.090
hash value of this.

15:32.540 --> 15:35.540
Say this is a hash value that you get.

15:36.610 --> 15:41.440
And compare it with the hash value which Microsoft has provided.

15:41.470 --> 15:47.170
Now, if the hash values are the same, that means you have downloaded the correct version.

15:47.260 --> 15:51.360
Now, if the hash value is different, there can be two possibilities.

15:51.370 --> 15:56.140
One is that your software is not downloaded correctly.

15:56.140 --> 15:59.260
Some portion of the software must have been corrupted.

15:59.290 --> 16:01.270
That is one possibility.

16:01.420 --> 16:03.970
Second possibility is.

16:05.300 --> 16:06.410
From the network.

16:06.440 --> 16:09.680
There has been some modification that has been done to the file.

16:09.800 --> 16:13.250
So there can be two possibilities for that.

16:14.000 --> 16:18.840
So always check for a hash value when you're downloading the software.

16:18.860 --> 16:26.660
Nowadays, most companies that provide software online give the hash value on their

16:26.660 --> 16:28.820
website when you try to download the software.
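
NOTE
A sketch of the download check, with assumptions stated: SHA-256 is used here instead of the MD5 mentioned in the lecture (MD5 collisions are known, so many vendors publish SHA-256), the installer bytes are a stand-in for a real download, and sha256_hex and matches_published are hypothetical helpers. The check only helps if the published hash itself comes from a page the attacker cannot alter.
```python
import hashlib
import hmac
def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()
def matches_published(data: bytes, published_hex: str) -> bool:
    # Normalise case and whitespace, since vendors format hashes differently.
    return hmac.compare_digest(sha256_hex(data), published_hex.strip().lower())
installer = b"pretend these bytes are the installer"
# The value the vendor would list next to the download link.
published = sha256_hex(installer)
# The same download after corruption or tampering on the network.
tampered = installer + b" plus something extra"
print(matches_published(installer, published))
print(matches_published(tampered, published))
```
A mismatch does not say which of the two possibilities occurred, corruption or tampering; in both cases the file should be discarded and downloaded again.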

16:30.720 --> 16:32.190
Coming back to the slides.

16:34.860 --> 16:40.890
So the third point that we discussed was: verify the integrity of the software that you download.

16:41.190 --> 16:44.130
So this is all about hashing.

16:44.220 --> 16:47.910
Try to make a file, try to hash it, modify the file.

16:47.940 --> 16:49.020
See the hash value.

16:49.050 --> 16:51.570
See if you are getting some different hash value.

16:51.900 --> 16:52.590
And.

16:53.620 --> 16:57.430
This is all about hashing, so I hope you have enjoyed this lecture.

16:57.430 --> 17:02.620
And in the next module we'll be talking about viruses and it is going to be very interesting.

17:03.070 --> 17:05.050
So thanks for watching.

17:05.350 --> 17:06.250
Have a nice day.
