WEBVTT

00:00.350 --> 00:05.870
Continuing our journey into HTTP compression and decompression.

00:05.960 --> 00:14.810
Today we are going to talk about two important HTTP headers: Accept-Encoding and

00:14.810 --> 00:15.680
Content-Encoding.

00:16.670 --> 00:23.750
Before we learn exactly how these headers work, let me show you how compression itself works.

00:25.280 --> 00:31.310
Here I have a text file called sshd_config.txt.

00:31.550 --> 00:38.240
Let's compress it and see how much compression we can achieve.

00:38.510 --> 00:44.720
Let's run ls: you can see the file holds about 3.8 KB of data.

00:45.200 --> 00:48.770
There are many compression algorithms available.

00:48.770 --> 00:53.720
We'll use gzip, which the HTTP protocol also supports.

00:53.840 --> 01:01.980
I'll run gzip -9 -c sshd_config.txt and redirect the output to sshd.gz.

01:04.730 --> 01:06.050
Now if we run ls again,

01:06.980 --> 01:14.600
you can see the size difference between the uncompressed and the compressed versions.

01:14.600 --> 01:18.710
The difference is very large.

01:19.280 --> 01:26.450
The compressed file is less than half the size: around 30% of the uncompressed version.
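
NOTE A minimal Python sketch of the same experiment, assuming an in-memory sample string stands in for sshd_config.txt (the sample text and the ratio it prints are illustrative, not the figures from the demo). It uses the standard gzip module at compression level 9, like gzip -9.
```python
import gzip
# Hypothetical stand-in for sshd_config.txt: repetitive config-style text.
sample = ("#PermitRootLogin prohibit-password\n" * 50 + "Port 22\n").encode("ascii")
compressed = gzip.compress(sample, compresslevel=9)  # same level as gzip -9
ratio = len(compressed) / len(sample)
print(f"original={len(sample)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.1%}")
# Round trip: decompressing must give back the exact original bytes.
assert gzip.decompress(compressed) == sample
```
A real config file is less repetitive than this sample, so it compresses less.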

01:26.660 --> 01:34.040
This is why compression and decompression were introduced in the HTTP protocol.

01:36.250 --> 01:44.410
Coming back to our main slide, let's understand exactly how compression and decompression work

01:44.410 --> 01:46.600
in the HTTP protocol.

01:52.820 --> 01:53.300
Okay.

01:54.180 --> 01:57.780
This is a GET request from the client.

01:57.780 --> 02:07.140
It says GET /test.txt HTTP/1.1, followed by the Host and User-Agent headers.

02:07.140 --> 02:15.060
And there is one more field here, which says Accept-Encoding: gzip, deflate.
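
NOTE A minimal Python sketch of the request on the slide, assuming host example.com and a made-up User-Agent value; build_request is a hypothetical helper that only formats the request bytes and does not send anything over the network.
```python
def build_request(path: str, host: str, encodings: list[str]) -> bytes:
    """Format an HTTP/1.1 GET request that advertises the given content codings."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: demo-client/0.1",  # hypothetical User-Agent value
        "Accept-Encoding: " + ", ".join(encodings),
    ]
    # Each header line ends with CRLF, and an empty line ends the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
req = build_request("/test.txt", "example.com", ["gzip", "deflate"])
print(req.decode("ascii"))
```
The Accept-Encoding line is the only part that concerns compression; the server reads it to decide how, or whether, to compress the response.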

02:15.690 --> 02:21.900
Now, there are many encoding algorithms, and for compression in particular there are

02:21.900 --> 02:23.370
many algorithms.

02:23.760 --> 02:27.180
There are three well-known ones that the HTTP protocol supports.

02:27.210 --> 02:28.530
One is gzip,

02:30.720 --> 02:35.280
one is deflate, and one is compress.

02:36.540 --> 02:42.720
These three are the best-known compression encodings supported by the

02:42.720 --> 02:43.950
HTTP protocol.

02:44.130 --> 02:49.160
We'll use gzip, which is widely used on the Internet.

02:49.170 --> 02:51.210
Deflate is also used at times.

02:51.570 --> 03:00.810
In this GET request, the client, that is, the browser, is saying that it supports

03:00.810 --> 03:04.470
gzip and deflate, using the Accept-Encoding header.

03:04.950 --> 03:08.580
And it sends the GET request.

03:08.940 --> 03:16.200
The server sees the Accept-Encoding header and now knows that the client supports either

03:16.200 --> 03:18.210
gzip or deflate.

03:19.410 --> 03:26.070
On the server side, this is test.txt, which has the following contents.

03:26.280 --> 03:33.700
Since the server knows that the client supports gzip compression, what the server does

03:33.700 --> 03:40.870
is compress test.txt and then send the compressed version of the file back

03:40.870 --> 03:41.770
to the client.
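
NOTE A simplified Python sketch of the server side, assuming the server prefers gzip over deflate and ignores quality values such as ";q=0.5"; choose_encoding and make_response are hypothetical helpers, not part of any web server's API. In HTTP, "deflate" means zlib-wrapped DEFLATE data, which is what zlib.compress produces.
```python
import gzip
import zlib
SUPPORTED = ["gzip", "deflate"]  # this server's preference order (assumption)
def choose_encoding(accept_encoding: str) -> str:
    """Return the first server-supported coding the client listed, else identity."""
    offered = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
    for coding in SUPPORTED:
        if coding in offered:
            return coding
    return "identity"  # no common coding: send the body uncompressed
def make_response(body: bytes, accept_encoding: str) -> tuple[dict, bytes]:
    """Compress the body with the chosen coding and build matching headers."""
    coding = choose_encoding(accept_encoding)
    if coding == "gzip":
        payload = gzip.compress(body)
    elif coding == "deflate":
        payload = zlib.compress(body)  # HTTP "deflate" is zlib-wrapped DEFLATE
    else:
        payload = body
    headers = {"Content-Length": str(len(payload))}
    if coding != "identity":
        headers["Content-Encoding"] = coding  # tells the client which decoder to use
    return headers, payload
headers, payload = make_response(b"hello " * 100, "gzip, deflate")
print(headers)
```
Real servers also honor q-values and skip compressing tiny or already-compressed bodies; this sketch leaves that out.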

03:43.360 --> 03:52.630
Now the question is: how will the client know whether the server compressed the file with

03:52.630 --> 03:54.790
the gzip or the deflate algorithm?

03:55.360 --> 04:05.770
The server sends a header called Content-Encoding, followed by the encoding algorithm

04:05.770 --> 04:09.280
that it used to encode the data.

04:10.120 --> 04:16.420
Here we know that the server used gzip to compress this data.

04:16.810 --> 04:24.070
So the server includes the gzip directive in the Content-Encoding header so that the client

04:24.100 --> 04:31.930
knows that the data was compressed with the gzip algorithm.

04:32.590 --> 04:41.320
Once the client receives the data, it decompresses it with the gzip decoder.

04:45.780 --> 04:49.470
The client then has the uncompressed version of the data.
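
NOTE A minimal Python sketch of the client side, assuming the response carries at most one content coding; decode_body is a hypothetical helper that picks the decoder named by the Content-Encoding header value.
```python
import gzip
import zlib
from typing import Optional
def decode_body(payload: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the single content coding named in the Content-Encoding header."""
    coding = (content_encoding or "identity").strip().lower()
    if coding == "gzip":
        return gzip.decompress(payload)
    if coding == "deflate":
        return zlib.decompress(payload)  # zlib-wrapped DEFLATE, as HTTP defines it
    if coding == "identity":
        return payload  # no Content-Encoding: the body was sent as-is
    raise ValueError(f"unsupported Content-Encoding: {coding}")
original = b"Port 22\n" * 20
print(decode_body(gzip.compress(original), "gzip") == original)
```
The decoder is chosen from the header alone; the client does not inspect the bytes to guess the format.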

04:50.280 --> 04:57.390
So this is a basic overview of how compression and decompression work with the Accept-Encoding

04:57.390 --> 04:59.260
and Content-Encoding headers.

04:59.280 --> 05:08.460
Now let's look at the low-level data in Wireshark to see exactly how things work.

05:08.940 --> 05:10.140
So.

05:12.240 --> 05:13.260
I'll open Wireshark.

05:17.950 --> 05:21.040
And we'll capture on the network interface

05:23.310 --> 05:24.240
used by VMware.

05:25.560 --> 05:26.580
Let's click Start.

05:26.880 --> 05:30.060
Currently, no data is flowing in.

05:31.620 --> 05:33.900
Now I'll switch to my Mozilla browser.

05:43.040 --> 05:51.020
Okay, let's open example.com/sshd_config.txt.

05:51.890 --> 05:55.460
This is the file that was on the server.

05:56.270 --> 05:59.600
Let's

06:00.970 --> 06:04.960
load it, and you can see the file opening in the browser.

06:05.350 --> 06:10.000
Going back to Wireshark, let's see exactly what happened.

06:13.360 --> 06:18.040
Let me use Follow TCP Stream so we get a readable output.

06:19.930 --> 06:26.350
Around 11 packets were exchanged during this time.

06:26.620 --> 06:34.030
It starts with SYN, SYN-ACK, and ACK, followed by the GET request; we have already seen how

06:34.120 --> 06:35.200
this works.

06:36.670 --> 06:45.070
Going back to the TCP stream, you can see the client sent the GET request for

06:45.070 --> 06:54.790
sshd_config.txt, and along with it the client added a header that says Accept-Encoding.

06:54.790 --> 07:01.750
Accept-Encoding is the header name, followed by the encodings the browser supports.

07:01.750 --> 07:08.020
So Mozilla is saying that it supports the gzip and deflate encoding algorithms.

07:09.370 --> 07:16.250
The server now knows that the browser supports gzip or deflate.

07:16.250 --> 07:22.580
So the server compresses the file with gzip.

07:22.820 --> 07:26.230
Here it says Content-Encoding: gzip.

07:26.240 --> 07:30.470
That means the file has been encoded, that is, compressed, with gzip.

07:30.860 --> 07:38.240
It is telling Mozilla that to open the file, it must decompress it

07:38.240 --> 07:40.130
with gzip.

07:40.820 --> 07:50.900
And all of this is the encoded body, compressed with the gzip compression

07:50.900 --> 07:51.680
algorithm.
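
NOTE A minimal Python sketch of what the browser does with the response seen in Follow TCP Stream, assuming the whole response is already in memory and uses a Content-Length body (no chunked transfer coding). The sample response, including its text/plain Content-Type, is built in the code itself and is not taken from the capture; parse_response is a hypothetical helper.
```python
import gzip
def parse_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 response into status line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # first blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    length = int(headers.get("content-length", len(body)))
    return status_line, headers, body[:length]
text = b"Port 22\nPermitRootLogin no\n"
compressed = gzip.compress(text)
raw = (b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Encoding: gzip\r\n"
       + f"Content-Length: {len(compressed)}\r\n\r\n".encode("ascii") + compressed)
status, headers, body = parse_response(raw)
if headers.get("content-encoding") == "gzip":
    body = gzip.decompress(body)  # the bytes on the wire are gzip data, not text
print(status, headers["content-type"], body.decode("ascii"))
```
The binary-looking part of the stream in Wireshark corresponds to the compressed body here; only after decompression does it become readable text.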

07:52.640 --> 07:58.130
These are the two headers that are normally exchanged.

07:59.000 --> 08:06.350
Now, one more important thing to remember: if a client does

08:06.350 --> 08:15.100
not send the Accept-Encoding header, it has stated no preference, so the server is allowed to use any

08:15.100 --> 08:22.900
content coding, including gzip, deflate, or compress, even if the client cannot decode all of them.

08:23.800 --> 08:26.400
This is an important thing to remember.

08:26.410 --> 08:33.430
This is one of the reasons why browsers and other clients always specify which types of

08:36.090 --> 08:37.680
encoding they support.
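
NOTE A small Python sketch of how a server might treat a missing Accept-Encoding header, assuming a conservative policy: the specification permits any coding when the header is absent, but this hypothetical pick_coding helper still sends the body uncompressed, because nothing guarantees the client can decode it. Quality values are ignored for brevity.
```python
from typing import Optional
def pick_coding(accept_encoding: Optional[str], supported: tuple[str, ...] = ("gzip", "deflate")) -> str:
    """Return the content coding to use, or "identity" for an uncompressed body."""
    if accept_encoding is None:
        # Header absent: no stated preference. The spec allows any coding here,
        # but this sketch plays it safe and does not compress.
        return "identity"
    offered = {t.split(";")[0].strip().lower() for t in accept_encoding.split(",")}
    if "*" in offered:
        return supported[0]  # "*" matches any coding not listed explicitly
    for coding in supported:
        if coding in offered:
            return coding
    return "identity"
print(pick_coding(None), pick_coding("gzip, deflate"), pick_coding("*"))
```
Sending Accept-Encoding explicitly removes this ambiguity, which is why browsers always include it.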

08:39.270 --> 08:45.780
So those are the basics of the Accept-Encoding and Content-Encoding headers.

08:46.830 --> 08:55.590
Now suppose the content was actually encoded with deflate, but the Content-Encoding header says gzip.

08:55.890 --> 09:05.910
The browser will try to decompress it with the gzip decoder, as the header says, and the decoding

09:05.910 --> 09:06.940
will fail.

09:06.960 --> 09:10.860
It is always important to remember this.

09:12.040 --> 09:13.810
These fields are very important.

09:13.810 --> 09:22.030
If there is any mismatch in the Content-Encoding header, the browser might

09:22.030 --> 09:25.480
not be able to interpret the data.
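
NOTE A short Python sketch of the mismatch described above, assuming Python 3.8 or later (where gzip.BadGzipFile exists): the body is really deflate (zlib) data, but the client trusts a Content-Encoding of gzip and runs the gzip decoder, which rejects it.
```python
import gzip
import zlib
body = zlib.compress(b"Port 22\n" * 20)  # the bytes are really deflate (zlib) data
declared = "gzip"  # but the Content-Encoding header claims gzip
decoded_ok = False
try:
    if declared == "gzip":
        gzip.decompress(body)  # the client trusts the header and uses the gzip decoder
        decoded_ok = True
except gzip.BadGzipFile as exc:  # zlib data does not start with the gzip magic bytes 1f 8b
    print("decoding failed:", exc)
print("decoded:", decoded_ok)
```
The data itself is intact; the failure comes purely from the header naming the wrong coding.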

09:27.860 --> 09:33.380
One more important thing I would like to share concerns the Content-Type header.

09:33.680 --> 09:42.350
Even though the data is sent as a gzip-compressed body, the Content-Type always remains that of

09:42.380 --> 09:44.110
the original file.

09:44.120 --> 09:48.350
Here, the original file was a text file, which was encoded with gzip.

09:48.890 --> 09:52.820
However, the Content-Type still does not change.

09:52.820 --> 09:57.470
It remains the original content type of the data.

09:58.340 --> 10:02.180
This is also very important to remember.
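
NOTE A minimal Python sketch of the response headers, assuming the file is served as text/plain (a typical but assumed type for a .txt file): compression changes Content-Encoding and the body length, while Content-Type keeps describing the original, decoded file.
```python
import gzip
original = b"Port 22\nPermitRootLogin no\n" * 10
compressed = gzip.compress(original)
headers = {
    "Content-Type": "text/plain",  # type of the original file, unchanged by compression
    "Content-Encoding": "gzip",  # how the bytes on the wire were transformed
    "Content-Length": str(len(compressed)),  # length of the compressed body
}
for name, value in headers.items():
    print(f"{name}: {value}")
```
Sending application/gzip as the Content-Type here would tell the browser it is receiving a gzip archive rather than a text file.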

10:02.570 --> 10:13.240
So that's it for the basics of Accept-Encoding and Content-Encoding.

10:13.250 --> 10:17.150
I hope these basics are clear to you.

10:17.150 --> 10:19.720
And that's it for this video.

10:19.730 --> 10:23.570
I hope it has been informative, and thank you for watching.