WEBVTT

00:00.870 --> 00:11.040
So remember how in the previous lectures we found the CSS selector for itemprop="url", and we found

00:11.040 --> 00:16.740
out that the content attribute actually contained the URL for the home we wanted to scrape.

00:17.790 --> 00:27.690
Now let's try and get the HTML for each page, and then on that HTML we can use

00:27.690 --> 00:28.650
Cheerio.

00:29.130 --> 00:38.250
We could also execute vanilla JavaScript inside of the page, but I like to use Cheerio instead

00:38.250 --> 00:40.140
to scrape my pages.

00:41.010 --> 00:46.740
Also, I'm going to add a try/catch block inside here.

00:52.340 --> 00:54.470
Just to catch any errors.

00:58.900 --> 00:59.800
Like so.

01:01.390 --> 01:05.170
Now let's get the HTML from the page.

01:05.560 --> 01:11.950
So const html = await page.evaluate.

01:13.150 --> 01:20.380
Now if you're familiar with something like Nightmare JS, for example, this is really similar to Nightmare

01:20.380 --> 01:25.930
JS where we basically execute JavaScript inside of the browser.

01:26.020 --> 01:34.250
So this is just similar to going inside of your console and writing code inside here.

01:34.270 --> 01:39.490
So it's going to execute inside of this Chromium browser.

01:40.570 --> 01:50.890
And what we want to do is return the HTML of the site, so we can use innerHTML.

01:52.720 --> 02:00.770
Now this is the HTML we want to pass into Cheerio.

02:04.870 --> 02:07.390
And then we can use the selector on it.
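
NOTE
The step just described could look roughly like this sketch. It assumes page is a Puppeteer page that has already navigated to the listing. The helper name getPageHtml, the choice of document.body, and returning null on failure are illustrative assumptions, not the instructor's exact code.
```javascript
// Hypothetical helper: returns the page's HTML as a string, or null on error.
async function getPageHtml(page) {
  try {
    // The callback runs inside the browser page, like code typed into the console.
    // Assumption: the body's innerHTML is enough for the selectors used later.
    return await page.evaluate(() => document.body.innerHTML);
  } catch (err) {
    // Log the error and signal failure to the caller instead of crashing.
    console.error(err);
    return null;
  }
}
```
The returned string is what would then be handed to cheerio.load(html).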

02:08.140 --> 02:11.530
So let's try and get some homes.

02:11.740 --> 02:16.510
We say homes and we select our

02:18.580 --> 02:19.810
item.

02:19.840 --> 02:20.770
Excuse me.

02:20.950 --> 02:27.040
itemprop, so all the elements with the itemprop attribute

02:29.240 --> 02:31.070
that says URL.

02:33.050 --> 02:33.860
Excuse me.

02:35.150 --> 02:40.400
I need to make this a double quote so we can have the single quotes in here.

02:40.520 --> 02:47.240
So all the elements that have the itemprop attribute with url inside.

02:48.260 --> 02:52.610
And let's just run a map over this one.

02:52.610 --> 03:04.430
So i, element, and for each of these we just return the element's content attribute.

03:09.370 --> 03:20.830
So in case you forgot from the previous section, each of these homes in this page, they have a.

03:24.700 --> 03:25.720
Let me show you.

03:28.150 --> 03:37.420
They have an itemprop, or rather a meta element containing the itemprop url.

03:37.450 --> 03:44.410
This is what we're selecting on and it has the content with the URL for the home or the room we're looking

03:44.410 --> 03:46.150
for in this page.

03:48.110 --> 03:50.540
Let's go into the code again.

03:51.380 --> 03:54.410
So this is the property we're looking for.

03:54.440 --> 03:57.350
I'm missing a square bracket here.

03:57.770 --> 04:01.730
Remember to put that in too, if you missed it.

04:03.350 --> 04:10.280
So let's try and run this and see if we get an array of homes.

04:10.280 --> 04:13.670
So I will run console.log(homes).

04:14.900 --> 04:20.990
And this should contain an array of the different URLs from the different homes.

04:21.620 --> 04:23.120
Let's try and run it.

04:25.920 --> 04:27.930
So now the browser is running.

04:30.840 --> 04:31.710
Let's see.

04:32.040 --> 04:32.820
Okay.

04:33.640 --> 04:36.810
I think I made the same old mistake again.

04:36.880 --> 04:42.480
If you've seen the Nightmare section, I did this there too.

04:43.440 --> 04:46.560
We need to use .get().

04:48.300 --> 04:49.140
Excuse me.

04:49.180 --> 04:52.860
.get(), in order to actually get the values.

04:52.860 --> 04:56.010
If not, we're getting the Cheerio objects instead.

04:56.460 --> 04:59.190
So with .get() it should be fine.

04:59.280 --> 05:01.530
We should get the URLs like we want.

05:02.490 --> 05:04.710
Now, mind you, if you want to,

00:05:04.740 --> 00:05:12.570
you can also just execute some JavaScript in here which returns an array, and just get the array

05:12.570 --> 05:14.010
directly like that.
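
NOTE
The alternative mentioned here, building the array inside the browser instead of using Cheerio, might be sketched like this. It assumes page is a Puppeteer page already on the listing. The helper name getHomeUrlsInPage is hypothetical.
```javascript
// Hypothetical helper: collects the content attribute of every itemprop url element.
async function getHomeUrlsInPage(page) {
  // The callback runs in the browser, so document refers to the loaded page.
  return page.evaluate(() =>
    Array.from(document.querySelectorAll("[itemprop='url']"), (el) =>
      el.getAttribute('content')
    )
  );
}
```
The result is a plain array of strings, so no .get() call is needed in this variant.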

05:15.180 --> 05:23.580
However, I prefer to just take the whole HTML of the page and pass it into Cheerio, and then I

05:23.580 --> 05:26.730
can select as much as I want after that.

05:28.700 --> 05:33.740
But that's just something to keep in mind, that it's also an option if you don't like to use

05:33.740 --> 05:35.000
Cheerio so much.

05:35.920 --> 05:40.270
Now let's try and run this again and see if we get the array that we want.

05:44.340 --> 05:48.390
Okay, so it looks like we are getting the array.

05:49.980 --> 06:00.360
All that's left now is to go through each of these and basically get the descriptions and the rooms,

06:00.360 --> 06:04.890
prices and so on for each of these rooms and homes.
