WEBVTT

00:00.400 --> 00:08.650
Okay, now we've got a little loop built out so that we go through each of the job URLs inside of this

00:08.650 --> 00:10.480
list of jobs.

00:10.480 --> 00:17.410
But we also need a way to limit the rate of requests we make to the Craigslist site so that they

00:17.410 --> 00:23.830
don't block us from accessing the site because they think we might be making too many requests, which

00:23.830 --> 00:26.410
is not normal user behavior.

00:26.410 --> 00:31.480
So we need to limit how many requests we make per second. How do we do this?

00:31.480 --> 00:34.720
Well, it is quite simple.

00:34.720 --> 00:42.220
We just need to write out a generic sleep function in JavaScript, so we can write out something like

00:42.220 --> 00:44.320
async function sleep.

00:44.320 --> 00:50.140
And this sleep function then takes in the number of milliseconds we want it to wait.

00:50.140 --> 00:50.950
So.

00:51.830 --> 01:00.140
Milliseconds, and then we can say return new Promise.

01:00.180 --> 01:02.510
Then we have the resolve callback.

01:02.860 --> 01:05.300
Then we say setTimeout.

01:06.020 --> 01:07.970
And we pass in the resolve

01:08.000 --> 01:10.640
callback here and the milliseconds.
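
NOTE
A minimal sketch of the sleep function dictated above, assuming the standard Promise and setTimeout globals. The parameter name "milliseconds" follows the narration; the exact code on screen may differ.
```javascript
// Return a promise that resolves once `milliseconds` have elapsed.
async function sleep(milliseconds) {
  // setTimeout calls `resolve` after the delay, which settles the promise.
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}
```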

01:11.600 --> 01:18.830
Now what this is going to do is it's going to return a new promise, which is only going to be resolved

01:18.830 --> 01:23.750
once setTimeout has run through the milliseconds we pass to it.

01:24.380 --> 01:28.370
So setTimeout is just a native JavaScript function.

01:28.610 --> 01:34.850
It takes in a callback function and the number of milliseconds, and it calls that callback once the

01:34.850 --> 01:36.410
milliseconds have passed.

01:37.880 --> 01:45.980
And once that callback has been called, then we say the promise has been resolved and we can move on

01:45.980 --> 01:47.120
to other things.

01:47.600 --> 01:55.490
So the way to use this inside of our function or inside of our loop would be something like await sleep

01:55.490 --> 01:58.460
and then we just pass in the number of milliseconds.

01:58.460 --> 02:02.150
So 1000 milliseconds means a one-second wait.
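
NOTE
A rough sketch of how await sleep could sit inside the loop over job URLs. The names fetchJob, visitJobs and delayMs are hypothetical stand-ins, not the course's code, and fetchJob only simulates a request.
```javascript
// Resolve after `milliseconds` have elapsed.
function sleep(milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}
// Hypothetical placeholder for the real per-job request.
async function fetchJob(url) {
  return `visited ${url}`;
}
// Visit each URL in turn, pausing after every request to limit the rate.
async function visitJobs(jobUrls, delayMs = 1000) {
  const results = [];
  for (const url of jobUrls) {
    results.push(await fetchJob(url));
    await sleep(delayMs); // 1000 ms is a one-second wait
  }
  return results;
}
```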

02:05.100 --> 02:06.060
That is it.

02:06.060 --> 02:09.090
That's how we can make a generic sleep function.

02:09.120 --> 02:13.050
You can also use this with other Node.js libraries.

02:13.050 --> 02:19.500
So if you're using the Node.js request library or Nightmare.js, whatever you might be using, you could use this

02:19.500 --> 02:22.170
function to make the script sleep.

02:22.750 --> 02:25.570
Now keep in mind that it only works this way

02:26.200 --> 02:32.770
if you're using async/await. If you're using something like then clauses, then you would have to

02:32.770 --> 02:33.810
do something like this.

02:33.820 --> 02:38.200
You would write out sleep(1000) and then have a .then()

02:39.560 --> 02:41.960
And then you would have some code here.
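
NOTE
A small sketch of the .then() style just described, for code that does not use async/await. The helper name runAfterDelay is hypothetical and only wraps the pattern so it can be reused.
```javascript
// Resolve after `milliseconds` have elapsed.
function sleep(milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}
// Run `callback` once the delay has passed, chaining with .then().
function runAfterDelay(milliseconds, callback) {
  return sleep(milliseconds).then(() => callback());
}
// Example use: runAfterDelay(1000, () => console.log("one second has passed"));
```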

02:42.870 --> 02:47.070
But I would highly recommend that you use async/await

02:47.070 --> 02:51.450
instead. It makes the code so much easier to read and maintain, I think.

02:53.050 --> 02:53.590
Okay.

02:53.590 --> 02:54.310
So.

02:56.300 --> 02:59.540
That's how we make the scraper sleep.

02:59.540 --> 03:01.850
And now on to the next section.

03:01.850 --> 03:08.240
We are going to be scraping the content text we have for each of these job descriptions.

03:09.650 --> 03:16.400
Now, I recommend that, as an exercise, you try to scrape the content of the job listing yourself.

03:16.400 --> 03:17.690
Try and do this.

03:17.690 --> 03:23.600
It's a similar process to what we've been doing before when we get the text content of an element.

03:23.720 --> 03:30.590
So go ahead and try to write out something where you can scrape out the text content of a job description.

03:30.590 --> 03:34.490
And in the next section I'm going to show you how I would do it.
