WEBVTT

00:00.160 --> 00:01.560
Hey there, Eden here.

00:01.560 --> 00:08.120
And in this video, we're going to be setting up our environment before we implement the documentation

00:08.120 --> 00:08.680
helper.

00:09.160 --> 00:14.040
So we're going to start by cloning the starting branch with all the boilerplate code that we need.

00:14.080 --> 00:20.840
And we're going to set up our .env file, which is going to hold the environment variables for our OpenAI

00:20.960 --> 00:24.360
API key and our Pinecone API key.

00:25.600 --> 00:25.880
Okay.

00:25.920 --> 00:30.240
So in the repository, we want to be cloning the branch

00:30.240 --> 00:31.000
1-start-here.

00:31.680 --> 00:36.000
So I'm going to go and copy this URL.

00:36.240 --> 00:43.560
And I'm going to go to the terminal and write git clone, paste the URL, then -b

00:43.560 --> 00:44.280
1-start-here.

00:44.520 --> 00:48.200
And this will clone the repository at the starting branch.
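
NOTE
A minimal sketch of the clone step just described, written in Python so it matches the rest of the project.
The branch name 1-start-here is how I read the spoken "one start here", and REPO_URL is only a placeholder.
The helper build_clone_command is a hypothetical name, not something from the repository.
```python
# Sketch: assemble the clone command as an argument list (no shell quoting needed).
def build_clone_command(repo_url: str, branch: str) -> list[str]:
    """Return the argv for cloning repo_url and checking out branch."""
    return ["git", "clone", repo_url, "-b", branch]
# REPO_URL is a placeholder; paste the URL you copied from the repository page.
REPO_URL = "<paste-the-repository-url-here>"
command = build_clone_command(REPO_URL, "1-start-here")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```
Typing the same words directly in the terminal does the same thing; the list form just avoids quoting issues.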

00:48.440 --> 00:51.920
So we'll start by listing the current working directory of our project.

00:52.160 --> 01:00.480
Okay, so now it's time to go to Pinecone, log in to our account, and create an index that is going

01:00.480 --> 01:04.920
to store the embeddings of the LangChain documentation.

01:04.920 --> 01:13.160
So now let's give our index a meaningful name. I'll call it langchain-doc-index.

01:13.360 --> 01:19.320
And we're going to tell Pinecone which embedding model we're going to be using to index our documents.

01:19.480 --> 01:22.920
So we're going to choose OpenAI's text-embedding-3-small.

01:23.280 --> 01:24.200
The dimension.

01:24.200 --> 01:29.000
So the size of the vector is going to be 1536.

01:29.280 --> 01:34.600
And here we're going to be using cosine similarity to determine the distances between the vectors.
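
NOTE
To make the cosine metric concrete, here is a small sketch in plain Python, under the assumption that vectors are ordinary lists of floats.
It is not how Pinecone computes similarity internally, only the textbook formula: the dot product divided by the product of the two vector lengths.
```python
import math
def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
# Toy 3-dimensional vectors; real embeddings in this index have 1536 dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```
Vectors pointing the same way score close to 1 and unrelated (orthogonal) vectors score 0, regardless of their length.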

01:35.160 --> 01:35.520
All right.

01:35.560 --> 01:37.880
By the way, this is the new Pinecone UI.

01:38.040 --> 01:39.360
So it's pretty similar.

01:39.560 --> 01:47.240
Here you can write your index name, and here, in order to choose the embedding model, this

01:47.240 --> 01:48.520
is how the UI looks.

01:48.760 --> 01:55.360
And make sure you select 1536 in the embedding dimension.

01:55.760 --> 01:56.640
Um, yeah.

01:56.640 --> 01:58.840
So this is the new UI in Pinecone.

01:58.840 --> 02:00.440
Just important to show you this.

02:02.360 --> 02:02.880
All right.

02:02.880 --> 02:04.800
Let's go and set the configuration.

02:05.040 --> 02:07.360
We're going to go with the serverless option here.

02:07.360 --> 02:09.760
And we're simply now creating the index.

02:09.760 --> 02:13.280
And in a couple of seconds our index will be initialized.

02:13.560 --> 02:19.760
By the way, notice that our index is deployed on the AWS cloud, and this is the region.

02:19.760 --> 02:21.880
Now we have flexibility on this.

02:21.880 --> 02:24.520
Let's say we want to deploy it on Google Cloud.

02:24.560 --> 02:29.360
Then we simply need to select the Google Cloud logo, and it will be deployed there.

02:29.560 --> 02:34.960
And this is important because AWS customers would want their index to be hosted on AWS.

02:35.280 --> 02:38.840
And GCP customers would want it to be hosted on Google Cloud.

02:38.840 --> 02:45.120
So this helps with latency and with commercial agreements of companies working with those cloud providers.

02:45.520 --> 02:49.480
And the region is important, for example, for GDPR compliance.

02:49.480 --> 02:55.560
So for that, we'll need our vector store to be deployed only in a European data center.
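
NOTE
A rough sketch of the region check described above, assuming the usual naming schemes where AWS European regions start with "eu-" and Google Cloud ones start with "europe-".
The helper is_european_region and the example region names are hypothetical and only illustrate the idea.
A name check like this is a sanity check, not a compliance guarantee.
```python
# Hypothetical helper: region prefixes follow the cloud providers' naming schemes.
EUROPEAN_REGION_PREFIXES = {"aws": "eu-", "gcp": "europe-"}
def is_european_region(cloud: str, region: str) -> bool:
    """Return True if the region name looks like a European data center."""
    prefix = EUROPEAN_REGION_PREFIXES.get(cloud.lower())
    if prefix is None:
        raise ValueError(f"unknown cloud provider: {cloud!r}")
    return region.lower().startswith(prefix)
# Example region names, chosen for illustration only.
print(is_european_region("aws", "eu-west-1"))
print(is_european_region("aws", "us-east-1"))
print(is_european_region("gcp", "europe-west4"))
```
You could run a check like this before creating the index, so a wrong region is caught early.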

02:57.280 --> 02:57.840
All right.

02:57.840 --> 03:03.840
So now we can ingest the documentation into our index.

03:04.120 --> 03:07.800
So LangChain is going to be making requests to Pinecone on our behalf.

03:07.800 --> 03:11.520
So we need to copy our API key and give it to LangChain.

03:11.640 --> 03:15.160
So let's copy it and put it in the .env file.

03:15.320 --> 03:24.000
And let me now open the project in PyCharm for the first time in this video. I created a .env

03:24.040 --> 03:29.000
file, which of course I didn't commit to the repository because I don't want to commit any secrets

03:29.400 --> 03:29.920
anyway.

03:29.920 --> 03:34.320
So I created a .env file, and in the .env file

03:34.360 --> 03:38.000
I'll simply put here the Pinecone API key, which I copied.

03:38.000 --> 03:44.600
And also we need to plug in, of course, the OpenAI API key, or the key for any other LLM vendor you're using.
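
NOTE
A minimal sketch of what the .env file holds and how it could be read, using only the standard library.
The variable names PINECONE_API_KEY and OPENAI_API_KEY are my assumption, and the values are fake placeholders.
parse_env is a hypothetical helper; in practice a package such as python-dotenv usually does this job.
```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
# Placeholder contents; the variable names are assumed, the values are fake.
sample = """
# .env (never commit this file)
PINECONE_API_KEY=your-pinecone-key
OPENAI_API_KEY="your-openai-key"
"""
settings = parse_env(sample)
missing = [k for k in ("PINECONE_API_KEY", "OPENAI_API_KEY") if k not in settings]
print("missing keys:", missing)
# In the project you would read the real file and export it, for example:
# import os; os.environ.update(parse_env(open(".env").read()))
```
Checking for missing keys up front gives a clearer error than a failed API call later on.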

03:45.640 --> 03:46.080
All right.

03:46.080 --> 03:50.120
So I want to deactivate my virtual environment, my automatic one.

03:50.120 --> 03:52.880
And now I want to take a look at the Pipfile.

03:52.880 --> 03:57.960
And here we have all the packages we need for our project, which I already prepared.

03:58.680 --> 04:02.400
And you can see, for example, we have langchain-pinecone.

04:02.840 --> 04:08.440
And if I look at the lock file with the exact versions, we can see we have, for example, the

04:08.440 --> 04:12.040
latest version currently of LangChain, 0.26.

04:12.640 --> 04:17.920
And I remind you, you might have a different version depending on when you're taking the course.

04:17.960 --> 04:24.000
Of course, I'll keep updating this course and the repository to the latest version.

04:24.000 --> 04:28.360
So if there are breaking changes I will update the code of course, and the videos.

04:28.400 --> 04:28.800
All right.

04:28.800 --> 04:30.920
So I just want to address a couple of things.

04:31.200 --> 04:36.880
I added a new logger.py file, which you should be seeing on the left side as well.

04:36.920 --> 04:40.120
This is something which I added now as part of the boilerplate.

04:40.160 --> 04:42.480
We're going to be using it in future videos.

04:42.840 --> 04:46.320
And this backend directory you can ignore.

04:46.360 --> 04:48.520
We'll be creating it in future videos.

04:48.680 --> 04:56.240
And also this docs directory you can ignore. We'll be downloading the LangChain docs in the next couple

04:56.280 --> 04:56.920
of videos.

04:57.280 --> 05:00.040
All right, so now I want to run pipenv install.

05:00.040 --> 05:03.640
And this will take all the information in the Pipfile.lock file.

05:03.840 --> 05:07.840
And it's going to install all the dependencies with the correct versions.

05:08.040 --> 05:14.320
So if you run that, you're going to have the exact same versions of all the packages as I have right now.
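
NOTE
To see which versions the lock file pins, here is a small sketch that reads Pipfile.lock as JSON.
It assumes the usual layout, where the "default" section maps each package name to an entry with a "version" such as "==1.2.3".
pinned_versions is a hypothetical helper, and the package names and versions in the sample are made up.
```python
import json
def pinned_versions(lock_text: str) -> dict[str, str]:
    """Map each package in the lock file's "default" section to its pinned version."""
    lock = json.loads(lock_text)
    return {
        name: info.get("version", "").removeprefix("==")
        for name, info in lock.get("default", {}).items()
    }
# Tiny stand-in for a lock file; package names and versions here are made up.
sample_lock = """
{
  "_meta": {"requires": {"python_version": "3.11"}},
  "default": {
    "example-package": {"version": "==1.2.3"},
    "another-package": {"version": "==0.4.0"}
  },
  "develop": {}
}
"""
print(pinned_versions(sample_lock))
```
On the real project you would pass in the contents of the file, for example open("Pipfile.lock").read().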

05:15.520 --> 05:19.800
So we're going to create a new file. I'll call it ingestion.py.

05:19.840 --> 05:28.400
And this file is going to hold all of the implementation of ingesting the LangChain documentation, embedding

05:28.400 --> 05:33.160
it into vectors, then storing those vectors in Pinecone, the vector store.
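
NOTE
A toy sketch of the ingestion flow just outlined: split the text, embed each chunk, store the vectors.
Everything here is a stand-in and not the code we will write in ingestion.py: toy_embed fakes an embedding model with a hash, and a plain dict plays the role of the vector store.
All function names are hypothetical.
```python
import hashlib
def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
def toy_embed(chunk: str, dimension: int = 8) -> list[float]:
    """Stand-in for a real embedding model: hash bytes scaled into [0, 1]."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dimension]]
def ingest(text: str, store: dict[str, list[float]], chunk_size: int = 40) -> int:
    """Split, embed, and store every chunk; return how many were stored."""
    chunks = split_into_chunks(text, chunk_size)
    for index, chunk in enumerate(chunks):
        store[f"chunk-{index}"] = toy_embed(chunk)
    return len(chunks)
# A dict plays the role of the vector store in this sketch.
vector_store: dict[str, list[float]] = {}
stored = ingest("x" * 100, vector_store)
print(stored, sorted(vector_store))
```
The real version swaps the toy pieces for a document loader, an embedding model, and the Pinecone index, but the three steps stay the same.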
