WEBVTT

00:05.050 --> 00:05.740
Hi everyone.

00:05.740 --> 00:06.850
Welcome back.

00:07.120 --> 00:11.770
So in this video we're going to start talking about MongoDB in more detail, and how we can create

00:11.770 --> 00:16.470
a database in the MongoDB that we installed in the previous video.

00:16.810 --> 00:18.310
OK so let's get started.

00:18.520 --> 00:21.520
So here we're going to first talk about modeling in MongoDB,

00:21.520 --> 00:22.200
right?

00:22.240 --> 00:27.760
Which is an important concept, especially in databases in general.

00:27.760 --> 00:30.860
So data in MongoDB has a flexible schema.

00:30.940 --> 00:38.770
So we don't really have a schema per se, like the normal schema we saw in regular tables with many

00:38.770 --> 00:39.410
fields,

00:39.430 --> 00:39.810
right?

00:39.880 --> 00:44.920
We don't really have that stringent schema, so you can say we have a flexible schema.

00:45.000 --> 00:47.380
Now, what do we really mean by a flexible schema?

00:47.530 --> 00:53.320
It means you don't have to have all of the same fields in the documents in the same collection. For the

00:53.320 --> 00:54.900
documents in the same collection,

00:55.060 --> 01:00.340
they don't need to have the same set of fields or structure, and common fields in a collection's

01:00.340 --> 01:02.950
documents may hold different types of data.

01:02.980 --> 01:08.860
So we said that a collection might have a bunch of documents, right, and each of these

01:08.860 --> 01:14.680
documents might have some of these fields and not others. For example, maybe all of

01:14.680 --> 01:15.730
them have an ID,

01:15.760 --> 01:16.110
Right.

01:16.120 --> 01:19.030
Because that's the common thing between all documents.

01:19.390 --> 01:21.160
Maybe this one has a name.

01:21.160 --> 01:24.250
This one has a last name, right?

01:24.250 --> 01:26.970
This one has a date of birth and a name, right?

01:27.070 --> 01:34.420
So they don't necessarily have to have all of the same fields. What is necessary is that, since they're in

01:34.420 --> 01:41.300
the same collection, they must represent something similar or have a similar purpose.

01:41.320 --> 01:47.590
For example, in this case they all have names, but some of them have a date of birth, some of them have a gender,

01:47.590 --> 01:49.030
some of them don't have a gender.

01:49.030 --> 01:53.170
Some of them have incomplete information.

01:53.170 --> 01:57.880
Regardless of this, all of these documents refer to human beings, right, or people.

01:57.970 --> 01:59.590
So we can generalize that.

01:59.650 --> 02:02.840
We say that this collection over here represents people.

02:02.840 --> 02:03.270
Right.

02:03.460 --> 02:08.430
But some of the information about these people is known and some of it is not known.
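
NOTE
A minimal sketch of the flexible-schema idea described above, assuming a hypothetical people collection modelled as a plain JavaScript array. The field names (_id, name, lastName, dateOfBirth, gender) and all values are illustrative assumptions, not taken from the slides.
```javascript
// Hypothetical "people" collection: every document describes a person,
// but the documents do not all carry the same fields.
const people = [
  { _id: 1, name: "Ada" },
  { _id: 2, name: "Bob", lastName: "Stone" },
  { _id: 3, name: "Cy", dateOfBirth: "1990-04-12", gender: "male" },
];
// Collect which fields each document actually carries.
const fieldsPerDocument = people.map((doc) => Object.keys(doc).sort());
// Keep only the fields that are present in every document.
const commonFields = fieldsPerDocument.reduce((shared, fields) =>
  shared.filter((field) => fields.includes(field))
);
console.log(commonFields);
```
In this sample only _id and name are shared by every document; the remaining fields are optional, which is what a flexible schema permits.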

02:08.790 --> 02:10.730
OK so hopefully this made sense.

02:12.190 --> 02:14.970
Now, some considerations while designing a schema,

02:14.980 --> 02:15.310
right?

02:15.310 --> 02:22.510
And as we said, MongoDB doesn't really have a schema, so whenever I say schema

02:22.510 --> 02:28.630
here, don't think of the same thing as in a table, but rather think of a very, very

02:28.630 --> 02:30.740
flexible schema that you can change.

02:30.760 --> 02:31.390
OK.

02:31.690 --> 02:38.110
So the first rule is to design your schema according to user requirements, and this is very important.

02:38.380 --> 02:41.750
We're going to take an example on this in a few slides.

02:41.970 --> 02:45.000
Next, combine objects into one document

02:45.040 --> 02:46.790
if you use them together.

02:46.900 --> 02:51.360
Otherwise separate them, but make sure there is no need for joins, OK?

02:51.370 --> 02:56.170
So what this means is that a document should be self-contained,

02:56.170 --> 02:56.520
Right.

02:56.590 --> 03:00.240
You shouldn't have a document that needs to rely on another document.

03:00.370 --> 03:05.490
So if this document needs to rely on another document, then put it as an embedded subdocument in

03:05.520 --> 03:08.600
your document. So, doc two:

03:08.910 --> 03:13.530
you add the second document over here, right, in this area over here.

03:13.780 --> 03:17.450
But don't separate documents

03:17.500 --> 03:18.820
if you'll use them together,

03:18.820 --> 03:19.140
Right.

03:19.180 --> 03:26.190
So that's the rule of thumb. Also, duplicate the data, because disk space is cheap

03:26.680 --> 03:28.720
but compute time is not cheap.

03:28.720 --> 03:29.890
So what does this mean?

03:30.040 --> 03:36.070
Well, nowadays, if you look at disk space, it is very cheap. You can buy disks of two or three

03:36.070 --> 03:42.510
terabytes very, very cheaply, as opposed to compute time.

03:42.580 --> 03:50.220
If you want to, you know, run a query on your database, for example, that might run for five hours,

03:50.230 --> 03:54.270
and that will cost you a lot more than if it just runs for two minutes.

03:54.400 --> 04:00.970
So what this is saying is, when you duplicate the data, the queries get simpler, so it makes the compute

04:00.970 --> 04:01.890
time low.

04:01.960 --> 04:10.110
So duplicating the data makes the compute time go down.

04:10.300 --> 04:13.930
But it makes the disk space go up.

04:13.930 --> 04:16.180
Right because you're duplicating some of the data.

04:16.180 --> 04:19.660
So you're adding extra stuff to your disk space.

04:19.660 --> 04:20.780
So it's increasing.

04:20.950 --> 04:22.430
But disk space is cheap,

04:22.450 --> 04:24.070
so this isn't too bad.

04:24.160 --> 04:26.690
Compute time is expensive, so we like it to decrease.

04:26.800 --> 04:29.110
So always favor duplicating data.

04:29.170 --> 04:33.270
But of course within limits, right? Don't just keep duplicating your data everywhere.

04:33.400 --> 04:37.500
But we always prefer to duplicate the data.

04:37.510 --> 04:43.330
In doing so, for example, you might have a document here that this document needs, but you

04:43.330 --> 04:45.700
also need that document on its own.

04:45.730 --> 04:51.740
So you would take that document and duplicate it inside this document, and that's fine.

04:51.910 --> 04:54.670
Right because you're duplicating that data because you need it.

04:54.850 --> 04:57.270
You need to use it with this document also.

04:57.310 --> 04:57.870
OK.

04:58.000 --> 04:59.430
So that's fine.

04:59.860 --> 05:06.490
Also, do joins on writes and not on reads. That is, don't do the joins while you're reading, but while writing.

05:07.000 --> 05:10.430
Optimize your schema for the most frequent use cases.

05:10.480 --> 05:15.810
So make sure you cover the most frequent use cases and don't just keep thinking about the corner cases.

05:15.910 --> 05:17.760
While they are important to consider,

05:17.800 --> 05:22.600
even if they take slightly longer, just favor the frequent use cases.

05:22.690 --> 05:28.890
So if you have three types of queries, and query one and query two are run all the time,

05:28.930 --> 05:34.000
then you should favor your schema towards these queries instead of the third query, which maybe is called

05:34.000 --> 05:35.240
one or two times.

05:35.350 --> 05:39.560
And even if this third query takes, for example, you know, 10 seconds,

05:39.640 --> 05:45.100
if the other two are taking microseconds, then overall you're doing pretty well,

05:45.140 --> 05:46.030
right?

05:46.150 --> 05:52.810
And so another thing is: do complex aggregation in the schema. So, for example, let's say you

05:52.810 --> 05:59.860
want to do some aggregation, like sum, average, all these types of different operations.

05:59.980 --> 06:06.360
So don't just read all the records and then loop over every one of them

06:06.400 --> 06:08.770
to, for example, take the sum or the average.

06:08.770 --> 06:17.350
Instead, there is an actual query language that we can use inside the schema that lets us

06:17.350 --> 06:21.370
use aggregations, for example sum and average,

06:21.370 --> 06:23.410
and they're very fast.

06:23.410 --> 06:29.530
So whenever possible, check first. If you're trying to do an aggregation, like, for example, counting how many

06:29.530 --> 06:35.560
customers bought this product, then don't just loop over all the customers and see which ones have bought

06:35.560 --> 06:36.260
this product.

06:36.260 --> 06:37.770
That will be super slow.

06:37.900 --> 06:43.350
Instead, you can do an aggregation in the schema, and we're going to talk about this later on in

06:43.440 --> 06:46.610
future videos, and this will be a ton easier.

06:46.680 --> 06:53.710
OK, so before you write any complex aggregation yourself, check whether you can do it in the aggregation language,

06:53.710 --> 06:55.000
in the query language.

06:55.030 --> 06:55.560
OK.
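
NOTE
A rough sketch contrasting the two approaches just described, for the question of how many customers bought a product. The collection name customers, the purchases field and the sample data are assumptions made up for illustration. The pipeline uses the standard $match and $group stages with $sum; it is only defined here, not executed, since running it needs a MongoDB server.
```javascript
// Slow approach from the talk: pull every record to the client and loop over it.
function countBuyersClientSide(customers, product) {
  let count = 0;
  for (const customer of customers) {
    if (customer.purchases.includes(product)) count += 1;
  }
  return count;
}
// Same question phrased as an aggregation pipeline that the server evaluates.
// In the shell it would be run as: db.customers.aggregate(countBuyersPipeline)
const countBuyersPipeline = [
  { $match: { purchases: "keyboard" } }, // keep customers whose purchases array contains "keyboard"
  { $group: { _id: null, buyers: { $sum: 1 } } }, // count the matching customers
];
// Hypothetical sample data, only used to exercise the client-side version.
const sampleCustomers = [
  { _id: 1, purchases: ["keyboard", "mouse"] },
  { _id: 2, purchases: ["monitor"] },
  { _id: 3, purchases: ["keyboard"] },
];
console.log(countBuyersClientSide(sampleCustomers, "keyboard"));
```
With the pipeline, only the final count travels back from the server instead of every customer record.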

06:56.680 --> 06:58.500
OK let's take an example.

06:58.510 --> 07:04.570
So suppose a client needs a database design for his blog or website. This is a common use case when

07:04.600 --> 07:12.350
explaining databases, and it lets us see the differences between RDBMS and MongoDB schema design.

07:12.880 --> 07:15.300
Now the website has the following requirements.

07:15.340 --> 07:20.120
So if you remember, one of the rules we discussed is: design your schema according to the user requirements.

07:20.230 --> 07:25.060
So we should always design according to these rules over here.

07:25.120 --> 07:32.850
So every post has a unique title, description and URL. All of these are unique, right, so unique:

07:32.850 --> 07:39.130
let's make sure we circle that. Every post is going to have one or more tags.

07:39.160 --> 07:39.800
OK.

07:39.910 --> 07:43.850
So we said unique title, description and URL, and that's a post.

07:43.930 --> 07:51.990
So each post can have one or more tags, and every post has the name of its publisher

07:52.060 --> 07:54.350
and the total number of likes.

07:54.580 --> 08:04.870
Every post has comments given by users, along with their name, message, date-time and likes. For each post there

08:04.870 --> 08:08.490
can be zero or more comments.

08:08.500 --> 08:11.950
OK, so these are the user requirements that we collected.

08:11.950 --> 08:18.730
So if you are developing a database in a normal RDBMS, then the schema

08:18.730 --> 08:24.280
would look something like this. So you'd have a table for comments, a table for posts and a table for the

08:24.280 --> 08:28.850
tags, and then each of these tags would have its own ID,

08:28.860 --> 08:31.180
a post ID which it is related to,

08:31.180 --> 08:35.550
right, because each tag is related to some post ID,

08:35.950 --> 08:38.650
and what the name of the tag is, right?

08:38.900 --> 08:43.250
And here you would notice there's a one-to-infinity relationship, right?

08:43.380 --> 08:49.390
That means you can have any number of tags corresponding to one post, right, because if you think

08:49.390 --> 08:53.870
about it, a post might be tagged with database

08:53.890 --> 08:57.810
and MongoDB and NoSQL, right?

08:57.860 --> 08:59.310
So all of these are tags, right?

08:59.320 --> 09:07.260
So you can have multiple tags, many tags to one post. So here, tags to posts is many-to-one.

09:07.660 --> 09:15.610
And we can also have many comments to one post, so again many-to-one. And so each comment has a comment ID, a

09:15.610 --> 09:24.280
post ID, by user, message, date-time and likes, as from the user requirements over here, and

09:24.280 --> 09:32.650
the post has an ID, which links to the comments and the tag list, and has a title, description, URL, likes

09:32.740 --> 09:35.290
and posted by, the name of the poster.

09:35.800 --> 09:37.400
OK so hopefully this made sense.

09:37.400 --> 09:44.310
This is usually the normal way we would go about designing the RDBMS schema, right?

09:44.400 --> 09:48.750
But in the case of MongoDB it's actually quite different.

09:48.750 --> 09:55.580
So in MongoDB the schema design will have one collection, post, with the following structure.

09:55.590 --> 10:04.230
So we have this outer JSON file here, or document here, which would be called the post document.

10:04.360 --> 10:07.940
And inside this document there is an embedded array.

10:07.960 --> 10:15.100
So we have comments here, and then inside these comments we have an array of documents, where each of

10:15.100 --> 10:17.250
these documents represents a comment:

10:17.350 --> 10:18.970
so a user, message,

10:19.120 --> 10:20.700
date created and likes.

10:20.800 --> 10:27.120
So each of these is a different comment, and in the array each one is a different document.

10:27.250 --> 10:31.690
And so we have an array of documents for comments.

10:31.810 --> 10:35.410
Also we have an array of documents for tags.

10:35.560 --> 10:44.060
So we have an array here of tag document, tag document and tag document. You also have the likes stored,

10:44.180 --> 10:47.450
the URL, by, description and title.

10:47.550 --> 10:48.140
OK.

10:48.240 --> 10:49.770
So hopefully this made sense.

10:50.130 --> 10:54.000
And this is of course the ID that each post gets, and it's unique.
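
NOTE
A sketch of what one document in the post collection could look like, written as a JavaScript object. The field names follow the narration (title, description, by, url, tags, likes, comments with user, message, dateCreated, likes) but their exact spelling on the slide is an assumption, and every value, including the example.com URL, is made up. Tags are shown as plain strings for brevity, although the talk describes them as tag documents.
```javascript
// One self-contained post document with its tags and comments embedded.
const post = {
  _id: "POST_ID", // unique per post (placeholder value)
  title: "Data modeling in MongoDB",
  description: "Notes on embedding versus joining",
  by: "publisher name",
  url: "https://example.com/posts/data-modeling",
  tags: ["database", "MongoDB", "NoSQL"], // one or more tags
  likes: 10,
  comments: [ // zero or more embedded comment documents
    { user: "reader1", message: "Nice post", dateCreated: new Date("2020-01-01T10:00:00Z"), likes: 2 },
    { user: "reader2", message: "Thanks", dateCreated: new Date("2020-01-02T09:30:00Z"), likes: 0 },
  ],
};
// Everything needed to render the post is already inside this one document.
const totalCommentLikes = post.comments.reduce((sum, comment) => sum + comment.likes, 0);
console.log(post.tags.length, post.comments.length, totalCommentLikes);
```
Because tags and comments live inside the post, reading a post never has to consult a second collection.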

10:54.180 --> 11:00.150
So this is how it would look, and you might be wondering: well, if I have another post which has, for

11:00.150 --> 11:06.960
example, the same tags, then I would be duplicating data. And that's perfectly fine, right? Because,

11:06.960 --> 11:10.960
for example, let's say one of the tags here is SQL,

11:10.980 --> 11:11.530
right?

11:11.760 --> 11:16.970
Then another post I write might also have SQL, and then another post might have SQL.

11:17.040 --> 11:18.990
So I'm actually duplicating the data.

11:18.990 --> 11:23.190
Can't I just create one object and reference it inside here?

11:23.280 --> 11:25.380
And the answer is, actually, it doesn't matter.

11:25.390 --> 11:34.220
In MongoDB we favor duplicating data in order to decrease computation time,

11:34.380 --> 11:41.280
and we make sure that all our documents are self-contained, so they don't need to reference outside documents.

11:41.550 --> 11:48.510
So while showing the data, in RDBMS you need to join three tables, and in MongoDB the data

11:48.510 --> 11:51.480
will be shown from only one collection,

11:51.480 --> 11:53.760
right? So you don't have to read from more than one collection.
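
NOTE
To make the three-table join versus one-collection contrast concrete, here is a small in-memory sketch. The arrays posts, tagList and comments stand in for the relational tables from the example, and the column names and rows are assumptions for illustration. No real database is involved.
```javascript
// Relational layout: three hypothetical tables held as arrays of rows.
const posts = [{ id: 1, title: "Intro", description: "First post", url: "/intro", likes: 3, postBy: "sam" }];
const tagList = [
  { id: 1, postId: 1, tag: "SQL" },
  { id: 2, postId: 1, tag: "NoSQL" },
];
const comments = [{ commentId: 1, postId: 1, byUser: "lee", message: "Hi", dateTime: "2020-01-01T10:00:00Z", likes: 1 }];
// Read-time join: every page view has to look in all three tables.
function loadPostWithJoins(postId) {
  const row = posts.find((p) => p.id === postId);
  return {
    ...row,
    tags: tagList.filter((t) => t.postId === postId).map((t) => t.tag),
    comments: comments.filter((c) => c.postId === postId),
  };
}
// Document layout: the join is done once, at write time, so a read is a single lookup.
const postCollection = [loadPostWithJoins(1)];
const loadPostEmbedded = (postId) => postCollection.find((p) => p.id === postId);
console.log(loadPostEmbedded(1).tags);
```
This is also one way to read the earlier rule about doing joins on writes and not on reads: the combining work happens when the document is stored.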

11:56.440 --> 11:59.390
OK, so how do we create a database in MongoDB?

11:59.590 --> 12:05.650
Well, to create a database in MongoDB we use the keyword use and then the database name.

12:05.770 --> 12:11.220
So this is used to create a database: the command will create a new database if it doesn't exist,

12:11.740 --> 12:14.460
or it will return the existing database.

12:14.470 --> 12:18.700
So if you run, for example, use ABC,

12:18.730 --> 12:27.790
then if a database called ABC already exists, it's going to switch to the ABC database.

12:27.790 --> 12:34.930
If there's no such database, then it will create a new database called ABC and switch to that

12:34.930 --> 12:35.940
database.

12:35.980 --> 12:41.140
So the basic syntax of the use statement is just use and then the database name.

12:41.410 --> 12:42.970
So let's take an example.

12:42.980 --> 12:50.880
So if you want to create a database with the name mydb, right, without these brackets here,

12:50.890 --> 12:57.410
so mydb, then we can write the use statement as use mydb.

12:57.520 --> 12:59.490
So let's go and try that

12:59.610 --> 13:08.980
in MongoDB. With the server running, we first log in, and then we're going to say use mydb, right? So we want

13:08.980 --> 13:16.980
to call it mydb, my database, and then hit enter, and then it will say here switched to

13:16.990 --> 13:18.870
db mydb.

13:18.880 --> 13:26.220
So I have done that successfully. Now, to check your currently selected database, use the command db.

13:26.230 --> 13:31.680
So if I go back here, I can write db and it will tell me that I'm on the database

13:31.780 --> 13:34.080
mydb.

13:34.270 --> 13:37.480
Now, if you want to check the database list,

13:37.480 --> 13:38.230
use the

13:38.260 --> 13:40.430
command show dbs, right?

13:40.510 --> 13:46.450
So if I want to see a list of all the databases, then I can run show dbs. And you'll notice

13:46.450 --> 13:49.150
here that my database is not showing.

13:49.150 --> 13:50.510
So why is that?

13:50.830 --> 13:58.110
Well, actually, your created database mydb is not present in the list because it doesn't have any documents,

13:58.190 --> 14:02.990
and so to display it you need to insert at least one document into it.

14:03.070 --> 14:04.480
So I'll just do

14:04.520 --> 14:06.850
db.movie.insert.

14:06.880 --> 14:09.910
I'm just going to insert one document into it.

14:09.910 --> 14:17.730
So let's go back, and then I'm going to type db.movie.insert, and then I'm going to make a document

14:17.730 --> 14:24.940
with a name field, and the name would correspond to, for example, xyz.

14:25.090 --> 14:31.290
So if I hit enter, then it will tell me the WriteResult is nInserted: 1.

14:31.300 --> 14:40.540
I've inserted one record, and so now if I run show dbs, then it should show my database here, and

14:40.540 --> 14:46.680
it's 0.000 GB, right, because it's only a small size.

14:46.690 --> 14:53.010
Now, as we saw, when we ran show dbs the first time, our database wasn't

14:53.010 --> 14:53.790
showing.

14:53.820 --> 15:01.660
Until you insert a document, the database is not created.
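
NOTE
The create steps walked through above, collected as one shell session. This is a sketch that assumes the legacy mongo shell used in the video and a running local server; the comment lines paraphrase the narration rather than quote exact shell output.
```text
// switch to mydb; nothing is stored on disk yet
use mydb
// print the currently selected database
db
// mydb is not listed at this point because it holds no documents
show dbs
// the first insert creates the movie collection and, with it, the database
db.movie.insert({ name: "xyz" })
// mydb now appears in the list
show dbs
```
In the newer mongosh shell, insert is deprecated in favor of insertOne, so db.movie.insertOne({ name: "xyz" }) would be the closer equivalent there.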

15:01.660 --> 15:05.850
OK, so how do we delete a database? So that's how we create a database,

15:05.850 --> 15:08.620
but how do we delete a database?

15:08.660 --> 15:15.320
So the MongoDB db.dropDatabase() command is used to drop an existing database.

15:15.460 --> 15:24.030
So the basic syntax of dropDatabase is just db.dropDatabase(), and it deletes the current database

15:24.040 --> 15:25.050
that you're on.

15:25.060 --> 15:30.700
So this will delete the selected database. If you haven't selected any database, then it will delete the

15:30.700 --> 15:34.130
default test database.

15:34.240 --> 15:36.040
So let's take an example.

15:36.040 --> 15:41.380
So first check the list of available databases by running show dbs.

15:41.380 --> 15:43.610
Let's go back and write show dbs.

15:43.810 --> 15:47.990
And we have admin, config, local and mydb, right?

15:48.160 --> 15:48.680
OK.

15:48.730 --> 15:54.780
So now if you want to delete the new database mydb, then we first switch into it.

15:54.780 --> 15:56.910
So we run use mydb.

15:57.430 --> 16:04.060
So it will tell you switched to db mydb, and then now we're going to do our dropDatabase.

16:04.120 --> 16:11.800
So we run db.dropDatabase(), and then it will tell you dropped: mydb.

16:11.830 --> 16:16.710
OK, so we have successfully deleted this database.

16:16.930 --> 16:23.550
Now if I want to see my databases with show dbs, then you would see that mydb

16:23.590 --> 16:26.050
is no longer in the list; it's deleted.
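
NOTE
The delete steps from this part of the video as one shell session. As before, this is a sketch assuming the legacy mongo shell and a running local server; the comment lines summarize the narration instead of reproducing exact output.
```text
// list databases; in the video these are admin, config, local and mydb
show dbs
// select the database that should be deleted
use mydb
// drop the currently selected database
db.dropDatabase()
// mydb should no longer be listed
show dbs
```
Since dropDatabase acts on whichever database is currently selected, running db first to confirm the selection is a cheap safeguard.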

16:26.050 --> 16:30.710
So I deleted my database so I will stop here.

16:30.740 --> 16:33.770
And in the next video we'll continue from here on.

16:33.950 --> 16:36.580
So, until the next video, happy coding.